A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System

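  • Byung-Je Min (민병제), SFA Engineering Corp.
  • Dong-Chul Park (박동철), Intelligent Computing Research Lab, Department of Information Engineering, Myongji University
  • Published: 2008.09.30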

Abstract

This paper proposes a new Code Excited Linear Prediction (CELP) vocoder based on the 7.4 kbit/s mode of the Adaptive Multi-Rate (AMR) coder. The proposed vocoder achieves a higher compression rate in a Speaker Dependent Coding System (SDSC) environment and is well suited to systems that need only one person's speech, such as OGM (Outgoing Message) and TTS (Text-To-Speech). To raise the compression rate, a new Line Spectral Pairs (LSP) codebook is generated with the Centroid Neural Network (CNN), an unsupervised-learning neural network. In addition, the fixed codebook search uses only two pulses per subframe instead of the four used in the AMR 7.4 kbit/s mode, which raises the compression rate further. The resulting loss in speech quality is compensated by applying a predicted pulse in each subframe. Compared with the original AMR 7.4 kbit/s coder, the new coder achieves a 27% higher compression rate while delivering comparable synthesized speech quality in terms of Mean Opinion Score (MOS).
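
The abstract does not spell out how the LSP codebook is trained, so the following is only a minimal sketch of the general idea: a codebook is grown from two codewords, and each codeword is kept at the exact centroid of the training vectors it currently owns by means of incremental winner/loser updates, in the spirit of centroid-based competitive learning. It is not the authors' implementation. The function names (`train_codebook`, `nearest`, `sq_dist`), the parameters `eps` and `max_epochs`, the splitting rule, and the 2-D toy data standing in for LSP vectors are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, x):
    """Index of the codeword closest to x (the value a coder would transmit)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], x))


def train_codebook(data, size, max_epochs=50, eps=1e-3):
    """Grow a codebook from 2 to `size` codewords with centroid-preserving updates."""
    dim = len(data[0])
    mean = [sum(v[d] for v in data) / len(data) for d in range(dim)]
    # Start with two codewords slightly on either side of the global mean.
    codebook = [[m + eps for m in mean], [m - eps for m in mean]]
    counts = [0, 0]
    assign = [None] * len(data)  # current owner of each training vector

    while True:
        for _ in range(max_epochs):
            changed = False
            for n, x in enumerate(data):
                j, i = nearest(codebook, x), assign[n]
                if j == i:
                    continue
                changed = True
                # Winner gains x: move it to the mean of its enlarged member set.
                counts[j] += 1
                codebook[j] = [w + (xd - w) / counts[j]
                               for w, xd in zip(codebook[j], x)]
                # Previous owner loses x: move it to the mean of what remains.
                if i is not None:
                    counts[i] -= 1
                    if counts[i] > 0:
                        codebook[i] = [w - (xd - w) / counts[i]
                                       for w, xd in zip(codebook[i], x)]
                assign[n] = j
            if not changed:
                break
        if len(codebook) >= size:
            return codebook, assign
        # Split: add a codeword next to the one with the largest total distortion.
        err = [0.0] * len(codebook)
        for n, x in enumerate(data):
            err[assign[n]] += sq_dist(codebook[assign[n]], x)
        worst = max(range(len(codebook)), key=err.__getitem__)
        codebook.append([w + eps for w in codebook[worst]])
        counts.append(0)


if __name__ == "__main__":
    random.seed(0)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0), (5.0, 0.0)]
    data = [[cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)]
            for cx, cy in centers for _ in range(50)]
    codebook, assign = train_codebook(data, size=4)
    index = nearest(codebook, [4.8, 5.1])
    print("codewords:", len(codebook), "chosen index:", index)
```

Because only the index of the nearest codeword is transmitted, a codebook trained on a single speaker can be smaller than a speaker-independent one at similar distortion, which is one plausible reading of how a speaker-dependent LSP codebook saves bits. The codebook sizes actually used in the paper are not stated in the abstract.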
