An Efficient Voice Activity Detection Method using Bi-Level HMM

  • Received : 2015.07.08
  • Accepted : 2015.08.23
  • Published : 2015.08.31

Abstract

We present a method for voice activity detection (VAD) using a Bi-Level HMM. Conventional VAD methods need either a separate post-processing step or rule-based delayed frames to remove short state-change errors (burst clipping). To address this problem, we apply to VAD a Bi-Level HMM, which inserts an additional state layer into a typical HMM, and we use the posterior ratio of the voice states to decide voice periods. Using MFCCs (Mel-Frequency Cepstral Coefficients), which reflect human auditory characteristics, as observation vectors, we ran experiments on voice data at different SNRs and, on the evaluation metrics used, obtained better results than existing voice-state classification methods.
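
The idea of deciding voice periods from a posterior ratio of voice states can be illustrated with a minimal sketch. It is not the Bi-Level HMM described above: it assumes a plain two-state HMM (non-voice, voice), a scalar Gaussian feature per frame in place of MFCC vectors, and forward filtering only. All function names, parameter values, and the threshold are hypothetical and chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; a stand-in for a per-state MFCC likelihood."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def voice_posterior_ratio(observations, trans, means, variances, prior):
    """Forward filtering for a 2-state HMM (0 = non-voice, 1 = voice).

    Returns, for each frame t, P(voice | o_1..o_t) / P(non-voice | o_1..o_t).
    """
    ratios = []
    belief = list(prior)
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # Update: weight by the emission likelihood, then normalize.
        belief = [belief[j] * gaussian_pdf(obs, means[j], variances[j]) for j in range(2)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # Guard against division by zero when the non-voice posterior underflows.
        ratios.append(belief[1] / max(belief[0], 1e-300))
    return ratios


def detect_voice(observations, threshold=1.0, **hmm):
    """Mark a frame as voice when the posterior ratio exceeds the threshold."""
    return [r > threshold for r in voice_posterior_ratio(observations, **hmm)]


# Hypothetical toy parameters: "sticky" transitions and well-separated emissions.
TOY_HMM = {
    "trans": [[0.9, 0.1], [0.1, 0.9]],
    "means": [0.0, 4.0],
    "variances": [1.0, 1.0],
    "prior": [0.5, 0.5],
}

frames = [0.1, -0.3, 0.2, 4.1, 3.8, 4.2, 0.0, -0.1]
decisions = detect_voice(frames, **TOY_HMM)
print(decisions)
```

Because the transition matrix favors staying in the current state, the posterior changes more smoothly than a frame-by-frame likelihood test would. That smoothing is the general property HMM-based detectors rely on to limit short state-change errors, although how much it helps depends on the parameters.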
