An Efficient Voice Activity Detection Method using Bi-Level HMM

Jang, Guang-Woo; Jeong, Mun-Ho

doi:10.13067/JKIECS.2015.10.8.901

The Journal of the Korea institute of electronic communication sciences (한국전자통신학회논문지)

Volume 10, Issue 8 / Pages 901-906 / 2015 / pISSN 1975-8170

Korea Institute of Electronic Communication Science (한국전자통신학회)

Bi-Level HMM을 이용한 효율적인 음성구간 검출 방법

Jang, Guang-Woo (School of Robotics, Kwangwoon University);
Jeong, Mun-Ho (School of Robotics, Kwangwoon University)

Received : 2015.07.08
Accepted : 2015.08.23
Published : 2015.08.31

https://doi.org/10.13067/JKIECS.2015.10.8.901

Abstract

We present a voice activity detection (VAD) method based on a Bi-Level HMM. Conventional VAD methods need either a separate post-processing step or rule-based delayed frames to remove short state-change errors (burst clipping). To address this problem, we apply to VAD a Bi-Level HMM, which inserts an additional state layer into a typical HMM, and we use the posterior ratio of the voice states to decide whether a frame belongs to a voice period. Using MFCCs (Mel-Frequency Cepstral Coefficients), which reflect human auditory characteristics, as observation vectors, we ran experiments on voice data at different SNRs and obtained better results on the evaluation metrics than well-known existing methods.
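
The posterior-ratio decision mentioned in the abstract can be illustrated with a much simpler model. The following is a minimal sketch, assuming a plain two-state HMM (silence and voice) with scalar Gaussian emissions standing in for MFCC vectors; it is not the Bi-Level HMM of the paper, and the function names, parameters, and threshold are hypothetical choices made for illustration only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumed stand-in for an MFCC emission model)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_ratio_vad(observations, trans, means, variances,
                        prior=(0.5, 0.5), threshold=1.0):
    """Forward-filter a two-state HMM (0 = silence, 1 = voice).

    A frame is flagged as voice when the filtered posterior ratio
    P(voice | x_1..t) / P(silence | x_1..t) exceeds `threshold`.
    """
    belief = list(prior)
    decisions = []
    for x in observations:
        # Predict: propagate the previous belief through the transition matrix.
        predicted = [sum(belief[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # Update: weight by the emission likelihood, then renormalise.
        unnorm = [predicted[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(2)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        # Guard the denominator so a near-certain voice frame does not divide by zero.
        ratio = belief[1] / max(belief[0], 1e-300)
        decisions.append(ratio > threshold)
    return decisions


if __name__ == "__main__":
    # Hypothetical toy parameters: "sticky" transitions, well-separated emissions.
    trans = [[0.9, 0.1], [0.1, 0.9]]
    frames = [0.0, 0.1, 5.0, 5.2, 4.9, 0.0]
    print(posterior_ratio_vad(frames, trans, means=(0.0, 5.0), variances=(1.0, 1.0)))
```

In this simplified setting the self-transition probabilities are the only source of temporal smoothing. The abstract indicates that the Bi-Level HMM obtains this effect from an inserted state layer rather than from post-processing or delayed frames.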
