Speech/Music Discrimination Using Mel-Cepstrum Modulation Energy

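  • Kim Bong-Wan (Speech Information Technology and Industry Promotion Center) ;
  • Choi Dae-Lim (Speech Information Technology and Industry Promotion Center) ;
  • Lee Yong-Ju (Division of Electrical, Electronic and Information Engineering, Wonkwang University; Speech Information Technology and Industry Promotion Center)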
  • Published : 2007.12.30


In this paper, we introduce mel-cepstrum modulation energy (MCME) as a feature for discriminating between speech and music. MCME extends modulation energy (ME) to the mel-cepstrum domain: MCME is extracted from the time trajectories of mel-frequency cepstral coefficients, whereas ME is computed from the spectrum. Because cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we run experiments with modulation frequencies from 4 Hz to 20 Hz. To demonstrate the effectiveness of the proposed feature, we compare its discrimination accuracy with that of ME and cepstral flux.
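
The abstract does not give the exact MCME formula, so the following is only a minimal sketch of the general idea: measure how much of the energy in a set of feature time trajectories falls near a chosen modulation frequency. The function name `modulation_energy`, the `bandwidth` parameter, the Hann window, and the normalization by total energy are all assumptions made for illustration, not the authors' definition. Passing MFCC trajectories as input would give an MCME-like value, while passing spectral band energies would give an ME-like value.

```python
import numpy as np


def modulation_energy(trajectories, frame_rate, mod_freq, bandwidth=1.0):
    """Fraction of trajectory energy near `mod_freq` (illustrative only).

    trajectories: array of shape (n_frames, n_coeffs), e.g. MFCCs over time.
    frame_rate:   number of feature frames per second.
    mod_freq:     modulation frequency of interest, in Hz.
    bandwidth:    width in Hz of the band around mod_freq (assumed value).
    """
    x = np.asarray(trajectories, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Remove the mean of each trajectory so the DC term does not dominate.
    x = x - x.mean(axis=0)
    # Taper along time to reduce spectral leakage.
    window = np.hanning(x.shape[0])[:, None]
    # Power spectrum of each trajectory along the time axis.
    power = np.abs(np.fft.rfft(x * window, axis=0)) ** 2
    freqs = np.fft.rfftfreq(x.shape[0], d=1.0 / frame_rate)
    # Select modulation-frequency bins inside the band of interest.
    band = np.abs(freqs - mod_freq) <= bandwidth / 2.0
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[band].sum() / total)


if __name__ == "__main__":
    # Synthetic stand-in for 13 cepstral trajectories at 100 frames/s,
    # all modulated at 4 Hz (real use would substitute actual MFCCs).
    t = np.arange(200) / 100.0
    fake_mfcc = np.sin(2 * np.pi * 4.0 * t)[:, None] * np.ones((1, 13))
    # Sweep the 4 Hz to 20 Hz range mentioned in the abstract.
    for f in range(4, 21, 4):
        print(f, round(modulation_energy(fake_mfcc, 100.0, float(f)), 3))
```

In a sweep like the one in the usage section, the modulation frequency whose feature value best separates speech from music on labeled data would be the one to keep; how that selection was actually made is not stated in the abstract.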
