Speaker Segmentation System Using Eigenvoice-based Speaker Weight Distance Method

  • Received : 2012.03.06
  • Accepted : 2012.04.23
  • Published : 2012.05.31

Abstract

Speaker segmentation is the process of automatically detecting speaker boundary points in audio data. Speaker segmentation methods fall into two categories according to whether they use prior knowledge about the speakers: model-based segmentation, which does, and metric-based segmentation, which does not. In this paper, we introduce an eigenvoice-based speaker weight distance method for speaker segmentation and compare it with representative metric-based methods. We also use both the Euclidean distance and the cosine similarity to measure the distance between speaker weight vectors, and compare the resulting segmentation performance. Finally, by comparison with a method that directly uses the distance between the speaker-adapted models constructed by the eigenvoice technique, we verify that the speaker weight distance method is computationally more efficient.
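As a rough illustration of the distance step described in the abstract, the following Python sketch compares the speaker weight vectors of consecutive analysis windows using either the Euclidean distance or a cosine-based distance. It assumes the weight vectors have already been estimated for each window; the estimation itself is not shown. The function names, the `threshold` parameter, and the local-peak rule for choosing boundaries are hypothetical choices made for this example and are not taken from the paper.

```python
import math


def euclidean_distance(w1, w2):
    """Euclidean distance between two speaker weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))


def cosine_distance(w1, w2):
    """One minus the cosine similarity of two speaker weight vectors."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption for this sketch: a zero vector is treated as maximally
        # dissimilar to avoid a division by zero.
        return 1.0
    return 1.0 - dot / (norm1 * norm2)


def detect_boundaries(weights, distance, threshold):
    """Return indices i where a speaker change is hypothesized between
    window i - 1 and window i.

    weights   : list of weight vectors, one per consecutive analysis window
                (assumed to be precomputed)
    distance  : function taking two weight vectors and returning a float
    threshold : hypothetical decision threshold on the distance
    """
    # Distance between each pair of adjacent windows.
    dists = [distance(weights[i - 1], weights[i]) for i in range(1, len(weights))]

    boundaries = []
    for k, value in enumerate(dists):
        left = dists[k - 1] if k > 0 else float("-inf")
        right = dists[k + 1] if k + 1 < len(dists) else float("-inf")
        # Keep only local peaks of the distance curve that exceed the threshold.
        if value > threshold and value > left and value >= right:
            boundaries.append(k + 1)
    return boundaries
```

Either distance function can be passed as the `distance` argument, so the two measures can be compared on the same sequence of weight vectors. Because each comparison involves only short weight vectors rather than full adapted models, the cost per window pair stays small, which is consistent with the efficiency argument in the abstract.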
