References
- Kinnunen, T., Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, Vol. 52, No. 1, 12-40. https://doi.org/10.1016/j.specom.2009.08.009
- Reynolds, D., Quatieri, T., Dunn, R. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, Vol. 10, No. 1, 19-41. https://doi.org/10.1006/dspr.1999.0361
- Campbell, W., Campbell, J., Reynolds, D., Singer, E., Torres-Carrasquillo, P. (2006). Support vector machines for speaker and language recognition. Computer Speech & Language, Vol. 20, No. 2-3, 210-229. https://doi.org/10.1016/j.csl.2005.06.003
- Kenny, P. (2006). Joint factor analysis of speaker and session variability: Theory and algorithms. http://www.crim.ca/perso/patrick.kenny/
- Senoussaoui, M., Kenny, P., Dehak, N., Dumouchel, P. (2010). An i-vector extractor suitable for speaker recognition with both microphone and telephone speech. Proc. Odyssey Speaker and Language Recognition Workshop, 28-33.
- Davis, S., Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. 28, No. 4, 357-366. https://doi.org/10.1109/TASSP.1980.1163420
- Zhou, X., Garcia-Romero, D., Duraiswami, R., Espy-Wilson, C., Shamma, S. (2011). Linear versus mel frequency cepstral coefficients for speaker recognition. Proc. ASRU Workshop, 559-564.
- Furui, S. (1981). Cepstral analysis technique for automatic speaker verification. IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. 29, No. 2, 254-272. https://doi.org/10.1109/TASSP.1981.1163530
- Kinnunen, T., Koh, C., Wang, L., Li, H., Chng, E. (2006). Temporal discrete cosine transform: Towards longer term temporal features for speaker verification. Proc. ISCSLP, 547-558.
- Milner, B. P., Vaseghi, S. V. (1995). An analysis of cepstral-time feature matrices for noise and channel robust speech recognition. Proc. Eurospeech, 519-522.
- Stevens, S., Volkmann, J., Newman, E. B. (1937). A scale for the measurement of the psychological magnitude pitch. Journal of the Acoustical Society of America, Vol. 8, No. 3, 185-190. https://doi.org/10.1121/1.1915893
- Wölfel, M., McDonough, J., Waibel, A. (2003). Warping and scaling of the minimum variance distortionless response. Proc. ASRU Workshop, 387-392.
- Choi, Y. H., Ban, S. M., Lee, G. H., Kim, K. H., Kim, H. S. (2014). Performance comparison of different frequency scales in feature extraction for speaker recognition. Proceedings of 2014 Fall Conference of Korean Society of Speech Sciences, 195-196. (최영호, 반성민, 이가희, 김경화, 김형순 (2014). 화자인식 특징추출을 위한 주파수 스케일 성능 비교. 2014 한국음성학회 가을 학술대회 발표 논문집, 195-196.)
- Kumar, P., Rao, P. (2004). A study of frequency-scale warping for speaker recognition. Proc. NCC 2004, 203-207.
- Zhang, W. Q., Deng, Y., He, L., Liu, J. (2010). Variant time-frequency cepstral features for speaker recognition. Proc. Interspeech, 2122-2125.
- Larcher, A., Bonastre, J. F., Fauve, B., Lee, K. A., Levy, C., Li, H., Mason, J. S., Parfait, J. Y. (2013). ALIZE 3.0 - open source toolkit for state-of-the-art speaker recognition. Proc. Interspeech, 2768-2773.
- NIST (2004). The evaluation plan of NIST 2004 speaker recognition evaluation campaign. http://www.itl.nist.gov/iad/mig/tests/spk/2004/SRE-04_evalplan-v1a.pdf
- Brandschain, L., Graff, D., Cieri, C., Walker, K., Caruso, C., Neely, A. (2010). The Mixer 6 corpus: Resources for cross-channel and text independent speaker recognition. Proc. LREC 2010, 2441-2444.