Combination of Classifiers Decisions for Multilingual Speaker Identification

Nagaraja, B.G.; Jayanna, H.S.

doi:10.3745/JIPS.02.0025

Journal of Information Processing Systems

Volume 13, Issue 4, pp. 928-940, 2017
pISSN 1976-913X, eISSN 2092-805X

Korea Information Processing Society (한국정보처리학회)


Nagaraja, B.G. (Dept. of E&CE, Jain Institute of Technology);
Jayanna, H.S. (Dept. of IS&E, Siddaganga Institute of Technology)

Received : 2014.01.27
Accepted : 2014.08.05
Published : 2017.08.31


Abstract

State-of-the-art speaker recognition systems may work better on English speech. When the same system is used to recognize speakers of other languages, its performance may degrade. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) classifier and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The two classifiers differ in their modeling techniques: the former takes a probabilistic approach, while the latter is based on the fine-tuning of neurons. Because the approaches differ, each classifier identifies a different set of speakers on the same database, so combining their decisions may improve performance. Multitaper mel-frequency cepstral coefficients (MFCCs) are used as features, and monolingual and cross-lingual speaker identification experiments are conducted on the NIST-2003 database and our own database. The experimental results show that the combined system improves performance by nearly 10% compared with the individual classifiers.
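The abstract does not state which rule is used to combine the two classifiers' decisions. As an illustration only, the sketch below fuses per-speaker scores with a weighted sum after z-normalization, which is one common choice and not necessarily the authors' method. The speaker labels, the score values, and the names `z_normalize` and `fuse_sum_rule` are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one classifier's per-speaker scores to zero mean and unit variance."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All speakers scored the same: this classifier gives no evidence.
        return {spk: 0.0 for spk in scores}
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def fuse_sum_rule(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of normalized scores; returns (best speaker, fused scores)."""
    norm_a, norm_b = z_normalize(scores_a), z_normalize(scores_b)
    fused = {
        spk: weight_a * norm_a[spk] + (1.0 - weight_a) * norm_b[spk]
        for spk in norm_a
    }
    return max(fused, key=fused.get), fused


# Hypothetical scores for one test utterance (higher means a better match).
gmm_ubm_scores = {"spk1": -41.2, "spk2": -39.8, "spk3": -40.1}  # log-likelihood style
lvq_scores = {"spk1": 0.20, "spk2": 0.35, "spk3": 0.45}  # similarity style

identified, fused = fuse_sum_rule(gmm_ubm_scores, lvq_scores)
print(identified, fused)
```

Normalizing first matters because the two classifiers produce scores on unrelated scales; without it, the classifier with the larger numeric range would dominate the sum. In this made-up example the two classifiers disagree on the top speaker, and the fused score settles the disagreement.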
