- Volume 12 Issue 2
In this paper, we develop a speaker gender classification technique using collaborative sensor fusion for use in a wireless sensor network. The distributed sensor nodes remove unwanted input data using BER (Band Energy Ratio) based voice activity detection, process only the relevant data, and transmit hard-labeled decisions to the fusion center, where global decision fusion is carried out. This is advantageous for power consumption and network resource management. Bayesian sensor fusion and global weighted decision fusion methods are proposed to achieve the gender classification. As the number of sensor nodes varies, the Bayesian sensor fusion yields the best classification accuracy using the optimal operating points of the ROC (Receiver Operating Characteristic) curves. For the weights used in the global decision fusion, the BER and MCL (Mutual Confidence Level) are employed so that the local decisions are combined effectively at the fusion center. The simulation results show that as the number of sensor nodes increases, the classification accuracy improves even more in the low SNR (Signal to Noise Ratio) condition.
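The fusion step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact fusion rules, so the code assumes a standard log-likelihood-ratio rule for fusing independent hard decisions (Bayesian case) and a plain weighted vote (weighted case). The function names, the label encoding (1 = male, 0 = female), the per-node accuracies, and the weights are all hypothetical. In the paper the accuracies would come from each node's ROC operating point and the weights from BER and MCL, neither of which is computed here.

```python
import math


def bayesian_fusion(decisions, p_male_ok, p_female_ok, prior_male=0.5):
    """Fuse hard labels (1 = male, 0 = female) from independent nodes.

    p_male_ok[i]   = P(node i reports male   | speaker is male)
    p_female_ok[i] = P(node i reports female | speaker is female)
    These per-node accuracies are assumed known (hypothetical values here).
    """
    # Start from the prior log-odds of "male" versus "female".
    llr = math.log(prior_male / (1.0 - prior_male))
    for u, pm, pf in zip(decisions, p_male_ok, p_female_ok):
        if u == 1:
            # P(u=1 | male) / P(u=1 | female)
            llr += math.log(pm / (1.0 - pf))
        else:
            # P(u=0 | male) / P(u=0 | female)
            llr += math.log((1.0 - pm) / pf)
    return 1 if llr > 0 else 0


def weighted_fusion(decisions, weights):
    """Weighted vote over hard labels; weights are hypothetical placeholders."""
    score = sum(w * (1 if u == 1 else -1) for u, w in zip(decisions, weights))
    return 1 if score > 0 else 0


# One reliable node says "male", two unreliable nodes say "female".
decisions = [1, 0, 0]
accuracy = [0.95, 0.60, 0.60]

fused_bayes = bayesian_fusion(decisions, accuracy, accuracy)
fused_equal = weighted_fusion(decisions, [1.0, 1.0, 1.0])
fused_weighted = weighted_fusion(decisions, [3.0, 1.0, 1.0])
```

In this example the reliable node outweighs the two unreliable ones under both the Bayesian rule and the unequal weights, whereas an equal-weight vote follows the majority. That is the motivation for weighting local decisions at the fusion center rather than simply counting them.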