A Study on the Gender and Age Classification of Speech Data Using CNN

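  • 박대서 (Department of Computer and Communications Engineering, Kangwon National University) ;
  • 방준일 (Department of Computer and Communications Engineering, Kangwon National University) ;
  • 김화종 (Department of Computer and Communications Engineering, Kangwon National University) ;
  • 고영준 (IoT Development Division, MODACOM Co., Ltd.)
  • Received : 2018.09.07
  • Reviewed : 2018.10.08
  • Published : 2018.11.30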

Abstract

This paper uses deep learning, which can classify and predict in place of humans, to classify speakers by gender and age from their voices. We review existing neural-network-based sound classification research and propose an improved neural network for voice classification. Earlier work classified sounds using urban sound data, but its shallow network limited performance, and it did not perform well on other kinds of sound data. We therefore preprocess the voice data to extract feature values, feed those features into both the existing sound classification network and the proposed network, and compare the classification performance of the two. The proposed network is built deeper and wider so that learning is more effective. The existing network reached an accuracy of 84.8% and the proposed network 91.4%, which is 6.6 percentage points higher (reported as about 6%).
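The gap between the two reported accuracies can be read in more than one way, so a small worked calculation may help. The sketch below is illustrative only and is not part of the paper: the function name `compare_accuracy` and the dictionary keys are hypothetical, and the only values taken from the abstract are 84.8 and 91.4.

```python
def compare_accuracy(baseline_pct: float, proposed_pct: float) -> dict:
    """Summarize the gap between two accuracies given in percent.

    Hypothetical helper for illustration; not from the paper.
    """
    # Absolute gain, in percentage points.
    abs_gain = proposed_pct - baseline_pct
    # Gain relative to the baseline accuracy, in percent.
    rel_gain = abs_gain / baseline_pct * 100.0
    # Error rates (100 - accuracy) and the relative reduction in error.
    base_err = 100.0 - baseline_pct
    prop_err = 100.0 - proposed_pct
    err_reduction = (base_err - prop_err) / base_err * 100.0
    return {
        "abs_gain_points": round(abs_gain, 1),
        "rel_gain_pct": round(rel_gain, 1),
        "error_reduction_pct": round(err_reduction, 1),
    }


# Accuracies reported in the abstract: existing network vs. proposed network.
result = compare_accuracy(84.8, 91.4)
print(result)
```

Under these definitions, the "about 6%" figure corresponds to the absolute gain in percentage points; the relative gain and the error-rate reduction are different quantities and come out larger.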

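Project Information

Research project : Development of a big data auto-tagging and tag-based DaaS system; Implementation of user-friendly intelligent UI/UX technology for intelligent devices and gateways/hubs

Supervising organization : Institute for Information & Communications Technology Promotion (IITP)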
