• Title/Summary/Keyword: Mel frequency cepstral coefficient

Search Result 66, Processing Time 0.025 seconds

Performance comparison of wake-up-word detection on mobile devices using various convolutional neural networks (다양한 합성곱 신경망 방식을 이용한 모바일 기기를 위한 시작 단어 검출의 성능 비교)

  • Kim, Sanghong;Lee, Bowon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.454-460
    • /
    • 2020
  • Artificial intelligence assistants that provide speech recognition operate through cloud-based voice recognition with high accuracy. In cloud-based speech recognition, Wake-Up-Word (WUW) detection plays an important role in activating devices on standby. In this paper, we compare the performance of Convolutional Neural Network (CNN)-based WUW detection models for mobile devices by using Google's speech commands dataset, using the spectrogram and mel-frequency cepstral coefficient features as inputs. The CNN models used in this paper are multi-layer perceptron, general convolutional neural network, VGG16, VGG19, ResNet50, ResNet101, ResNet152, MobileNet. We also propose network that reduces the model size to 1/25 while maintaining the performance of MobileNet is also proposed.

A Study for Complexity Improvement of Automatic Speaker Verification in PDA Environment (PDA 환경에서 자동화자 확인의 계산량 개선을 위한 연구)

  • Seo, Chang-Woo;Lim, Young-Hwan;Jeon, Sung-Chae;Jang, Nam-Young
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.10 no.3
    • /
    • pp.170-175
    • /
    • 2009
  • In this paper, we propose real time automatic speaker verification (ASV) system to protect personal information on personal digital assistant (PDA) device. Recently, the capacity of PDA has extended and been popular, especially for mobile environment such as mobile commerce (M-commerce). However, there still exist lots of difficulties for practical application of ASV utility to PDA device because it requires too much computational complexity. To solve this problem, we apply the method to relieve the computational burden by performing the preprocessing such as spectral subtraction and speech detection during the speech utterance. Also by applying the hidden Markov model (HMM) optimal state alignment and the sequential probability ratio test (SPRT), we can get much faster processing results. The whole system implementation is simple and compact enough to fit well with PDA device's limited memory and low CPU speed.

  • PDF

Application of Machine Learning on Voice Signals to Classify Body Mass Index - Based on Korean Adults in the Korean Medicine Data Center (머신러닝 기반 음성분석을 통한 체질량지수 분류 예측 - 한국 성인을 중심으로)

  • Kim, Junho;Park, Ki-Hyun;Kim, Ho-Seok;Lee, Siwoo;Kim, Sang-Hyuk
    • Journal of Sasang Constitutional Medicine
    • /
    • v.33 no.4
    • /
    • pp.1-9
    • /
    • 2021
  • Objectives The purpose of this study was to check whether the classification of the individual's Body Mass Index (BMI) could be predicted by analyzing the voice data constructed at the Korean medicine data center (KDC) using machine learning. Methods In this study, we proposed a convolutional neural network (CNN)-based BMI classification model. The subjects of this study were Korean adults who had completed voice recording and BMI measurement in 2006-2015 among the data established at the Korean Medicine Data Center. Among them, 2,825 data were used for training to build the model, and 566 data were used to assess the performance of the model. As an input feature of CNN, Mel-frequency cepstral coefficient (MFCC) extracted from vowel utterances was used. A model was constructed to predict a total of four groups according to gender and BMI criteria: overweight male, normal male, overweight female, and normal female. Results & Conclusions Performance evaluation was conducted using F1-score and Accuracy. As a result of the prediction for four groups, The average accuracy was 0.6016, and the average F1-score was 0.5922. Although it showed good performance in gender discrimination, it is judged that performance improvement through follow-up studies is necessary for distinguishing BMI within gender. As research on deep learning is active, performance improvement is expected through future research.

RoutingConvNet: A Light-weight Speech Emotion Recognition Model Based on Bidirectional MFCC (RoutingConvNet: 양방향 MFCC 기반 경량 음성감정인식 모델)

  • Hyun Taek Lim;Soo Hyung Kim;Guee Sang Lee;Hyung Jeong Yang
    • Smart Media Journal
    • /
    • v.12 no.5
    • /
    • pp.28-35
    • /
    • 2023
  • In this study, we propose a new light-weight model RoutingConvNet with fewer parameters to improve the applicability and practicality of speech emotion recognition. To reduce the number of learnable parameters, the proposed model connects bidirectional MFCCs on a channel-by-channel basis to learn long-term emotion dependence and extract contextual features. A light-weight deep CNN is constructed for low-level feature extraction, and self-attention is used to obtain information about channel and spatial signals in speech signals. In addition, we apply dynamic routing to improve the accuracy and construct a model that is robust to feature variations. The proposed model shows parameter reduction and accuracy improvement in the overall experiments of speech emotion datasets (EMO-DB, RAVDESS, and IEMOCAP), achieving 87.86%, 83.44%, and 66.06% accuracy respectively with about 156,000 parameters. In this study, we proposed a metric to calculate the trade-off between the number of parameters and accuracy for performance evaluation against light-weight.

An Implementation of Automatic Genre Classification System for Korean Traditional Music (한국 전통음악 (국악)에 대한 자동 장르 분류 시스템 구현)

  • Lee Kang-Kyu;Yoon Won-Jung;Park Kyu-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.1
    • /
    • pp.29-37
    • /
    • 2005
  • This paper proposes an automatic genre classification system for Korean traditional music. The Proposed system accepts and classifies queried input music as one of the six musical genres such as Royal Shrine Music, Classcal Chamber Music, Folk Song, Folk Music, Buddhist Music, Shamanist Music based on music contents. In general, content-based music genre classification consists of two stages - music feature vector extraction and Pattern classification. For feature extraction. the system extracts 58 dimensional feature vectors including spectral centroid, spectral rolloff and spectral flux based on STFT and also the coefficient domain features such as LPC, MFCC, and then these features are further optimized using SFS method. For Pattern or genre classification, k-NN, Gaussian, GMM and SVM algorithms are considered. In addition, the proposed system adopts MFC method to settle down the uncertainty problem of the system performance due to the different query Patterns (or portions). From the experimental results. we verify the successful genre classification performance over $97{\%}$ for both the k-NN and SVM classifier, however SVM classifier provides almost three times faster classification performance than the k-NN.

Motion Study of Treatment Robot for Autistic Children Using Speech Data Classification Based on Artificial Neural Network (음성 분류 인공신경망을 활용한 자폐아 치료용 로봇의 지능화 동작 연구)

  • Lee, Jin-Gyu;Lee, Bo-Hee
    • Journal of IKEEE
    • /
    • v.23 no.4
    • /
    • pp.1440-1447
    • /
    • 2019
  • Currently, the prevalence of autism spectrum disorders in children is reported to be higher and shows various types of disorders. In particular, they are having difficulty in communication due to communication impairment in the area of social communication and need to be improved through training. Thus, this study proposes a method of acquiring voice information through a microphone mounted on a robot designed through preliminary research and using this information to make intelligent motions. An ANN(Artificial Neural Network) was used to classify the speech data into robot motions, and we tried to improve the accuracy by combining the Recurrent Neural Network based on Convolutional Neural Network. The preprocessing of input speech data was analyzed using MFCC(Mel-Frequency Cepstral Coefficient), and the motion of the robot was estimated using various data normalization and neural network optimization techniques. In addition, the designed ANN showed a high accuracy by conducting an experiment comparing the accuracy with the existing architecture and the method of human intervention. In order to design robot motions with higher accuracy in the future and to apply them in the treatment and education environment of children with autism.