• Title/Summary/Keyword: Voice Features Extraction

Search Result 22, Processing Time 0.02 seconds

Proposed Efficient Architectures and Design Choices in SoPC System for Speech Recognition

  • Trang, Hoang;Hoang, Tran Van
    • Journal of IKEEE
    • /
    • v.17 no.3
    • /
    • pp.241-247
    • /
    • 2013
  • This paper presents the design of a System on Programmable Chip (SoPC) based on Field Programmable Gate Array (FPGA) for speech recognition in which Mel-Frequency Cepstral Coefficients (MFCC) for speech feature extraction and Vector Quantization for recognition are used. The implementing process of the speech recognition system undergoes the following steps: feature extraction, training codebook, recognition. In the first step of feature extraction, the input voice data will be transformed into spectral components and extracted to get the main features by using MFCC algorithm. In the recognition step, the obtained spectral features from the first step will be processed and compared with the trained components. The Vector Quantization (VQ) is applied in this step. In our experiment, Altera's DE2 board with Cyclone II FPGA is used to implement the recognition system which can recognize 64 words. The execution speed of the blocks in the speech recognition system is surveyed by calculating the number of clock cycles while executing each block. The recognition accuracies are also measured in different parameters of the system. These results in execution speed and recognition accuracy could help the designer to choose the best configurations in speech recognition on SoPC.

A Study on the Acoustic Characteristics of the Pansori by Voice Signals Analysis (음성신호 분석에 의한 판소리의 음성학적 특징 연구)

  • Kim, HyunSook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.7
    • /
    • pp.3218-3222
    • /
    • 2013
  • Pansori is our traditional vocal sound, originality and excellence in the art of conversation, gesture general became a globally recognized world intangible heritage. Especially, Pansori as shrews and humorous representation of audience participation with a high degree of artistic value and enjoy the arts throughout all layers to be responsible for the social integration of functions is evaluated. Therefore, in this paper, Pansori five yard target speech signal analysis techniques applied to analyze the Pansori acoustic features of a representation of a society and era correlation extraction studies were performed. Pansori on the five yard spectrogram, pitch, stability and strength analysis for this experiment. Pansori through experimental results Comical story while keeping the audience focused and interested to better reflect the characteristics of energy for the wave of voice and vocal cord tremor change the width of a large, stable and voice with a loud voice, that expresses were analyzed.

Modified Mel Frequency Cepstral Coefficient for Korean Children's Speech Recognition (한국어 유아 음성인식을 위한 수정된 Mel 주파수 캡스트럼)

  • Yoo, Jae-Kwon;Lee, Kyoung-Mi
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.3
    • /
    • pp.1-8
    • /
    • 2013
  • This paper proposes a new feature extraction algorithm to improve children's speech recognition in Korean. The proposed feature extraction algorithm combines three methods. The first method is on the vocal tract length normalization to compensate acoustic features because the vocal tract length in children is shorter than in adults. The second method is to use the uniform bandwidth because children's voice is centered on high spectral regions. Finally, the proposed algorithm uses a smoothing filter for a robust speech recognizer in real environments. This paper shows the new feature extraction algorithm improves the children's speech recognition performance.

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems
    • /
    • v.17 no.4
    • /
    • pp.754-771
    • /
    • 2021
  • In the task of continuous dimension emotion recognition, the parts that highlight the emotional expression are not the same in each mode, and the influences of different modes on the emotional state is also different. Therefore, this paper studies the fusion of the two most important modes in emotional recognition (voice and visual expression), and proposes a two-mode dual-modal emotion recognition method combined with the attention mechanism of the improved AlexNet network. After a simple preprocessing of the audio signal and the video signal, respectively, the first step is to use the prior knowledge to realize the extraction of audio characteristics. Then, facial expression features are extracted by the improved AlexNet network. Finally, the multimodal attention mechanism is used to fuse facial expression features and audio features, and the improved loss function is used to optimize the modal missing problem, so as to improve the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance coefficient of the proposed model in the two dimensions of arousal and valence (concordance correlation coefficient) were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.

A Method of Predicting Service Time Based on Voice of Customer Data (고객의 소리(VOC) 데이터를 활용한 서비스 처리 시간 예측방법)

  • Kim, Jeonghun;Kwon, Ohbyung
    • Journal of Information Technology Services
    • /
    • v.15 no.1
    • /
    • pp.197-210
    • /
    • 2016
  • With the advent of text analytics, VOC (Voice of Customer) data become an important resource which provides the managers and marketing practitioners with consumer's veiled opinion and requirements. In other words, making relevant use of VOC data potentially improves the customer responsiveness and satisfaction, each of which eventually improves business performance. However, unstructured data set such as customers' complaints in VOC data have seldom used in marketing practices such as predicting service time as an index of service quality. Because the VOC data which contains unstructured data is too complicated form. Also that needs convert unstructured data from structure data which difficult process. Hence, this study aims to propose a prediction model to improve the estimation accuracy of the level of customer satisfaction by combining unstructured from textmining with structured data features in VOC. Also the relationship between the unstructured, structured data and service processing time through the regression analysis. Text mining techniques, sentiment analysis, keyword extraction, classification algorithms, decision tree and multiple regression are considered and compared. For the experiment, we used actual VOC data in a company.

Noise Elimination Using Improved MFCC and Gaussian Noise Deviation Estimation

  • Sang-Yeob, Oh
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.1
    • /
    • pp.87-92
    • /
    • 2023
  • With the continuous development of the speech recognition system, the recognition rate for speech has developed rapidly, but it has a disadvantage in that it cannot accurately recognize the voice due to the noise generated by mixing various voices with the noise in the use environment. In order to increase the vocabulary recognition rate when processing speech with environmental noise, noise must be removed. Even in the existing HMM, CHMM, GMM, and DNN applied with AI models, unexpected noise occurs or quantization noise is basically added to the digital signal. When this happens, the source signal is altered or corrupted, which lowers the recognition rate. To solve this problem, each voice In order to efficiently extract the features of the speech signal for the frame, the MFCC was improved and processed. To remove the noise from the speech signal, the noise removal method using the Gaussian model applied noise deviation estimation was improved and applied. The performance evaluation of the proposed model was processed using a cross-correlation coefficient to evaluate the accuracy of speech. As a result of evaluating the recognition rate of the proposed method, it was confirmed that the difference in the average value of the correlation coefficient was improved by 0.53 dB.

Visual Features and Shape Extraction of Voice Analysis Elements for Heart Diseases Diagnosis (심장 질환 진단을 위한 음성분석학적 요소의 시각 특징 및 형태 추출)

  • Kim, Bong-Hyun;Lee, Se-Hwan;Park, Sun-Ae;Ka, Min-Kyoung;Oh, Won-Geun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.405-408
    • /
    • 2007
  • 건강관리 및 유지에 대한 현대인들의 관심이 증대되면서 삶의 질 향상을 추구하는 고령화 사회에서 성인병 및 만성질환은 매우 위험한 요인이 되고 있는 실정이다. 특히 심장 질환은 3대 사망률 중 한 부분을 차지하고 있을 정도로 위협적이며 비전염성 만성질환 중 하나이다. 그러나 모든 질환에 대한 대처 방법이 동일하듯이 조기 진단에 의한 질환 예방이 무엇보다 중요하다. 따라서 본 논문에서는 심장 질환자의 음성 신호를 획득하여 다양한 음성분석학적 요소 추출 및 분석을 통해 심장 질환과의 연관성을 파악하고자 한다. 이를 위해 본 논문에서는 기존의 음성 분석 요소에 대한 1차 실험을 검증하고 추가 음성 분석 요소들에 대한 2차 실험을 행하여 각각의 분석 요소들과 음성에 대한 형태학적 특징을 시각화하여 편리하게 심장 질환을 진단하는 기법들을 제시하고자 한다.

  • PDF

Prediction of Closed Quotient During Vocal Phonation using GRU-type Neural Network with Audio Signals

  • Hyeonbin Han;Keun Young Lee;Seong-Yoon Shin;Yoseup Kim;Gwanghyun Jo;Jihoon Park;Young-Min Kim
    • Journal of information and communication convergence engineering
    • /
    • v.22 no.2
    • /
    • pp.145-152
    • /
    • 2024
  • Closed quotient (CQ) represents the time ratio for which the vocal folds remain in contact during voice production. Because analyzing CQ values serves as an important reference point in vocal training for professional singers, these values have been measured mechanically or electrically by either inverse filtering of airflows captured by a circumferentially vented mask or post-processing of electroglottography waveforms. In this study, we introduced a novel algorithm to predict the CQ values only from audio signals. This has eliminated the need for mechanical or electrical measurement techniques. Our algorithm is based on a gated recurrent unit (GRU)-type neural network. To enhance the efficiency, we pre-processed an audio signal using the pitch feature extraction algorithm. Then, GRU-type neural networks were employed to extract the features. This was followed by a dense layer for the final prediction. The Results section reports the mean square error between the predicted and real CQ. It shows the capability of the proposed algorithm to predict CQ values.

Hand Biometric Information Recognition System of Mobile Phone Image for Mobile Security (모바일 보안을 위한 모바일 폰 영상의 손 생체 정보 인식 시스템)

  • Hong, Kyungho;Jung, Eunhwa
    • Journal of Digital Convergence
    • /
    • v.12 no.4
    • /
    • pp.319-326
    • /
    • 2014
  • According to the increasing mobile security users who have experienced authentication failure by forgetting passwords, user names, or a response to a knowledge-based question have preference for biological information such as hand geometry, fingerprints, voice in personal identification and authentication. Therefore biometric verification of personal identification and authentication for mobile security provides assurance to both the customer and the seller in the internet. Our study focuses on human hand biometric information recognition system for personal identification and personal Authentication, including its shape, palm features and the lengths and widths of the fingers taken from mobile phone photographs such as iPhone4 and galaxy s2. Our hand biometric information recognition system consists of six steps processing: image acquisition, preprocessing, removing noises, extracting standard hand feature extraction, individual feature pattern extraction, hand biometric information recognition for personal identification and authentication from input images. The validity of the proposed system from mobile phone image is demonstrated through 93.5% of the sucessful recognition rate for 250 experimental data of hand shape images and palm information images from 50 subjects.

A Design for Mobile Contents Converting Using XML Parser Extraction (XML Parser추출에 의한 모바일 컨텐츠 변환 설계)

  • 김영선;장덕철
    • Journal of Korea Multimedia Society
    • /
    • v.6 no.2
    • /
    • pp.267-275
    • /
    • 2003
  • The development of the mobile communication has been enlarged gradually and has been changed voice centered services into data centered ones supporting Internet to provide various kinds of mobile internet services. Researches for data transmission have been achieved actively lot the effective approachment of web in the mobile terminal. The users using mobile communication are increasing for its easy handcarried and movement and the information in internet based upon wire communication has been changed into mobile communication fast. The development of mobile internet requires various mobile contents using mobile internet at any time and at any place due to rapid enhance of network capacity and mobile internet according to network expansion. To accept these needs, there are many difficult problems that contents must be remade according to various terminal features and the develoument costs much. To solve these problems, this paper has the pun)me that contents of Web documents are changed and processed through XML palter and design new mobile contents system, providing mobile services rapidly and obtaining the charge reduction result with the enhance of contents development to the shortening of a development Period.

  • PDF