• Title/Summary/Keyword: Voice recognition rate

Search Results: 137

A Study on Individual Pitch Pulse using FIR-STREAK Filter in Speech Coding Method (음성부호화 방식에 있어서 FIR-STREAK 필터를 사용한 개별 피치펄스에 관한 연구)

  • Lee See-Woo
    • The Journal of the Korea Contents Association
    • /
    • v.4 no.4
    • /
    • pp.65-70
    • /
    • 2004
  • In this paper, I propose a new extraction method for individual pitch pulses that accommodates the changes in each pitch interval and reduces pitch errors in speech coding. The extraction rate of individual pitch pulses was 96% for male voices and 85% for female voices. This method can be applied to many fields, such as speech coding, speech analysis, speech synthesis, and speech recognition.
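
A rough sketch of the per-interval peak-picking idea behind individual pitch-pulse extraction (the FIR-STREAK filter itself is not described in the abstract; the function name, the residual input, and the fixed pitch period are illustrative assumptions, not the paper's method):

```python
def extract_pitch_pulses(residual, pitch_period):
    """Return the index of the strongest sample in each pitch interval."""
    pulses = []
    for start in range(0, len(residual), pitch_period):
        interval = residual[start:start + pitch_period]
        if not interval:
            break
        # Peak picking: take the individual pitch pulse to be the
        # largest-magnitude sample within the interval.
        offset = max(range(len(interval)), key=lambda i: abs(interval[i]))
        pulses.append(start + offset)
    return pulses

signal = [0.1, 0.9, 0.2, -0.1, 0.0, -0.8, 0.3, 0.1]
print(extract_pitch_pulses(signal, 4))  # → [1, 5]
```

A real extractor would adapt the interval boundaries to the changing pitch period rather than fix them; accommodating that variation is exactly what the paper addresses.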


On a Detection of the ZCR-Parameter for Higher Formants of Speech Signals (음성신호의 상위 포만트에 대한 ZCR-파라미터 검출에 관한 연구)

  • 유건수
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1992.06a
    • /
    • pp.49-53
    • /
    • 1992
  • In many applications, such as speech analysis, speech coding, and speech recognition, the voiced/unvoiced decision must be made correctly for efficient processing. One of the parameters used for the voiced/unvoiced decision is the zero-crossing rate. However, the information in the higher formants of speech signals has not been represented by the zero-crossing rate.
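
As a minimal illustration of the zero-crossing parameter discussed above (not the authors' higher-formant detector), a frame-level voiced/unvoiced decision from the zero-crossing rate might look like this; the threshold is an assumed value:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_unvoiced(frame, threshold=0.25):
    # Unvoiced speech (fricative-like noise) tends to have a high ZCR;
    # voiced speech, dominated by the low-frequency pitch, a low one.
    return zero_crossing_rate(frame) > threshold

voiced_like = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]   # one sign change
noisy_like = [1, -1, 2, -2, 1, -1, 2, -2, 1, -1]    # alternating signs
print(is_unvoiced(voiced_like), is_unvoiced(noisy_like))  # → False True
```

The abstract's point is that this whole-band rate says little about the higher formants, which motivates a band-limited ZCR parameter.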


A Study on the Spoken Korean-Digit Recognition Using the Neural Network (神經網을 利用한 韓國語 數字音 認識에 관한 硏究)

  • Park, Hyun-Hwa;Gahang, Hae Dong;Bae, Keun Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.3
    • /
    • pp.5-13
    • /
    • 1992
  • Taking advantage of the fact that Korean digits are monosyllabic words, we proposed a spoken Korean-digit recognition scheme using the multi-layer perceptron. The spoken Korean digit is divided into three segments (initial sound, medial vowel, and final consonant) based on the voice starting/ending points and a peak point in the middle of the vowel sound. Feature vectors such as the cepstrum, reflection coefficients, Δcepstrum, and Δenergy are extracted from each segment. It has been shown that the cepstrum, as an input vector to the neural network, gives a higher recognition rate than reflection coefficients. Regression coefficients of the cepstrum did not affect the recognition rate as much as we expected; this is believed to be because we extracted features from selected stationary segments of the input speech signal. With 150 cepstral coefficients obtained from each spoken digit, we achieved a correct recognition rate of 97.8%.
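
The cepstral features above can be illustrated with a naive real-cepstrum computation (the inverse DFT of the log magnitude spectrum); this O(n²) sketch is for illustration only and is not the authors' feature extractor:

```python
import cmath
import math

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    magnitude = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                         for t in range(n))) for k in range(n)]
    log_mag = [math.log(m + 1e-12) for m in magnitude]  # avoid log(0)
    # The inverse DFT of a real, even sequence is real and even.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

frame = [math.sin(2 * math.pi * t / 8) for t in range(16)]
coeffs = real_cepstrum(frame)
print(len(coeffs))  # → 16, one coefficient per sample in the frame
```

In practice an FFT would be used, and only the first dozen or so coefficients would be kept as features.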


Isolated Word Recognition Using k-clustering Subspace Method and Discriminant Common Vector (k-clustering 부공간 기법과 판별 공통벡터를 이용한 고립단어 인식)

  • Nam, Myung-Woo
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.42 no.1
    • /
    • pp.13-20
    • /
    • 2005
  • In this paper, I recognized Korean isolated words using CVEM, which was suggested by M. Bilginer et al. CVEM is an algorithm that easily extracts the common properties of training voice signals without complex calculation, and it shows high accuracy in recognition results. However, CVEM has a couple of problems: it cannot use a large number of training voices, and the extracted common vectors carry no discriminant information. Obtaining optimal common vectors for a given voice class requires training on a variety of voices, so CVEM cannot sustain continuously high recognition accuracy, and the absence of discriminant information among common vectors can be a source of critical errors. To solve these problems and improve the recognition rate, a k-clustering subspace method and DCVEM are suggested, and various experiments were carried out on a voice signal database made by ETRI to prove the validity of the suggested methods. The experimental results show improvements in performance, and with the proposed methods all of the CVEM problems can be solved without computational problems.
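
The common-vector idea can be sketched as follows: a class's common vector is the component of its training vectors orthogonal to the difference subspace span{xᵢ − x₀}, and it is the same whichever sample you start from. This minimal Gram-Schmidt reconstruction is a sketch of plain CVEM only, not the paper's k-clustering or discriminant extensions:

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _sub(u, v):
    return [a - b for a, b in zip(u, v)]

def _scale(u, c):
    return [c * a for a in u]

def common_vector(samples):
    """Component of the class's vectors orthogonal to span{x_i - x_0}."""
    ref = samples[0]
    # Orthonormal basis of the difference subspace (Gram-Schmidt).
    basis = []
    for x in samples[1:]:
        d = _sub(x, ref)
        for b in basis:
            d = _sub(d, _scale(b, _dot(d, b)))
        norm = _dot(d, d) ** 0.5
        if norm > 1e-9:
            basis.append(_scale(d, 1.0 / norm))
    # Remove the difference-subspace component from one sample.
    common = ref[:]
    for b in basis:
        common = _sub(common, _scale(b, _dot(common, b)))
    return common

class_a = [[2.0, 1.0, 0.0], [2.0, 3.0, 0.0], [2.0, -1.0, 0.0]]
print(common_vector(class_a))  # → [2.0, 0.0, 0.0]
```

The limitation the paper targets is visible here: as the number of training vectors approaches the feature dimension, the difference subspace fills the whole space and the common vector collapses toward zero, which is why a k-clustering split of the training data is proposed.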

Human Touching Behavior Recognition based on Neural Network in the Touch Detector using Force Sensors (힘 센서를 이용한 접촉감지부에서 신경망기반 인간의 접촉행동 인식)

  • Ryu, Joung-Woo;Park, Cheon-Shu;Sohn, Joo-Chan
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.10
    • /
    • pp.910-917
    • /
    • 2007
  • Of the possible interactions between humans and robots, touch is an important means of providing human beings with emotional relief. However, most previous studies have focused on interactions based on voice and images. In this paper, a method of recognizing human touching behaviors is proposed for developing a robot that can naturally interact with humans through touch. In this method, the recognition process is divided into a pre-processing phase and a recognition phase. In the pre-processing phase, recognizable features are calculated from the data generated by the touch detector, which was fabricated using force sensors; each force sensor is an FSR (force sensing resistor). The recognition phase classifies human touching behaviors using a multi-layer perceptron, a neural network model. Experimental data were generated by six men performing three types of touching behaviors: 'hitting', 'stroking', and 'tickling'. When a recognizer was generated for each user and evaluated by cross-validation, the average recognition rate was 82.9%, while a single recognizer for all users showed an average recognition rate of 74.5%.
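
The abstract does not list the features computed in the pre-processing phase, so the following is a hypothetical sketch of the kind of quantities one might derive from a single force-sensor time series (contact duration, peak force, and number of distinct contacts); names and threshold are assumptions:

```python
def touch_features(forces, contact_threshold=0.1):
    """Hypothetical features from one force-sensor time series."""
    contact = [f > contact_threshold for f in forces]
    duration = sum(contact)            # samples spent in contact
    peak = max(forces)                 # strongest force applied
    # Count rising edges: many short contacts suggest 'tickling',
    # one strong short contact 'hitting', one long contact 'stroking'.
    touches = sum(1 for a, b in zip([False] + contact, contact)
                  if b and not a)
    return [duration, peak, touches]

stroke = [0.0, 0.3, 0.4, 0.4, 0.3, 0.3, 0.2, 0.0]
tickle = [0.0, 0.5, 0.0, 0.6, 0.0, 0.5, 0.0, 0.0]
print(touch_features(stroke), touch_features(tickle))  # → [6, 0.4, 1] [3, 0.6, 3]
```

Feature vectors like these would then be fed to the multi-layer perceptron for classification.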

Speech Recognition for the Korean Vowel 'ㅣ' based on Waveform-feature Extraction and Neural-network Learning (파형 특징 추출과 신경망 학습 기반 모음 'ㅣ' 음성 인식)

  • Rho, Wonbin;Lee, Jongwoo;Lee, Jaewon
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.2
    • /
    • pp.69-76
    • /
    • 2016
  • With the recent increase of interest in IoT in almost all areas of industry, computing technologies have been increasingly applied in human environments such as houses, buildings, cars, and streets; in these IoT environments, speech recognition is widely accepted as a means of HCI. Existing server-based speech recognition techniques are typically fast and show quite high recognition rates; however, an internet connection is necessary, and complicated server computing is required because a voice is recognized in units of words stored in server databases. This paper, continuing previous research on speech recognition algorithms for the Korean phonemic vowels 'ㅏ' and 'ㅓ', suggests an implementation of speech recognition algorithms for the Korean phonemic vowel 'ㅣ'. We observed that almost all of the vocal waveform patterns for 'ㅣ' are unique and different when compared with the patterns of the 'ㅏ' and 'ㅓ' waveforms. In this paper we propose specific waveform patterns for the Korean vowel 'ㅣ' and the corresponding recognition algorithms. We also present experimental results showing that, by adding neural-network learning to our algorithm, the voice recognition success rate for the vowel 'ㅣ' can be increased. As a result, we observed that 90% or more of the vocal expressions of the vowel 'ㅣ' can be successfully recognized when our algorithms are used.

Improving transformer-based speech recognition performance using data augmentation by local frame rate changes (로컬 프레임 속도 변경에 의한 데이터 증강을 이용한 트랜스포머 기반 음성 인식 성능 향상)

  • Lim, Seong Su;Kang, Byung Ok;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.2
    • /
    • pp.122-129
    • /
    • 2022
  • In this paper, we propose a method to improve the performance of Transformer-based speech recognizers using data augmentation that locally adjusts the frame rate. First, the start time and length of the part to be augmented in the original voice data are randomly selected. Then, the frame rate of the selected part is changed to a new frame rate by linear interpolation. Experimental results on the Wall Street Journal and LibriSpeech speech databases showed that convergence took longer than the baseline, but the recognition accuracy was improved in most cases. To further improve the performance, parameters such as the length and the speed of the selected parts were optimized. The proposed method was shown to achieve relative performance improvements of 11.8% and 14.9% over the baseline on the Wall Street Journal and LibriSpeech databases, respectively.
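
The augmentation step itself follows directly from the description: take a segment of the signal and resample it at a new local rate with linear interpolation. The function and parameter names are illustrative; the random choice of start and length described in the abstract is left to the caller:

```python
def local_speed_change(samples, start, length, speed):
    """Resample samples[start:start+length] by `speed` (e.g. 0.5 = half
    speed, twice as long) with linear interpolation; the rest of the
    signal is left untouched."""
    segment = samples[start:start + length]
    new_len = max(2, int(round(length / speed)))
    resampled = []
    for i in range(new_len):
        pos = i * (length - 1) / (new_len - 1)  # position in old segment
        lo = int(pos)
        hi = min(lo + 1, length - 1)
        frac = pos - lo
        resampled.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return samples[:start] + resampled + samples[start + length:]

signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Stretch the middle segment [1.0, 2.0, 3.0] to twice its length.
print(local_speed_change(signal, 1, 3, 0.5))
```

Stretching at speed 0.5 grows the 3-sample segment to 6 samples, so the 6-sample signal becomes 9 samples while both ends stay unchanged.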

Design and Implementation of Convenience System Based on IoT (IoT를 기반한 편의 시스템 설계 및 구현)

  • Ui-Do Kim;Seung-Jin Yu;Jae-Won Lee;Seok-Tae Cho;Jae-Wook Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.1
    • /
    • pp.165-172
    • /
    • 2024
  • In this paper, we designed a smart home system that can be used intuitively and easily in everyday life: it sends text messages to users, provides various information and scheduling through a smart AI, and provides lighting and ambience suited to situations such as listening to music using NeoPixels, built with an ESP32, RFID, a Raspberry Pi, and the Google Cloud Console. As a result of the experiments, it was confirmed that a security text message was normally sent to the user when an RFID tag was recognized by the Wi-Fi-connected ESP32, even after the power was reconnected, and that the smart AI, using NeoPixel lighting, the Raspberry Pi, and frequency-based voice recognition, still worked although its recognition rate varied with distance.

A Simple Speech/Non-speech Classifier Using Adaptive Boosting

  • Kwon, Oh-Wook;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.3E
    • /
    • pp.124-132
    • /
    • 2003
  • We propose a new method for speech/non-speech classification based on concepts from the adaptive boosting (AdaBoost) algorithm, in order to detect speech for robust speech recognition. The method uses a combination of simple base classifiers, through the AdaBoost algorithm, and a set of optimized speech features combined with spectral subtraction. The key benefits of this method are its simple implementation, low computational complexity, and avoidance of the over-fitting problem. We checked the validity of the method by comparing its performance with that of the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purposes, additional performance improvements were achieved by the adoption of new features, including speech band energies and MFCC-based spectral distortion. At the same false alarm rate, the method reduced miss errors by 20-50%.
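
The boosting scheme can be illustrated with a minimal AdaBoost over one-feature threshold stumps, in the spirit of the simple base classifiers described above; the toy features stand in for quantities like band energy and spectral distortion and are not the paper's feature set:

```python
import math

def train_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = None
    for f in range(len(X[0])):
        for thresh in sorted(set(x[f] for x in X)):
            for sign in (1, -1):
                pred = [sign if x[f] > thresh else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thresh, sign)
    return best

def adaboost(X, y, rounds=5):
    """Each round reweights samples toward the previous round's mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thresh, sign = train_stump(X, y, w)
        err = max(err, 1e-10)          # avoid log(0) on a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        pred = [sign if x[f] > thresh else -sign for x in X]
        w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, f, thresh, sign))
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x[f] > t else -s) for a, f, t, s in ensemble)
    return 1 if score > 0 else -1

# Toy frames: feature 0 = band energy, feature 1 = spectral distortion.
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
y = [1, 1, -1, -1]                     # +1 speech, -1 non-speech
model = adaboost(X, y)
print([predict(model, x) for x in X])  # → [1, 1, -1, -1]
```

The weighted vote over many weak classifiers is what keeps the implementation simple while avoiding the over-fitting of a single complex model.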

Automated Call Routing Call Center System Based on Speech Recognition (음성인식을 이용한 고객센터 자동 호 분류 시스템)

  • Shim, Yu-Jin;Kim, Jae-In;Koo, Myung-Wan
    • Speech Sciences
    • /
    • v.12 no.2
    • /
    • pp.183-191
    • /
    • 2005
  • This paper describes automated call routing for a call center system based on speech recognition. We focus on the task of automatically routing telephone calls based on a user's fluently spoken response, instead of the touch-tone menus of an interactive voice response system. A vector-based call routing algorithm is investigated and a normalization method is suggested. A call center database collected by KT is used for the call routing experiments, and experimental results evaluating call classification from transcribed speech are reported for that database. In the case of small training data, an average call routing error reduction rate of 9% is observed when the normalization method is used.
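
Vector-based routing with length normalization can be sketched as nearest-destination matching under cosine similarity; the destination names and training phrases below are invented for illustration and are not from the KT database:

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    # Length normalization: a long request should not win on size alone.
    dot = sum(c * v[t] for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def route(request, destinations):
    """Pick the destination whose training vector is most similar."""
    q = vectorize(request)
    return max(destinations, key=lambda d: cosine(q, destinations[d]))

destinations = {
    "billing": vectorize("bill charge payment invoice amount due"),
    "repair": vectorize("line broken noise repair not working"),
}
print(route("my bill shows a wrong charge", destinations))  # → billing
```

In a full system each destination vector would be trained from many transcribed calls, and the normalization would compensate for destinations with very different amounts of training data.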
