• Title/Summary/Keyword: 음성인식률 (speech recognition rate)

Search Results: 549

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people (언어장애인의 스마트스피커 접근성 향상을 위한 개인화된 음성 분류 기법)

  • SeungKwon Lee;U-Jin Choe;Gwangil Jeon
    • Smart Media Journal
    • /
    • v.11 no.11
    • /
    • pp.17-24
    • /
    • 2022
  • With the spread of smart speakers based on voice recognition and deep learning technology, not only non-disabled people but also blind or physically disabled people can easily control home appliances such as lights and TVs by voice through linked home network services, greatly improving their quality of life. Speech-impaired people, however, cannot use these services because articulation or speech disorders make their pronunciation inaccurate. In this paper, we propose a personalized voice classification technique that lets speech-impaired users access some of the functions provided by a smart speaker. The goal is to raise the recognition rate and accuracy for sentences spoken by speech-impaired people with only a small amount of data and a short training time, so that the smart speaker's services become genuinely usable. We fine-tune a ResNet18 model with data augmentation and the one-cycle learning-rate policy. In the experiment, after recording each of 30 smart speaker commands 10 times and training for under 3 minutes, the speech classification recognition rate was about 95.2%. A sketch of this training setup follows below.
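
A minimal sketch of the training recipe the abstract names (fine-tuning ResNet18 with data augmentation and a one-cycle learning-rate schedule), assuming the commands are fed as spectrogram-like images; the class count comes from the abstract, while the input pipeline and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: fine-tuning ResNet18 with augmentation and one-cycle LR.
# The spectrogram-image input and all hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_COMMANDS = 30  # 30 smart-speaker commands, 10 recordings each (per abstract)

# Simple augmentation on spectrogram images (assumed input representation).
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_COMMANDS)  # replace the classifier head

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# One-cycle learning-rate policy, as named in the abstract; step count kept
# small in the spirit of the reported short (under 3 minutes) training time.
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=300)
criterion = nn.CrossEntropyLoss()

def train_step(batch, labels):
    optimizer.zero_grad()
    loss = criterion(model(batch), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```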

Context Recognition Using Environmental Sound for Client Monitoring System (피보호자 모니터링 시스템을 위한 환경음 기반 상황 인식)

  • Ji, Seung-Eun;Jo, Jun-Yeong;Lee, Chung-Keun;Oh, Siwon;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.2
    • /
    • pp.343-350
    • /
    • 2015
  • This paper presents a context recognition method using environmental sound signals, applied to a mobile client monitoring system. Seven acoustic contexts are defined and the corresponding environmental sound signals are collected for the experiments. To evaluate context recognition performance, MFCC and LPCC are employed for feature extraction, and statistical pattern recognition is performed with GMM and HMM acoustic models. The experimental results show that LPCC and HMM improve context recognition accuracy more than MFCC and GMM, respectively; the system using LPCC and HMM reaches 96.03% recognition accuracy. These results indicate that LPCC represents environmental sounds, which contain more varied frequency components than human speech, more effectively, and that HMM models time-varying environmental sounds better than GMM. A minimal sketch of the MFCC/GMM configuration follows below.
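
A minimal sketch, under stated assumptions, of one configuration from the paper: MFCC features with one GMM per acoustic context. The LPCC/HMM variant reported as best is not shown, and the file handling and model sizes are illustrative.

```python
# Sketch of MFCC feature extraction plus per-context GMM classification.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # frames x coefficients

def train_models(files_by_context, n_components=8):
    """Train one GMM per acoustic context (seven contexts in the paper)."""
    gmms = {}
    for context, files in files_by_context.items():
        feats = np.vstack([mfcc_features(f) for f in files])
        gmms[context] = GaussianMixture(n_components=n_components).fit(feats)
    return gmms

def classify(path, gmms):
    """Pick the context whose GMM gives the highest average log-likelihood."""
    feats = mfcc_features(path)
    return max(gmms, key=lambda c: gmms[c].score(feats))
```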

A Reliable Data Capture in Multi-Reader RFID Environments (다중 태그 인식 기반의 신뢰성 있는 데이터 수집 환경)

  • Lee, Young-Ran
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.9
    • /
    • pp.4133-4137
    • /
    • 2011
  • Reliable multi-reader RFID identification is one of the current issues in realizing multi-reader RFID systems, and such systems have difficulty obtaining reliable data at the data capture layer. The reason is that unreliable readings, such as false positive readings, false negative readings, and missed readings, can be caused by reader collisions, noise, or the mobility of tagged objects. We introduce performance metrics for these reader problems and propose three solutions: the Minimum Overlapped Read Zone (MORZ) with Received Signal Strength Indicator (RSSI), the Spatial-Temporal Division Access (STDA) method, and attaching tags of double size to the object. To show the improvement of the proposed methods, we calculate the tags' successful read rates in a smart office equipped with a multi-reader RFID system. An illustrative sketch of the read-rate metrics follows below.
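
The abstract names read-reliability metrics (successful read rate, false positive and false negative readings) without giving formulas; the sketch below is one plausible way to compute them from expected versus observed tag sets and is not taken from the paper.

```python
# Illustrative read-reliability metrics for a multi-reader RFID read zone.
def read_metrics(expected_tags, observed_reads):
    """expected_tags: set of tag IDs actually present in the read zone.
    observed_reads: set of tag IDs reported by the multi-reader system."""
    true_reads = expected_tags & observed_reads
    false_negatives = expected_tags - observed_reads   # missed readings
    false_positives = observed_reads - expected_tags   # reads of absent tags
    rate = len(true_reads) / len(expected_tags) if expected_tags else 0.0
    return {
        "successful_read_rate": rate,
        "false_negatives": len(false_negatives),
        "false_positives": len(false_positives),
    }
```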

Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.4
    • /
    • pp.57-63
    • /
    • 1997
  • In this paper, we study efficient feature vector extraction methods and front-end processing to improve the performance of a speech recognition system using the KT (Korea Telecommunication) database collected over various telephone channels. First, we compare the recognition performance of feature vectors known to be robust to noise and environmental variation, and verify the performance gains from weighted cepstral distance measures. The experiments show that the recognition rate increases when PLP (Perceptual Linear Prediction) or MFCC (Mel Frequency Cepstral Coefficient) features are used instead of the LPC cepstrum used in the KT recognition system. For the cepstral distance measure, weighted functions such as RPS (Root Power Sums) and BPL (Band-Pass Lifter) improve recognition. Applying spectral subtraction decreases the recognition rate because of the distortion it introduces, whereas RASTA (RelAtive SpecTrAl) processing, CMS (Cepstral Mean Subtraction), and SBR (Signal Bias Removal) improve recognition performance; CMS in particular is simple yet highly effective. Finally, modified methods for a real-time implementation of CMS are compared, and an improved method is suggested to prevent performance degradation. A short CMS sketch follows below.
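
CMS is simple enough to show directly. The sketch below assumes cepstra stored as a frames-by-coefficients array; the running-mean variant is only one common real-time approximation, not necessarily the modification the paper proposes.

```python
# Cepstral mean subtraction (CMS) for channel normalization.
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance cepstral mean to cancel a stationary
    (convolutional) telephone-channel response. cepstra: (frames, coeffs)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def running_cms(cepstra, alpha=0.995):
    """Causal running-mean approximation suitable for real-time use
    (an assumed variant, not the paper's exact modified method)."""
    out = np.empty_like(cepstra)
    mean = cepstra[0].copy()
    for t, frame in enumerate(cepstra):
        mean = alpha * mean + (1.0 - alpha) * frame
        out[t] = frame - mean
    return out
```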


Emotion Recognition and Expression System of User using Multi-Modal Sensor Fusion Algorithm (다중 센서 융합 알고리즘을 이용한 사용자의 감정 인식 및 표현 시스템)

  • Yeom, Hong-Gi;Joo, Jong-Tae;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.1
    • /
    • pp.20-26
    • /
    • 2008
  • As intelligent robots and computers become more common, interaction between them and humans grows more important, and emotion recognition and expression are indispensable for that interaction. In this paper, we first extract emotional features from speech signals and facial images. We then apply BL (Bayesian Learning) and PCA (Principal Component Analysis) and classify five emotion patterns (normal, happy, anger, surprise, and sad). To improve the emotion recognition rate, we experiment with both decision fusion and feature fusion. The decision fusion method combines the outputs of the individual recognizers using a fuzzy membership function, while the feature fusion method selects salient features with SFS (Sequential Forward Selection) and feeds them to an MLP (Multi Layer Perceptron) neural network to classify the five emotion patterns. The recognized result is then applied to a 2D facial shape to express the emotion. A sketch of the feature-fusion path follows below.
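
A hedged sketch of the feature-fusion path: concatenate the speech and facial features, select a subset by sequential forward selection, and classify the five emotions with an MLP. scikit-learn's SequentialFeatureSelector and MLPClassifier stand in for the paper's own SFS and MLP implementations, and feature extraction is assumed to happen upstream.

```python
# Feature-level fusion with SFS feature selection and an MLP classifier.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["normal", "happy", "anger", "surprise", "sad"]  # from the abstract

def train_fused_classifier(speech_feats, face_feats, labels, n_keep=10):
    X = np.hstack([speech_feats, face_feats])      # feature-level fusion
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
    sfs = SequentialFeatureSelector(mlp, n_features_to_select=n_keep,
                                    direction="forward")
    sfs.fit(X, labels)                              # sequential forward selection
    X_sel = sfs.transform(X)
    mlp.fit(X_sel, labels)                          # final MLP on selected features
    return sfs, mlp
```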

Voice Interactions with A. I. Agent : Analysis of Domestic and Overseas IT Companies (A.I.에이전트와의 보이스 인터랙션 : 국내외 IT회사 사례연구)

  • Lee, Seo-Young
    • Journal of Korea Entertainment Industry Association
    • /
    • v.15 no.4
    • /
    • pp.15-29
    • /
    • 2021
  • Many countries and companies are pursuing and developing artificial intelligence as the core technology of the 4th industrial revolution. Global IT companies such as Apple, Microsoft, Amazon, Google, and Samsung have all released their own AI assistant hardware products, hoping to increase customer loyalty and capture market share. Competition within the industry for AI agents is intense: the AI assistant products that command the biggest market share and customer loyalty have a higher chance of becoming the industry standard. This study analyzed the current status of major overseas and domestic IT companies in the field of artificial intelligence and suggested future strategic directions for voice UI technology development and user satisfaction. In terms of B2B technology, IT companies are advised to use cloud computing to store big data and to deliver innovative artificial intelligence and natural language technologies; offering voice recognition technology on the cloud enables smaller companies to take advantage of it at considerably less expense. Companies may also consider GPT-3 (Generative Pre-trained Transformer 3), a language model that can generate very natural, human-like interactions and support high levels of user satisfaction. There is a need to increase usefulness and usability to enhance user satisfaction. This study has practical and theoretical implications for industry and academia.

Automatic Music Transcription System Using SIDE (SIDE를 이용한 자동 음악 채보 시스템)

  • Hyoung, A-Young;Lee, Joon-Whoan
    • The KIPS Transactions:PartB
    • /
    • v.16B no.2
    • /
    • pp.141-150
    • /
    • 2009
  • This paper proposes a system that automatically transcribes a sung melody into musical notation. First, the system uses the Stabilized Diffusion Equation (SIDE) to divide the song into a series of syllabic segments based on pitch detection. Given this segmentation, the method recognizes the note length of each fragment through clustering based on a genetic algorithm. The study also introduces a concept called the 'relative interval' so that intervals can be recognized relative to the singer's own pitch, and it adopts a measure extraction algorithm that uses pause information to achieve more precise transcription. In experiments with 16 nursery songs, the measure recognition rate is 91.5% and the DMOS score reaches 3.82, demonstrating the effectiveness of the system. The relative-interval idea is sketched below.
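
The abstract does not define the 'relative interval' precisely; a common formulation, assumed here, measures the interval in semitones relative to a singer-dependent reference pitch.

```python
# Relative interval in semitones against a singer-dependent reference pitch
# (an assumed 12*log2 mapping, not necessarily the paper's exact definition).
import math

def relative_interval_semitones(f_hz, f_ref_hz):
    """Interval of a detected pitch relative to the singer's reference pitch."""
    return 12.0 * math.log2(f_hz / f_ref_hz)

# Example: 440 Hz against a 392 Hz (G4) reference is about +2 semitones.
print(round(relative_interval_semitones(440.0, 392.0)))  # -> 2
```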

Study on Development for Smart Door Lock and App. using Arduino and Infrared Sensor (아두이노와 적외선 센서를 이용한 스마트 도어락과 앱 개발에 대한 연구)

  • Hyeomg-Jun, Jeon;Yoon-Soo, Na;Yeo-Gyun, Youn;Kyeong-Ho, Kim;Hee-Woon, Ahn;Jae-Wook, Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.6
    • /
    • pp.1199-1206
    • /
    • 2022
  • In modern society, electronic devices can easily be operated through smartphone apps, so unlike conventional door locks controlled only by a keypad, this study built an app with App Inventor so that a door lock can be controlled from a smartphone. In the Bluetooth module experiment, the communication distance to the smartphone reached up to 10 m with no obstacles. In the voice recognition experiment, the recognition rate was 85% and 90% in the 500~1000 Hz and 1000~1500 Hz bands, respectively, and 70% and 80%, respectively, under 80 dB of noise. The experimental evaluation confirms that both convenience and security can be improved.

A Study on the Frequency Scaling Methods Using LSP Parameters Distribution Characteristics (LSP 파라미터 분포특성을 이용한 주파수대역 조절법에 관한 연구)

  • 민소연;배명진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.304-309
    • /
    • 2002
  • We propose a method for reducing the computation of the real-root method mainly used in CELP (Code Excited Linear Prediction) vocoders. The real-root method finds the real roots of the LPC polynomial equations and transforms them into LSPs, but it is computationally expensive because the root search proceeds sequentially over the frequency range. To reduce the computation time, we compare the standard real-root method with two modifications. The first method searches the frequency range on a mel-like scale that is linear below 1 kHz and logarithmic above. In the second method, the search regions and search intervals are ordered according to each coefficient's distribution. To compare the real-root method with the proposed methods, we measured two things: first, whether the positions of the transformed LSP (Line Spectrum Pairs) parameters obtained by the proposed methods match those of the real-root method; second, how much the computation time is reduced. The experimental results show that the search time was reduced by about 47% on average without changing the LSP parameters. The first search grid is sketched below.
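
A sketch of the first modification, assuming an 8 kHz-sampled CELP codec (so a 4 kHz upper band edge) and arbitrary grid sizes: the root-search grid is linear below 1 kHz and logarithmic above, mimicking the mel scale.

```python
# Mel-like search grid for LSP root finding: linear below 1 kHz, log above.
# Grid sizes and the 4 kHz upper edge are illustrative assumptions.
import numpy as np

def mel_like_search_grid(n_low=64, n_high=64, f_max=4000.0):
    low = np.linspace(0.0, 1000.0, n_low, endpoint=False)   # linear part
    high = np.geomspace(1000.0, f_max, n_high)               # logarithmic part
    return np.concatenate([low, high])

grid = mel_like_search_grid()
# Coarser spacing at high frequencies means fewer evaluations of the LSP
# polynomials where roots are sparser, cutting the sequential search time.
```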

Noisy Environmental Adaptation for Word Recognition System Using Maximum a Posteriori Estimation (최대사후확률 추정법을 이용한 단어인식기의 잡음환경적응화)

  • Lee, Jung-Hoon;Lee, Shi-Wook;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.107-113
    • /
    • 1997
  • To achieve a Korean word recognition system robust to both channel distortion and additive noise, maximum a posteriori (MAP) adaptation is proposed, and its effectiveness for environmental adaptation is investigated in this paper. Recognition experiments using MAP adaptation are carried out for three kinds of speech: 1) speech with channel distortion, 2) speech with added environmental noise, and 3) speech with both channel distortion and additive noise. The effectiveness of additional feature parameters, such as regression coefficients and durations, for environmental adaptation is also investigated. In speaker-independent 100-word recognition tests, recognition improved by 9.0% for case 1), by more than 75% for case 2), and by 11%~61.4% for case 3), showing that MAP environmental adaptation is effective for both channel-distorted and noise-added speech recognition. However, duration information used as an additional feature parameter did not play an important role in the tests. The MAP mean update is sketched below.
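
A hedged sketch of the core MAP idea for a single Gaussian mean: the adapted mean interpolates between the prior (clean-speech) mean and the adaptation data, weighted by a relevance factor. The paper's exact HMM-level update is not given in the abstract, so the relevance factor tau and the single-Gaussian simplification are assumptions.

```python
# MAP adaptation of a Gaussian mean toward new channel/noise conditions.
import numpy as np

def map_adapt_mean(prior_mean, adapt_frames, tau=10.0):
    """prior_mean: (D,) mean trained on clean speech.
    adapt_frames: (N, D) feature frames from the new environment.
    tau: relevance factor controlling how strongly the prior is trusted."""
    n = adapt_frames.shape[0]
    data_mean = adapt_frames.mean(axis=0)
    # Count-weighted interpolation between prior mean and adaptation data.
    return (tau * prior_mean + n * data_mean) / (tau + n)
```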
