• Title/Summary/Keyword: Voice recognition rate

Search Result 137, Processing Time 0.021 seconds

The Speech Recognition Using the Diffusion Network (확산망을 이용한 음성인식)

  • 허만택
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1996.10a
    • /
    • pp.70-75
    • /
    • 1996
  • In this paper, the pre-precessing method for the recognition of single vowels by use of spectrum envelope is presented , we use new method of an extrating spectrum envelope using the diffusion filter bank. We reduced the total processing time, and got higher enhancement of discrimination . By getting 88.3% of average recognition rate for single vowels of real voice through computer simulation, we confirmed it to be useful for speech recongition which use spectrum analysis for voice signal to have many frequency components.

  • PDF

The Voice Dialing System Using Dynamic Hidden Markov Models and Lexical Analysis (DHMM과 어휘해석을 이용한 Voice dialing 시스템)

  • 최성호;이강성;김순협
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.28B no.7
    • /
    • pp.548-556
    • /
    • 1991
  • In this paper, Korean spoken continuous digits are ercognized using DHMM(Dynamic Hidden Markov Model) and lexical analysis to provide the base of developing voice dialing system. After segmentation by phoneme unit, it is recognized. This system can be divided into the segmentation section, the design of standard speech section, the recognition section, and the lexical analysis section. In the segmentation section, it is segmented using the ZCR, O order LPC cepstrum, and Ai, parameter of voice speech dectaction, which is changed according to time. In the standard speech design section, 19 phonemes or syllables are trained by DHMM and designed as a standard speech. In the recognition section, phomeme stream are recognized by the Viterbi algorithm.In the lexical decoder section, finally recognized continuous digits are outputed. This experiment shiwed the recognition rate of 85.1% using data spoken 7 times of 21 classes of 7 continuous digits which are combinated all of the occurence, spoken by 10 man.

  • PDF

Noise Elimination Using Improved MFCC and Gaussian Noise Deviation Estimation

  • Sang-Yeob, Oh
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.1
    • /
    • pp.87-92
    • /
    • 2023
  • With the continuous development of the speech recognition system, the recognition rate for speech has developed rapidly, but it has a disadvantage in that it cannot accurately recognize the voice due to the noise generated by mixing various voices with the noise in the use environment. In order to increase the vocabulary recognition rate when processing speech with environmental noise, noise must be removed. Even in the existing HMM, CHMM, GMM, and DNN applied with AI models, unexpected noise occurs or quantization noise is basically added to the digital signal. When this happens, the source signal is altered or corrupted, which lowers the recognition rate. To solve this problem, each voice In order to efficiently extract the features of the speech signal for the frame, the MFCC was improved and processed. To remove the noise from the speech signal, the noise removal method using the Gaussian model applied noise deviation estimation was improved and applied. The performance evaluation of the proposed model was processed using a cross-correlation coefficient to evaluate the accuracy of speech. As a result of evaluating the recognition rate of the proposed method, it was confirmed that the difference in the average value of the correlation coefficient was improved by 0.53 dB.

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

  • Hong, Chunho;Cho, Youngho
    • Journal of Internet Computing and Services
    • /
    • v.22 no.6
    • /
    • pp.33-40
    • /
    • 2021
  • Automatic Speech Recognition(ASR) is a technology that analyzes human speech sound into speech signals and then automatically converts them into character strings that can be understandable by human. Speech recognition technology has evolved from the basic level of recognizing a single word to the advanced level of recognizing sentences consisting of multiple words. In real-time voice conversation, the high recognition rate improves the convenience of natural information delivery and expands the scope of voice-based applications. On the other hand, with the active application of speech recognition technology, concerns about related cyber attacks and threats are also increasing. According to the existing studies, researches on the technology development itself, such as the design of the Automatic Speaker Verification(ASV) technique and improvement of accuracy, are being actively conducted. However, there are not many analysis studies of attacks and threats in depth and variety. In this study, we propose a cyber attack model that bypasses voice authentication by simply manipulating voice frequency and voice speed for AI voice recognition service equipped with automated identification technology and analyze cyber threats by conducting extensive experiments on the automated identification system of commercial smartphones. Through this, we intend to inform the seriousness of the related cyber threats and raise interests in research on effective countermeasures.

Open API-based Conversational Voice Interaction Scheme for Intelligent IoT Applications for the Digital Underprivileged (디지털 소외계층을 위한 지능형 IoT 애플리케이션의 공개 API 기반 대화형 음성 상호작용 기법)

  • Joonhyouk, Jang
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.22-29
    • /
    • 2022
  • Voice interactions are particularly effective in applications targeting the digital underprivileged who are not proficient in the use of smart devices. However, applications based on open APIs are using voice signals only for short, fragmentary input and output due to the limitations of existing touchscreen-oriented UI and API provided. In this paper, we design a conversational voice interaction model for interactions between users and intelligent mobile/IoT applications and propose a keyword detection algorithm based on the edit distance. The proposed model and scheme were implemented in an Android environment, and the edit distance-based keyword detection algorithm showed a higher recognition rate than the existing algorithm for keywords that were incorrectly recognized through speech recognition.

Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.67-81
    • /
    • 2008
  • The accuracy rate of P2G (Phoneme-to-Grapheme) is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies were, however, conducted to reduce ambiguities of a phoneme string which can be segmented into a variety of different linguistic units (i.e. morphemes, words, eo-jeols), thus be transformed into more than one grapheme string. This paper is a preliminary research for building a large knowledge base of those homonymic & heterographic units(HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. This paper analyzes 2 main factors generating HHUs: (1) boundary determination of the prosodic unit; (2) its segmentation into linguistic units. In this paper, linguistic characteristics determining variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified in accordance with their morphological and syntactic structures as well as with the phonological rules governing them.

  • PDF

An ASIC implementation of a Dual Channel Acoustic Beamforming for MEMS microphone in 0.18㎛ CMOS technology (0.18㎛ CMOS 공정을 이용한 MEMS 마이크로폰용 이중 채널 음성 빔포밍 ASIC 설계)

  • Jang, Young-Jong;Lee, Jea-Hack;Kim, Dong-Sun;Hwang, Tae-ho
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.13 no.5
    • /
    • pp.949-958
    • /
    • 2018
  • A voice recognition control system is a system for controlling a peripheral device by recognizing a voice. Recently, a voice recognition control system have been applied not only to smart devices but also to various environments ranging from IoT(: Internet of Things), robots, and vehicles. In such a voice recognition control system, the recognition rate is lowered due to the ambient noise in addition to the voice of the user. In this paper, we propose a dual channel acoustic beamforming hardware architecture for MEMS(: Microelectromechanical Systems) microphones to eliminate ambient noise in addition to user's voice. And the proposed hardware architecture is designed as ASIC(: Application-Specific Integrated Circuit) using TowerJazz $0.18{\mu}m$ CMOS(: Complementary Metal-Oxide Semiconductor) technology. The designed dual channel acoustic beamforming ASIC has a die size of $48mm^2$, and the directivity index of the user's voice were measured to be 4.233㏈.

Voice Dialing System using Speaker Dependent Recognition for Korean Digit Speech (화자 종속 한국어 숫자음 음성 인식 다이얼링 시스템)

  • Park, Kee-Young;Shin, You-Shik;Kim, Chong-Kyo
    • Journal of the Korean Institute of Telematics and Electronics T
    • /
    • v.36T no.2
    • /
    • pp.56-62
    • /
    • 1999
  • This paper described a voice dialing system(VDS) and its hardware implementation for a speaker-dependent recognition of Korean digit speech using duty cycle. The proposed VDS consist of integrator, leveling divider circuit and recognition program. The analog speech signal is applied to the VDS through the low-pass filter cutoff frequency is 4.5(kHz). It is thoroughly confirmed that the speaker-dependent recognition of Korean digit speech is well behaved by the hardware system. Experimental results show that the recognition rate is 64% in average for Korean digit speech. Moreover, a high recognition rate of 100% is obtained for digits; /4/, /5/, /6/, /7/, /9/, /0/.

  • PDF

Semantic-Oriented Error Correction for Voice-Activated Information Retrieval System

  • Yoon, Yong-Wook;Kim, Byeong-Chang;Lee, Gary-Geunbae
    • MALSORI
    • /
    • no.44
    • /
    • pp.115-130
    • /
    • 2002
  • Voice input is often required in many new application environments, but the low rate of speech recognition makes it difficult to extend its application. Previous approaches were to raise the accuracy of the recognition by post-processing of the recognition results, which were all lexical-oriented. We suggest a new semantic-oriented approach in speech recognition error correction. Through experiments using a speech-driven in-vehicle telematics information application, we show the excellent performance of our approach and some advantages it has as a semantic-oriented approach over a pure lexical-oriented approach.

  • PDF

Noise Robust Emotion Recognition Feature : Frequency Range of Meaningful Signal (음성의 특정 주파수 범위를 이용한 잡음환경에서의 감정인식)

  • Kim Eun-Ho;Hyun Kyung-Hak;Kwak Yoon-Keun
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.23 no.5 s.182
    • /
    • pp.68-76
    • /
    • 2006
  • The ability to recognize human emotion is one of the hallmarks of human-robot interaction. Hence this paper describes the realization of emotion recognition. For emotion recognition from voice, we propose a new feature called frequency range of meaningful signal. With this feature, we reached average recognition rate of 76% in speaker-dependent. From the experimental results, we confirm the usefulness of the proposed feature. We also define the noise environment and conduct the noise-environment test. In contrast to other features, the proposed feature is robust in a noise-environment.