Search | Korea Science

Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation (잡음음성 음향모델 적응에 기반한 잡음에 강인한 음성인식)

Chung, Yongjoo
- Phonetics and Speech Sciences
- /
- v.6 no.2
- /
- pp.29-34
- /
- 2014
In the Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMM) are usually trained with clean speech. However, better performance is expected by training the HMM with noisy speech. In a previous study, we could find that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain produce improved recognition results, but since the proposed algorithm was done in the log-spectrum domain, it could not be used for the HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between test and training noisy speech in the cepstrum domain and the mean and covariance of the Multi-condition TRaining (MTR) trained noisy speech HMM are adapted. In the noisy speech recognition experiments on the Aurora 2 database, the proposed method produced 10.6% of relative improvement in Word Error Rates (WERs) over the MTR method while the previous MMSE estimation of the training noisy speech produced 4.3% of relative improvement, which shows the superiority of the proposed method.
https://doi.org/10.13064/KSSS.2014.6.2.029 인용 PDF KSCI

Comparison of Speech Intelligibility & Performance of Speech Recognition in Real Driving Environments (자동차 주행 환경에서의 음성 전달 명료도와 음성 인식 성능 비교)

Lee Kwang-Hyun;Choi Dae-Lim;Kim Young-Il;Kim Bong-Wan;Lee Yong-Ju
- MALSORI
- /
- no.50
- /
- pp.99-110
- /
- 2004
The normal transmission characteristics of sound are hardly obtained due to the various noises and structural factors in a running car environment. It is due to the channel distortion of the original source sound recorded by microphones, and it seriously degrades the performance of the speech recognition in real driving environments. In this paper we analyze the degree of intelligibility under the various sound distortion environments by channels according to driving speed with respect to speech transmission index(STI) and compare the STI with rates of speech recognition. We examine the correlation between measures of intelligibility depending on sound pick-up patterns and performance in speech recognition. Thereby we consider the optimal location of a microphone in single channel environment. In experimentation we find that high correlation is obtained between STI and rates of speech recognition.
PDF

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

Chung, Kyungyong;Oh, SangYeob
- Journal of Digital Convergence
- /
- v.16 no.11
- /
- pp.335-340
- /
- 2018
Commercialized speech recognition systems that have an accuracy recognition rates are used a learning model from a type of speaker dependent isolated data. However, it has a problem that shows a decrease in the speech recognition performance according to the quantity of data in noise environments. In this paper, we proposed the vector quantization based speech recognition performance improvement using maximum log likelihood in Gaussian distribution. The proposed method is the best learning model configuration method for increasing the accuracy of speech recognition for similar speech using the vector quantization and Maximum Log Likelihood with speech characteristic extraction method. It is used a method of extracting a speech feature based on the hidden markov model. It can improve the accuracy of inaccurate speech model for speech models been produced at the existing system with the use of the proposed system may constitute a robust model for speech recognition. The proposed method shows the improved recognition accuracy in a speech recognition system.
https://doi.org/10.14400/JDC.2018.16.11.335 인용 PDF KSCI HTML

An Efficient Model Parameter Compensation Method foe Robust Speech Recognition

Chung Yong-Joo
- MALSORI
- /
- no.45
- /
- pp.107-115
- /
- 2003
An efficient method that compensates the HMM parameters for the noisy speech recognition is proposed. Instead of assuming some analytical approximations as in the PMC, the proposed method directly re-estimates the HMM parameters by the segmental k-means algorithm. The proposed method has shown improved results compared with the conventional PMC method at reduced computational cost.
PDF

Machine Learning Techniques for Speech Recognition using the Magnitude

Krishnan, C. Gopala;Robinson, Y. Harold;Chilamkurti, Naveen
- Journal of Multimedia Information System
- /
- v.7 no.1
- /
- pp.33-40
- /
- 2020
Machine learning consists of supervised and unsupervised learning among which supervised learning is used for the speech recognition objectives. Supervised learning is the Data mining task of inferring a function from labeled training data. Speech recognition is the current trend that has gained focus over the decades. Most automation technologies use speech and speech recognition for various perspectives. This paper demonstrates an overview of major technological standpoint and gratitude of the elementary development of speech recognition and provides impression method has been developed in every stage of speech recognition using supervised learning. The project will use DNN to recognize speeches using magnitudes with large datasets.
https://doi.org/10.33851/JMIS.2020.7.1.33 인용 PDF KSCI HTML

A Study on Phoneme Recognition using Neural Networks and Fuzzy logic (신경망과 퍼지논리를 이용한 음소인식에 관한 연구)

Han, Jung-Hyun;Choi, Doo-Il
- Proceedings of the KIEE Conference
- /
- 1998.07g
- /
- pp.2265-2267
- /
- 1998
This paper deals with study of Fast Speaker Adaptation Type Speech Recognition, and to analyze speech signal efficiently in time domain and time-frequency domain, utilizes SCONN[1] with Speech Signal Process suffices for Fast Speaker Adaptation Type Speech Recognition, and examined Speech Recognition to investigate adaptation of system, which has speech data input after speaker dependent recognition test.
PDF

A Usability Evaluation Method for Speech Recognition Interfaces (음성인식용 인터페이스의 사용편의성 평가 방법론)

Han, Seong-Ho;Kim, Beom-Su
- Journal of the Ergonomics Society of Korea
- /
- v.18 no.3
- /
- pp.105-125
- /
- 1999
As speech is the human being's most natural communication medium, using it gives many advantages. Currently, most user interfaces of a computer are using a mouse/keyboard type but the interface using speech recognition is expected to replace them or at least be used as a tool for supporting it. Despite the advantages, the speech recognition interface is not that popular because of technical difficulties such as recognition accuracy and slow response time to name a few. Nevertheless, it is important to optimize the human-computer system performance by improving the usability. This paper presents a set of guidelines for designing speech recognition interfaces and provides a method for evaluating the usability. A total of 113 guidelines are suggested to improve the usability of speech-recognition interfaces. The evaluation method consists of four major procedures: user interface evaluation; function evaluation; vocabulary estimation; and recognition speed/accuracy evaluation. Each procedure is described along with proper techniques for efficient evaluation.
PDF

Multi-stage Speech Recognition Using Confidence Vector (신뢰도 벡터 기반의 다단계 음성인식)

Jeon, Hyung-Bae;Hwang, Kyu-Woong;Chung, Hoon;Kim, Seung-Hi;Park, Jun;Lee, Yun-Keun
- MALSORI
- /
- no.63
- /
- pp.113-124
- /
- 2007
In this paper, we propose a use of confidence vector as an intermediate input feature for multi-stage based speech recognition architecture to improve recognition accuracy. A multi-stage speech recognition structure is introduced as a method to reduce the computational complexity of the decoding procedure and then accomplish faster speech recognition. Conventional multi-stage speech recognition is usually composed of three stages, acoustic search, lexical search, and acoustic re-scoring. In this paper, we focus on improving the accuracy of the lexical decoding by introducing a confidence vector as an input feature instead of phoneme which was used typically. We take experimental results on 220K Korean Point-of-Interest (POI) domain and the experimental results show that the proposed method contributes on improving accuracy.
PDF

A Study on Noise-Robust Methods for Broadcast News Speech Recognition (방송뉴스 인식에서의 잡음 처리 기법에 대한 고찰)

Chung Yong-joo
- MALSORI
- /
- no.50
- /
- pp.71-83
- /
- 2004
Recently, broadcast news speech recognition has become one of the most attractive research areas. If we can transcribe automatically the broadcast news and store their contents in the text form instead of the video or audio signal itself, it will be much easier for us to search for the multimedia databases to obtain what we need. However, the desirable speech signal in the broadcast news are usually affected by the interfering signals such as the background noise and/or the music. Also, the speech of the reporter who is speaking over the telephone or with the ill-conditioned microphone is severely distorted by the channel effect. The interfered or distorted speech may be the main reason for the poor performance in the broadcast news speech recognition. In this paper, we investigated some methods to cope with the problems and we could see some performance improvements in the noisy broadcast news speech recognition.
PDF

Error Correction for Korean Speech Recognition using a LSTM-based Sequence-to-Sequence Model

Jin, Hye-won;Lee, A-Hyeon;Chae, Ye-Jin;Park, Su-Hyun;Kang, Yu-Jin;Lee, Soowon
- Journal of the Korea Society of Computer and Information
- /
- v.26 no.10
- /
- pp.1-7
- /
- 2021
Recently, since most of the research on correcting speech recognition errors is based on English, there is not enough research on Korean speech recognition. Compared to English speech recognition, however, Korean speech recognition has many errors due to the linguistic characteristics of Korean language, such as Korean Fortis and Korean Liaison, thus research on Korean speech recognition is needed. Furthermore, earlier works primarily focused on editorial distance algorithms and syllable restoration rules, making it difficult to correct the error types of Korean Fortis and Korean Liaison. In this paper, we propose a context-sensitive post-processing model of speech recognition using a LSTM-based sequence-to-sequence model and Bahdanau attention mechanism to correct Korean speech recognition errors caused by the pronunciation. Experiments showed that by using the model, the speech recognition performance was improved from 64% to 77% for Fortis, 74% to 90% for Liaison, and from 69% to 84% for average recognition than before. Based on the results, it seems possible to apply the proposed model to real-world applications based on speech recognition.
https://doi.org/10.9708/jksci.2021.26.10.001 인용 PDF KSCI HTML

Search Result 2,065, Processing Time 0.034 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)