Search | Korea Science

Performance Comparison of the Speech Enhancement Methods for Noisy Speech Recognition (잡음음성인식을 위한 음성개선 방식들의 성능 비교)

Chung, Yong-Joo
- Phonetics and Speech Sciences / v.1 no.2 / pp.9-14 / 2009
Speech enhancement methods can generally be classified into a few categories, and they have usually been compared with each other in terms of speech quality. For speech enhancement methods to be used successfully in speech recognition systems, they must also be compared in terms of speech recognition accuracy. In this paper, we compare the speech recognition performance of several representative speech enhancement algorithms that are widely cited and used. We also compare speech enhancement methods with other noise-robust speech recognition methods, such as parallel model combination (PMC), to verify the usefulness of speech enhancement in noise-robust speech recognition systems.

Applying Mobile Agent for Internet-based Distributed Speech Recognition

Saaim, Emrul Hamide Md;Alias, Mohamad Ashari;Ahmad, Abdul Manan;Ahmad, Jamal Nasir
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.134-138 / 2005
Several applications have been developed for internet-based speech recognition. Internet-based speech recognition is a distributed application, and various techniques and methods have been used for that purpose. Currently, the client-server paradigm is one of the popular techniques used for communication in web applications. However, there is a newer paradigm with the same purpose: mobile agent technology. Mobile agent technology has several advantages in distributed internet-based systems. This paper presents the application of mobile agent technology to internet-based speech recognition that is based on a client-server processing architecture.

Review And Challenges In Speech Recognition (ICCAS 2005)

Ahmed, M.Masroor;Ahmed, Abdul Manan Bin
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.1705-1709 / 2005
This paper reviews the area of speech recognition and its challenges, taking into account different classes of recognition mode. The recognition mode can be either speaker independent or speaker dependent. The size of the vocabulary and the input mode are two crucial factors for a speech recognizer. The input mode refers to continuous or isolated speech recognition, and the vocabulary size can be small (fewer than a hundred words) or large (fewer than a few thousand words). This varies according to system design and objectives [2]. The paper is organized as follows: first it covers various fundamental methods of speech recognition, then it considers various deficiencies in existing systems, and finally it outlines probable application areas.

Constructing a Noise-Robust Speech Recognition System using Acoustic and Visual Information (청각 및 시가 정보를 이용한 강인한 음성 인식 시스템의 구현)

Lee, Jong-Seok;Park, Cheol-Hoon
- Journal of Institute of Control, Robotics and Systems / v.13 no.8 / pp.719-725 / 2007
In this paper, we present an audio-visual speech recognition system for noise-robust human-computer interaction. Unlike conventional speech recognition systems, our system uses the visual signal containing the speaker's lip movements along with the acoustic signal to obtain speech recognition performance that is robust against environmental noise. The procedures of acoustic speech processing, visual speech processing, and audio-visual integration are described in detail. Experimental results demonstrate that, by exploiting the complementary nature of the two signals, the constructed system significantly improves recognition performance in noisy conditions compared to acoustic-only recognition.
https://doi.org/10.5302/J.ICROS.2007.13.8.719

Selecting Good Speech Features for Recognition

Lee, Young-Jik;Hwang, Kyu-Woong
- ETRI Journal / v.18 no.1 / pp.29-41 / 1996
This paper describes a method for selecting a suitable feature for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a subset of the frequency components, cepstrum, mel-cepstrum, energy, and their time differences of speech waveforms as their speech features. However, these systems cannot perform well if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable the selected feature is. To solve this problem, it is essential to analyze the feature itself and measure how good it is. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method, based on Shannon's information theory, for measuring the class-related information and the amount of class-irrelevant variation. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The result shows that, among these features, the mel-scaled FFT is the best feature for speech recognition according to the proposed measure.
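
The abstract does not give the exact measure, so the following is only a minimal sketch of one standard Shannon quantity that captures "class-related information": the mutual information between class labels and a discretized feature. The function name, the toy labels, and the binning are assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(labels: Sequence[Hashable], feature_bins: Sequence[Hashable]) -> float:
    """Estimate I(class; feature) in bits from paired discrete observations."""
    if len(labels) != len(feature_bins):
        raise ValueError("labels and feature_bins must have the same length")
    n = len(labels)
    joint = Counter(zip(labels, feature_bins))   # counts of (class, bin) pairs
    class_counts = Counter(labels)               # marginal counts of classes
    bin_counts = Counter(feature_bins)           # marginal counts of feature bins
    mi = 0.0
    for (c, x), n_cx in joint.items():
        p_cx = n_cx / n
        p_c = class_counts[c] / n
        p_x = bin_counts[x] / n
        mi += p_cx * math.log2(p_cx / (p_c * p_x))
    return mi


# Toy data (hypothetical): two phone classes, four observations.
labels = ["a", "a", "b", "b"]
informative_feature = [0, 0, 1, 1]    # bin index separates the classes perfectly
uninformative_feature = [0, 1, 0, 1]  # bin index is independent of the class

mi_good = mutual_information(labels, informative_feature)
mi_bad = mutual_information(labels, uninformative_feature)
```

A feature that carries more class-related information scores higher under this estimate; a real comparison of mel-scaled FFT, cepstrum, and wavelet features would also need a principled way to quantize or model continuous feature vectors.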

AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation

Byung Ok Kang;Hyung-Bae Jeon;Yun Kyung Lee
- ETRI Journal / v.46 no.1 / pp.48-58 / 2024
This paper presents the development of language tutoring systems for non-native speakers by leveraging advanced end-to-end automatic speech recognition (ASR) and proficiency evaluation. Given the frequent errors in non-native speech, high-performance spontaneous speech recognition must be applied. Our systems accurately evaluate pronunciation and speaking fluency and provide feedback on errors by relying on precise transcriptions. End-to-end ASR is implemented and enhanced by using diverse non-native speaker speech data for model training. For performance enhancement, we combine semi-supervised and transfer learning techniques using labeled and unlabeled speech data. Automatic proficiency evaluation is performed by a model trained to maximize the statistical correlation between the fluency score manually determined by a human expert and a calculated fluency score. We developed an English tutoring system for Korean elementary students called EBS AI Peng-Talk and a Korean tutoring system for foreigners called KSI Korean AI Tutor. Both systems were deployed by South Korean government agencies.
https://doi.org/10.4218/etrij.2023-0322

Emotion Recognition using Robust Speech Recognition System (강인한 음성 인식 시스템을 사용한 감정 인식)

Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.586-591 / 2008
This paper studies an emotion recognition system combined with a robust speech recognition system in order to improve emotion recognition performance. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters for speech recognition were studied using a speech database containing various emotions. The final emotion recognition is performed using the input utterance and its emotional model according to the result of speech recognition. In the experiment, the robust speech recognition system is an HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients, their derivatives, and cepstral mean subtraction (CMS) for signal bias removal. Experimental results showed that the emotion recognizer combined with the speech recognition system performed better than the emotion recognizer alone.
https://doi.org/10.5391/JKIIS.2008.18.5.586
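
Cepstral mean subtraction, named in the abstract above as the signal bias removal step, can be illustrated with a minimal sketch. A fixed channel appears as a constant offset in the cepstral domain, so subtracting the per-coefficient mean over an utterance removes it. The function name and the toy numbers are assumptions for illustration; this is not the paper's implementation.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean, computed over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy "cepstra" (hypothetical): three frames, two coefficients, zero mean per coefficient.
clean = [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]]
# A constant channel bias added to every frame.
bias = [0.5, 2.0]
biased = [[c + b for c, b in zip(frame, bias)] for frame in clean]

normalized = cepstral_mean_subtraction(biased)
```

Because the toy clean frames already have zero mean, `normalized` recovers them; on real speech the utterance's own mean is removed along with the channel bias.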

An Experimental Study on Barging-In Effects for Speech Recognition Using Three Telephone Interface Boards

Park, Sung-Joon;Kim, Ho-Kyoung;Koo, Myoung-Wan
- Speech Sciences / v.8 no.1 / pp.159-165 / 2001
In this paper, we conduct an experiment on speech recognition systems with barging-in and non-barging-in utterances. Barging-in capability, which lets the user speak voice commands while a voice announcement is still playing, is one of the important elements of practical speech recognition systems. Barging-in capability can be realized by echo cancellation techniques based on the LMS (least-mean-square) algorithm. We use three kinds of telephone interface boards with barging-in capability, made by Dialogic Company, Natural MicroSystems Company, and Korea Telecom, respectively. A speech database was collected using these three kinds of boards, and we carry out a comparative recognition experiment with this database.
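
As a rough illustration of LMS-based echo cancellation, the sketch below adapts an FIR filter to predict the echo of the announcement signal in the microphone signal and subtracts the prediction. The echo path, tap count, and step size are hypothetical, and no near-end speech is simulated; the boards in the study are not described at this level of detail.

```python
import random
from typing import List, Sequence, Tuple


def lms_echo_cancel(
    far_end: Sequence[float],
    mic: Sequence[float],
    n_taps: int = 4,
    step_size: float = 0.05,
) -> Tuple[List[float], List[float]]:
    """Return (residual signal, final filter weights) of an LMS adaptive filter."""
    weights = [0.0] * n_taps
    history = [0.0] * n_taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate     # what remains after removing the estimated echo
        # LMS update: move the weights along the instantaneous error gradient.
        weights = [w + step_size * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Toy setup (hypothetical): white-noise "announcement" and a two-tap echo path.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3]
mic = []
previous = 0.0
for sample in far_end:
    mic.append(echo_path[0] * sample + echo_path[1] * previous)
    previous = sample

residual, weights = lms_echo_cancel(far_end, mic)
```

In this noise-free toy case the residual decays toward zero and the first two weights approach the echo path. With a barging-in user, the residual would instead carry the user's speech, which is what the recognizer needs.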

Improved Bimodal Speech Recognition Study Based on Product Hidden Markov Model

Xi, Su Mei;Cho, Young Im
- International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.3 / pp.164-170 / 2013
Recent years have seen growing demand for automatic speech recognition (ASR) systems that can operate robustly in acoustically noisy environments. This paper proposes an improved product hidden Markov model (HMM) for bimodal speech recognition. A two-dimensional training model is built based on dependently trained audio-HMM and visual-HMM, reflecting the asynchronous characteristics of the audio and video streams. A weight coefficient is introduced to adjust the weights of the video and audio streams automatically according to differences in the noise environment. Experimental results show that this approach achieves better speech recognition performance than other bimodal speech recognition approaches.
https://doi.org/10.5391/IJFIS.2013.13.3.164
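
The abstract does not state how the weight coefficient enters the model. One common form of stream weighting, shown here only as an assumed sketch, combines per-word log-likelihoods from the two streams as a convex combination, with the audio weight lowered when the acoustic environment is noisy. All names and scores below are hypothetical.

```python
from typing import Dict


def fuse_scores(
    audio_loglik: Dict[str, float],
    visual_loglik: Dict[str, float],
    audio_weight: float,
) -> Dict[str, float]:
    """Weighted sum of audio and visual log-likelihoods for each candidate word."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        word: audio_weight * audio_loglik[word] + (1.0 - audio_weight) * visual_loglik[word]
        for word in audio_loglik
    }


def recognize(scores: Dict[str, float]) -> str:
    """Pick the candidate with the highest fused score."""
    return max(scores, key=scores.get)


# Hypothetical scores: the noisy audio stream favors "no", the lip stream favors "yes".
audio_loglik = {"yes": -12.0, "no": -10.0}
visual_loglik = {"yes": -8.0, "no": -11.0}

trust_audio = recognize(fuse_scores(audio_loglik, visual_loglik, audio_weight=0.9))
trust_video = recognize(fuse_scores(audio_loglik, visual_loglik, audio_weight=0.3))
```

With a high audio weight the decision follows the audio stream, and with a low one it follows the visual stream. An automatic scheme like the one the abstract describes would set the weight from an estimate of the noise condition rather than by hand.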

Speech Parameters for the Robust Emotional Speech Recognition (감정에 강인한 음성 인식을 위한 음성 파라메터)

Kim, Weon-Goo
- Journal of Institute of Control, Robotics and Systems / v.16 no.12 / pp.1137-1142 / 2010
This paper studies speech parameters that are less affected by human emotion, for the development of a robust speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters for speech recognition were studied using a speech database containing various emotions. In this study, the mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, and frequency-warped mel-cepstral coefficient were used as feature parameters, and the CMS (Cepstral Mean Subtraction) method was used as a signal bias removal technique. Experimental results showed that the HMM-based speaker-independent word recognizer using the vocal tract length normalized mel-cepstral coefficient, its derivatives, and CMS for signal bias removal achieved the best performance, a word error rate of 0.78%. This corresponds to about a 50% reduction in word errors compared with the baseline system using the mel-cepstral coefficient, its derivatives, and CMS.
https://doi.org/10.5302/J.ICROS.2010.16.12.1137
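
The relation between the reported 0.78% word error rate and the "about 50%" reduction can be made explicit with a small calculation. The abstract does not report the baseline word error rate, so the value computed below is only what the two stated figures imply, treating 50% as exact for illustration.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of baseline word errors removed by the improved system."""
    if baseline_wer <= 0.0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


def implied_baseline(improved_wer: float, reduction: float) -> float:
    """Baseline WER implied by an improved WER and a relative reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return improved_wer / (1.0 - reduction)


best_wer = 0.78                                    # percent, from the abstract
approx_baseline = implied_baseline(best_wer, 0.50) # assumes the reduction is exactly 50%
check = relative_error_reduction(approx_baseline, best_wer)
```

Under that assumption the baseline would be roughly 1.56% word error rate; the paper's actual baseline figure may differ slightly because the reduction is stated only approximately.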

Search Result 358, Processing Time 0.03 seconds
