Search | Korea Science

Speech Recognition Accuracy Measure using Deep Neural Network for Effective Evaluation of Speech Recognition Performance (효과적인 음성 인식 평가를 위한 심층 신경망 기반의 음성 인식 성능 지표)

Ji, Seung-eun;Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.21 no.12
- /
- pp.2291-2297
- /
- 2017
This paper describe to extract speech measure algorithm for evaluating a speech database, and presents generating method of a speech quality measure using DNN(Deep Neural Network). In our previous study, to produce an effective speech quality measure, we propose a combination of various speech measures which are highly correlated with WER(Word Error Rate). The new combination of various types of speech quality measures in this study is more effective to predict the speech recognition performance compared to each speech measure alone. In this paper, we describe the method of extracting measure using DNN, and we change one of the combined measure from GMM(Gaussican Mixture Model) score used in the previous study to DNN score. The combination with DNN score shows a higher correlation with WER compared to the combination with GMM score.
https://doi.org/10.6109/jkiice.2017.21.12.2291 인용 PDF KSCI

A Study on Development of a Hearing Impairment Simulator considering Frequency Selectivity and Asymmetrical Auditory Filter of the Hearing Impaired (난청인의 주파수 선택도와 비대칭적 청각 필터를 고려한 난청 시뮬레이터 개발에 관한 연구)

Joo, Sang-Ick;Kang, Hyun-Deok;Song, Young-Rok;Lee, Sang-Min
- The Transactions of The Korean Institute of Electrical Engineers
- /
- v.59 no.4
- /
- pp.831-840
- /
- 2010
In this paper, we propose a hearing impairment simulator considering reduced frequency selectivity and asymmetrical auditory filter of the hearing impaired, and we verified the reduced frequency selectivity and asymmetrical auditory filter affected in speech perception through experiments. The reduced frequency selectivity has made embodied by spectral smearing using LPC(linear prediction coding). The shapes of auditory filter are asymmetrical different with each center frequency. Hearing impaired person which has hearing loss was differently changed with that of normal hearing people and it has different value for speech of quality through auditory filter. The experiments confirmed subjective test and objective test. The subjective experiments are composed of 4 kinds of tests: pure tone test, SRT(speech reception threshold) test, and WRS(word recognition score) test without spectral smearing, and WRS test with spectral smearing. The experiment of the hearing impairment simulator was performed from 9 subjects who have normal ears. The amount of spectral smearing was controlled by LPC order. The asymmetrical auditory filter of proposed hearing impairment simulator was simulated and then some tests to estimate the filter's performance objectively were performed. The objective experiment as simulated auditory filter's performance evaluation method used PESQ(perceptual evaluation of speech quality) and LLR(log likelihood ratio) for speech through auditory filter. The processed speech was evaluated objective speech quality and distortion using PESQ and LLR value. When hearing loss processed, PESQ and LLR value have big difference according to asymmetrical auditory filter in hearing impairment simulator.
https://doi.org/10.5370/KIEE.2010.59.4.831 인용 PDF KSCI

A Word List Construction and Measurement Method for Intelligibility Assessment of Synthesized Speech by Rule (규칙 합성음의 이해성 평가를 위한 단어표 구성 및 실험법)

김성한;홍진우;김순협
- Journal of the Korean Institute of Telematics and Electronics B
- /
- v.29B no.1
- /
- pp.43-49
- /
- 1992
As a result of recent progress in speech synthesis techniques, the those new services using new techniques are going to introduce into the telephone communication system. In setting standards, voice quality is obviously an important criterion. It is very important to develope a quality evaluation method of synthesized speech for the diagnostic assessment of system algorithm, and fair comparison of assessment values. This paper has described several basic concepts and criterions for quality assessment (intelligibility) of synthesized speech by rule, and then a word selection method and the word list to be used in word intelligibility test were proposed. Finally, a test method for word intelligibility is described.
PDF

An Improved Speech Absence Probability Estimation based on Environmental Noise Classification (환경잡음분류 기반의 향상된 음성부재확률 추정)

Son, Young-Ho;Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
- The Journal of the Acoustical Society of Korea
- /
- v.30 no.7
- /
- pp.383-389
- /
- 2011
In this paper, we propose a improved speech absence probability estimation algorithm by applying environmental noise classification for speech enhancement. The previous speech absence probability required to seek a priori probability of speech absence was derived by applying microphone input signal and the noise signal based on the estimated value of a posteriori SNR threshold. In this paper, the proposed algorithm estimates the speech absence probability using noise classification algorithm which is based on Gaussian mixture model in order to apply the optimal parameter each noise types, unlike the conventional fixed threshold and smoothing parameter. Performance of the proposed enhancement algorithm is evaluated by ITU-T P.862 PESQ (perceptual evaluation of speech quality) and composite measure under various noise environments. It is verified that the proposed algorithm yields better results compared to the conventional speech absence probability estimation algorithm.
https://doi.org/10.7776/ASK.2011.30.7.383 인용 PDF KSCI

Online blind source separation and dereverberation of speech based on a joint diagonalizability constraint (공동 행렬대각화 조건 기반 온라인 음원 신호 분리 및 잔향제거)

Yu, Ho-Gun;Kim, Do-Hui;Song, Min-Hwan;Park, Hyung-Min
- The Journal of the Acoustical Society of Korea
- /
- v.40 no.5
- /
- pp.503-514
- /
- 2021
Reverberation in speech signals tends to significantly degrade the performance of the Blind Source Separation (BSS) system. Especially in online systems, the performance degradation becomes severe. Methods based on joint diagonalizability constraints have been recently developed to tackle the problem. To improve the quality of separated speech, in this paper, we add the proposed de-reverberation method to the online BSS algorithm based on the constraints in reverberant environments. Through experiments on the WSJCAM0 corpus, the proposed method was compared with the existing online BSS algorithm. The performance evaluation by the Signal-to-Distortion Ratio and the Perceptual Evaluation of Speech Quality demonstrated that SDR improved from 1.23 dB to 3.76 dB and PESQ improved from 1.15 to 2.12 on average.
https://doi.org/10.7776/ASK.2021.40.5.503 인용 PDF KSCI

A VOWEL TRAJECTORY DISPLAY FOR SPEECH TRAINING

Kido, Ken'iti;Tanahashi, Kenji;Ohuchi, Yasuhiro
- Proceedings of the Acoustical Society of Korea Conference
- /
- 1994.06a
- /
- pp.971-976
- /
- 1994
A speech display system is developed for the evaluation and the training of speech utterance. The speech is analyzed by linear predictive technique every 5 ms and the frequencies of the lowest two spectral local peaks P1 and P2 are extracted. The vowel trakectory is displayed using those frequencies on th P1-P2 plane. In most cases, P1 and P2 correspond to the first and the second formants, but in the case of indistinct utterance, the correspondence between the local spectral peaks and the formants tends to fall into disorder. And the system is considered to be useful for the evaluation of speech quality. The examples of some words uttered by normal speakers and some patients with difficulty in utterance are compared each other for the discussion of the effectiveness of the system.
PDF

An Optimization of Speech Database in Corpus-based speech synthesis sytstem (코퍼스기반 음성합성기의 데이터베이스 최적화 방안)

Jang Kyung-Ae;Chung Min-Hwa
- Proceedings of the KSPS conference
- /
- 2002.11a
- /
- pp.209-213
- /
- 2002
This paper describes the reduction of DB without degradation of speech quality in Corpus-based Speech synthesizer of Korean language. In this paper, it is proposed that the frequency of every unit in reduced DB should reflect the frequency of units in Korean language. So, the target population of every unit is set to be proportional to their frequency in Korean large corpus(780K sentences, 45Mega phonemes). Second, the frequent instances during synthesis should be also maintained in reduced DB. To the last, it is proposed that frequency of every instance should be reflected in clustering criterion and used as criterion for selection of representative instances. The evaluation result with proposed methods reveals better quality than using conventional methods.
PDF

A Reduction of Speech Database in Corpus-based Speech Synthesis System (코퍼스기반 음성합성기의 데이터베이스 감축방안)

Jang Kyung-Ae;Chung Min-Hwa;Kim Jae-In;Koo Myoung-Wan
- MALSORI
- /
- no.44
- /
- pp.145-156
- /
- 2002
This paper describes the reduction of DB without degradation of speech quality in Corpus-based Speech synthesizer of the Korean language. In this paper, it is proposed that the frequency of every unit in reduced DB reflect the frequency of units in the Korean language. So, the target population of every unit is set to be proportional to its frequency in Korean large corpus (780k sentences, 45Mega phones). Secondly, the frequent instances during synthesis should be also maintained in reduced DB. To the last, it is proposed that frequency of every instance be reflected in clustering criteria and used as another important criterion for selection of representative instances. The evaluation result with proposed methods reveals better quality than that using conventional methods.
PDF

A Method of Automated Quality Evaluation for Voice-Based Consultation (음성 기반 상담의 품질 평가를 위한 자동화 기법)

Lee, Keonsoo;Kim, Jung-Yeon
- Journal of Internet Computing and Services
- /
- v.22 no.2
- /
- pp.69-75
- /
- 2021
In a contact-free society, online services are becoming more important than classic offline services. At the same time, the role of a contact center, which executes customer relation management (CRM), is increasingly essential. For supporting the CRM tasks and their effectiveness, techniques of process automation need to be applied. Quality assurance (QA) is one of the time and resource consuming, and typical processes that are suitable for automation. In this paper, a method of automatic quality evaluation for voice based consultations is proposed. Firstly, the speech in consultations is transformed into a text by speech recognition. Then quantitative evaluation based on the QA metrics, including checking the elements in opening and closing mention, the existence of asking the mandatory information, the attitude of listening and speaking, is executed. 92.7% of the automated evaluations are the same to the result done by human experts. It was found that the non matching cases of the automated evaluations were mainly caused from the mistranslated Speech-to-Text (STT) result. With the confidence of STT result, this proposed method can be employed for enhancing the efficiency of QA process in contact centers.
https://doi.org/10.7472/jksii.2021.22.2.69 인용 PDF KSCI HTML

Comparison of Self-Reporting Voice Evaluations between Professional and Non-Professional Voice Users with Voice Disorders by Severity and Type (음성장애가 있는 직업적 음성사용자와 비직업적 음성사용자의 음성장애 중증도와 유형에 따른 자기보고식 음성평가 차이)

Kim, Jaeock
- Phonetics and Speech Sciences
- /
- v.7 no.4
- /
- pp.67-76
- /
- 2015
The purpose of this study was to compare professional (Pro) and non-professional (Non-pro) voice users with voice disorders in self-reporting voice evaluation using Korean-Voice Handicap Index (K-VHI) and Korean-Voice Related Quality of Life (K-VRQOL). In addition, those were compared by voice quality and voice disorder type. 94 Pro and 106 Non-pro were asked to fill out the K-VHI and K-VRQOL, perceptually evaluated on GRBAS scales, and divided into three types of voice disorders (functional, organic and neurologic) by an experienced speech-language pathologist and an otolaryngologist. The results showed that the functional (F) and physical (P) scores of K-VHI in Pro group were significantly higher than those in Non-pro group. As the voice quality evaluated by G scale got worse, the scores of all aspects except emotional (E) of K-VHI and social-emotional (SE) of K-VRQOL were higher. All scores of K-VHI and K-VRQOL in neurologic voice disorders were significantly higher than those in functional and organic voice disorders. In conclusion, professional voice users are more sensitive to their functional and physical handicap resulted by their voice problems and that goes double for the patients with severe and neurologic voice disorders.
https://doi.org/10.13064/KSSS.2015.7.4.067 인용 PDF KSCI

Search Result 178, Processing Time 0.019 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)