• Title/Summary/Keyword: Korean speech

Computerized Sound Dictionary of Korean and English

  • Kim, Jong-Mi
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.33-52
    • /
    • 2001
  • A bilingual sound dictionary of Korean and English has been created for a broad range of sound reference to cross-linguistic, dialectal, native-language (L1)-transferred, biological, and allophonic variations. The paper demonstrates that the pronunciation dictionary of the lexicon is inadequate for sound reference due to the preponderance of unmarked sounds. The audio registry consists of a three-way comparison of 1) English speech from native English speakers, 2) Korean speech from Korean speakers, and 3) English speech from Korean speakers. Several sub-dictionaries have been created as foundational research for independent development. They are 1) a pronunciation dictionary of the Korean lexicon in a keyboard-compatible phonetic transcription, 2) a sound dictionary of L1-interfered language, and 3) an audible dictionary of Korean sounds. The dictionary was designed to facilitate the exchange of the speech signal and its corresponding text data on various media, particularly on CD-ROM. The methodology and findings of the construction are discussed.
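As an illustration of the keyboard-compatible (ASCII) transcription idea described above, here is a minimal lookup sketch in Python; the entries and the transcription scheme are hypothetical, not drawn from the actual dictionary.

```python
# Hypothetical keyboard-compatible (ASCII) pronunciation lookup in the
# spirit of the Korean-lexicon sub-dictionary described in the abstract.
pron_dict = {
    "hangul": "ha-ng-g-u-l",   # illustrative ASCII transcription
    "saram":  "s-a-r-a-m",
}

def transcribe(word):
    """Return the ASCII phonetic transcription, or None if unlisted."""
    return pron_dict.get(word)
```

A keyboard-compatible scheme like this keeps every symbol typeable and storable as plain text, which is what makes exchange across media such as CD-ROM straightforward.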

Meta-data Standardization of Speech Database (음성 DB의 메타데이타 표준화)

  • Kim Sanghun
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.61-64
    • /
    • 2003
  • In this paper, we introduce a new description method for the annotation information of speech databases. As a structured description method, XML-based description, which has been standardized by the W3C, is applied to represent the metadata of speech databases. The description will be continuously revised through the speech technology standard forum during this year.
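To illustrate the kind of XML-based description the paper proposes, a minimal Python sketch using the standard library follows; the element names are assumptions for illustration, not the forum's standardized schema.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical minimal metadata record for one speech-database
# item; the element names are illustrative, not the standardized schema.
record = ET.Element("speechDB")
ET.SubElement(record, "title").text = "Broadcast news corpus"
ET.SubElement(record, "samplingRate").text = "16000"
ET.SubElement(record, "annotation").text = "phone-level"

# Serialize the record to an XML string for exchange between systems.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The advantage of an XML description over ad hoc header files is exactly this machine-checkable structure: any consumer can validate and parse the metadata without knowing the producer's tools.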

An Efficient Model Parameter Compensation Method for Robust Speech Recognition

  • Chung Yong-Joo
    • MALSORI
    • /
    • no.45
    • /
    • pp.107-115
    • /
    • 2003
  • An efficient method that compensates the HMM parameters for noisy speech recognition is proposed. Instead of assuming analytical approximations as in the PMC, the proposed method directly re-estimates the HMM parameters with the segmental k-means algorithm. The proposed method has shown improved results over the conventional PMC method at reduced computational cost.

Error Correction for Korean Speech Recognition using a LSTM-based Sequence-to-Sequence Model

  • Jin, Hye-won;Lee, A-Hyeon;Chae, Ye-Jin;Park, Su-Hyun;Kang, Yu-Jin;Lee, Soowon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.10
    • /
    • pp.1-7
    • /
    • 2021
  • Most research on correcting speech recognition errors has been conducted on English, so there is not enough research on Korean speech recognition. Compared to English, however, Korean speech recognition produces many errors due to linguistic characteristics of the Korean language, such as fortis and liaison, so research on Korean speech recognition is needed. Furthermore, earlier works primarily focused on edit-distance algorithms and syllable restoration rules, making it difficult to correct the error types caused by fortis and liaison. In this paper, we propose a context-sensitive post-processing model for speech recognition that uses an LSTM-based sequence-to-sequence model with the Bahdanau attention mechanism to correct Korean speech recognition errors caused by pronunciation. Experiments showed that the model improved recognition performance from 64% to 77% for fortis, from 74% to 90% for liaison, and from 69% to 84% on average. Based on these results, it seems possible to apply the proposed model to real-world applications based on speech recognition.
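The edit-distance algorithms that the earlier post-processing works relied on can be sketched as the classic dynamic-programming Levenshtein distance (a baseline measure, not the proposed seq2seq model):

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed with a
    single rolling row of the dynamic-programming table."""
    dp = list(range(len(b) + 1))          # distances for empty prefix of a
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i            # prev holds the diagonal cell
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]
```

A purely surface-level measure like this cannot capture context-dependent sound changes such as fortis and liaison, which is the gap the attention-based model addresses.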

A Study on Noise-Robust Methods for Broadcast News Speech Recognition (방송뉴스 인식에서의 잡음 처리 기법에 대한 고찰)

  • Chung Yong-joo
    • MALSORI
    • /
    • no.50
    • /
    • pp.71-83
    • /
    • 2004
  • Recently, broadcast news speech recognition has become one of the most attractive research areas. If we can automatically transcribe broadcast news and store its contents in text form instead of as the video or audio signal itself, it becomes much easier to search multimedia databases for what we need. However, the desired speech signal in broadcast news is usually affected by interfering signals such as background noise and/or music. Also, the speech of a reporter speaking over the telephone or with an ill-conditioned microphone is severely distorted by the channel effect. The interfered or distorted speech may be the main reason for the poor performance of broadcast news speech recognition. In this paper, we investigate some methods to cope with these problems and observe performance improvements in noisy broadcast news speech recognition.

Multi-resolution DenseNet based acoustic models for reverberant speech recognition (잔향 환경 음성인식을 위한 다중 해상도 DenseNet 기반 음향 모델)

  • Park, Sunchan;Jeong, Yongwon;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.33-38
    • /
    • 2018
  • Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt the DenseNet, which has shown great performance results in image classification tasks, to improve the performance of reverberant speech recognition. The DenseNet enables the deep convolutional neural network (CNN) to be effectively trained by concatenating feature maps in each convolutional layer. In addition, we extend the concept of multi-resolution CNN to multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate the performance of reverberant speech recognition on the single-channel ASR task in reverberant voice enhancement and recognition benchmark (REVERB) challenge 2014. According to the experimental results, the DenseNet-based acoustic models show better performance than do the conventional CNN-based ones, and the multi-resolution DenseNet provides additional performance improvement.
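The core DenseNet idea described above, concatenating the feature maps of all previous layers along the channel axis before each new layer, can be sketched as a toy dense block; the random linear "layers" stand in for real convolutions and are purely illustrative.

```python
import numpy as np

def dense_block(x, num_layers, growth):
    """Toy dense block: each 'layer' (a random linear map standing in
    for a convolution) receives the channel-wise concatenation of all
    earlier feature maps and emits `growth` new channels."""
    rng = np.random.default_rng(0)
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)          # dense connectivity
        w = rng.standard_normal((inp.shape[-1], growth))
        features.append(np.tanh(inp @ w))                # new feature maps
    return np.concatenate(features, axis=-1)

# 8 input channels + 3 layers x 2 growth channels = 14 output channels.
out = dense_block(np.zeros((4, 8)), num_layers=3, growth=2)
```

The concatenation gives every layer a direct path to the input and to all earlier gradients, which is what makes very deep CNNs trainable in this scheme.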

Classical Tamil Speech Enhancement with Modified Threshold Function using Wavelets

  • Indra, J.;Kasthuri, N.;Navaneetha Krishnan, S.
    • Journal of Electrical Engineering and Technology
    • /
    • v.11 no.6
    • /
    • pp.1793-1801
    • /
    • 2016
  • Speech enhancement is a challenging problem due to the diversity of noise sources and their effects in different applications. The goal of speech enhancement is to improve the quality and intelligibility of speech by reducing noise. Much research on speech enhancement has been carried out for English and other European languages, but there has been little or no such work on Tamil speech enhancement in the literature. The aim of the proposed method is to reduce the background noise present in the Tamil speech signal by using wavelets, and a new modified thresholding function is introduced. The proposed method is evaluated on several speakers and under various noise conditions, including white Gaussian noise, babble noise, and car noise. The signal-to-noise ratio (SNR), mean square error (MSE), and mean opinion score (MOS) results show that the proposed thresholding function improves speech enhancement compared to the conventional hard and soft thresholding methods.
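For reference, the conventional hard and soft thresholding functions that the proposed modified function is compared against can be sketched as follows (the modified function itself is not reproduced here):

```python
def hard_threshold(x, t):
    """Hard thresholding: zero out wavelet coefficients whose
    magnitude is at most the threshold t, keep the rest unchanged."""
    return x if abs(x) > t else 0.0

def soft_threshold(x, t):
    """Soft thresholding: zero out small coefficients and shrink the
    surviving ones toward zero by t."""
    if abs(x) <= t:
        return 0.0
    return (abs(x) - t) * (1 if x > 0 else -1)
```

Hard thresholding preserves large coefficients exactly but introduces discontinuities at the threshold; soft thresholding is continuous but biases large coefficients, and modified functions typically aim to trade off between the two.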

Prominence Detection Using Feature Differences of Neighboring Syllables for English Speech Clinics (영어 강세 교정을 위한 주변 음 특징 차를 고려한 강조점 검출)

  • Shim, Sung-Geon;You, Ki-Sun;Sung, Won-Yong
    • Phonetics and Speech Sciences
    • /
    • v.1 no.2
    • /
    • pp.15-22
    • /
    • 2009
  • Prominence of speech, often called 'accent,' greatly affects the fluency of spoken American English. In this paper, we present an accurate prominence detection method that can be utilized in computer-aided language learning (CALL) systems. We employed pitch movement, overall syllable energy, 300-2200 Hz band energy, syllable duration, and spectral and temporal correlation as features to model the prominence of speech. After the features for vowel syllables were extracted, prominent syllables were classified by a support vector machine (SVM). To further improve accuracy, the differences in characteristics of neighboring syllables were added as additional features. We also applied a speech recognizer to extract more precise syllable boundaries. The performance of our prominence detector was measured on the Intonational Variation in English (IViE) speech corpus. We obtained 84.9% accuracy, which is about 10% higher than previous research.
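The neighbor-difference idea can be sketched as appending, to each syllable's feature vector, its difference from the previous and next syllables (edge syllables get zero differences); the feature contents here are illustrative, not the paper's exact feature set.

```python
def add_neighbor_diffs(feats):
    """Given a list of per-syllable feature vectors, append to each
    vector its element-wise difference from the previous and the next
    syllable; boundary syllables use themselves, yielding zeros."""
    n = len(feats)
    out = []
    for i, f in enumerate(feats):
        prev = feats[i - 1] if i > 0 else f
        nxt = feats[i + 1] if i < n - 1 else f
        diff_prev = [a - b for a, b in zip(f, prev)]
        diff_next = [a - b for a, b in zip(f, nxt)]
        out.append(f + diff_prev + diff_next)
    return out
```

Relative features like these let the classifier judge a syllable against its local context, since prominence is perceived as standing out from neighbors rather than as an absolute energy or pitch level.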

Effects of Self-monitoring on Initiating Speech Behavior of the Hearing-impaired Preschoolers (자기점검법이 청각장애 유아의 자발적인 말시작 행동에 미치는 영향)

  • Hyun, No-Sang;Kim, Young-Tae
    • Speech Sciences
    • /
    • v.9 no.3
    • /
    • pp.99-112
    • /
    • 2002
  • The purpose of the present study was to investigate the effectiveness of self-monitoring on the spontaneously initiated speech behavior of hearing-impaired preschoolers. Three hearing-impaired preschoolers were selected from a special school for the deaf. They showed some vocalizations and words under intensive instruction settings, but never spontaneously spoke as a means of communication. A multiple probe design was applied in this study. During the self-monitoring intervention, each child was trained to assess whether his own initiated speech behavior occurred or not, and then to record the behavior's occurrence on self-recording sheets and self-graphing sheets. The vibration of a mobile phone was used as a tactile cue for self-monitoring. The results were as follows: (1) Self-monitoring significantly increased the percentage of occurrence of spontaneously initiated speech behaviors. (2) The increased level of spontaneously initiated speech behavior was generalized to another natural instruction (cognitive) setting. (3) The increased level of spontaneously initiated speech behavior was maintained four weeks after the termination of the intervention.

Effects of the Types of Noise and Signal-to-Noise Ratios on Speech Intelligibility in Dysarthria (소음 유형과 신호대잡음비가 마비말장애인의 말명료도에 미치는 영향)

  • Lee, Young-Mee;Sim, Hyun-Sub;Sung, Jee-Eun
    • Phonetics and Speech Sciences
    • /
    • v.3 no.4
    • /
    • pp.117-124
    • /
    • 2011
  • This study investigated the effects of the types of noise and signal-to-noise ratios (SNRs) on the speech intelligibility of an adult with dysarthria. Speech intelligibility was judged by 48 naive listeners using a word transcription task. A repeated measures design was used with the type of noise (multi-talker babble/environmental noise) and SNR (0 dB, +10 dB, +20 dB) as within-subject factors. The dependent measure was the percentage of correctly transcribed words. Results revealed that both main effects were statistically significant: listeners performed significantly worse in the multi-talker babble condition than in the environmental noise condition, and significantly better at higher SNRs. The results suggest that multi-talker babble and lower SNRs decrease the speech intelligibility of adults with dysarthria, and that speech-language pathologists should consider environmental factors such as the type of noise and the SNR when evaluating speech intelligibility in dysarthria.
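The SNR conditions above follow the usual definition, ten times the base-10 logarithm of the signal-to-noise power ratio; a minimal sketch from raw sample sequences:

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise),
    where each power is the mean of the squared samples."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)
```

Under this definition, 0 dB means speech and noise are equally powerful, and each +10 dB step means the speech power is ten times greater relative to the noise.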
