• Title/Summary/Keyword: continuous speech


Combining deep learning-based online beamforming with spectral subtraction for speech recognition in noisy environments (잡음 환경에서의 음성인식을 위한 온라인 빔포밍과 스펙트럼 감산의 결합)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.5
    • /
    • pp.439-451
    • /
    • 2021
  • We propose a deep learning-based beamformer combined with spectral subtraction for continuous speech recognition in noisy environments. Conventional beamforming systems have mostly been evaluated on pre-segmented audio signals, typically generated by continuously mixing speech and noise on a computer. However, since speech utterances occur only sparsely along the time axis in real environments, conventional beamforming systems degrade when noise-only signals without speech are input. To alleviate this drawback, we combine an online beamforming algorithm with spectral subtraction. We construct a Continuous Speech Enhancement (CSE) evaluation set to evaluate the online beamforming algorithm in noisy environments. The evaluation set is built by mixing sparsely occurring speech utterances from the CHiME3 evaluation set with continuously played CHiME3 background noise and background music from MUSDB. Using a Kaldi-based toolkit and the Google web speech recognizer as speech recognition back-ends, we confirm that the proposed online beamforming algorithm with spectral subtraction outperforms the baseline online algorithm.
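The spectral-subtraction step described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: the toy magnitude spectra, the `floor` parameter, and the function name are all assumptions.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from the noisy
    magnitude spectrum, clamping each bin to a small spectral floor
    so that no bin goes negative (a common musical-noise mitigation)."""
    return [max(m - n, floor * m) for m, n in zip(noisy_mag, noise_mag)]

# Toy magnitude spectra; the noise estimate would come from a
# noise-only (speech-absent) region in practice.
noisy = [10.0, 8.0, 3.0, 1.0]
noise = [2.0, 2.0, 2.0, 2.0]
print(spectral_subtract(noisy, noise))  # [8.0, 6.0, 1.0, 0.01]
```

The last bin falls below the noise estimate, so it is clamped to the floor rather than subtracted to a negative value.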

The imitation patterns of adults and children on f0 intervals in North Kyungsang Korean

  • Kim, Jungsun
    • Phonetics and Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.23-31
    • /
    • 2019
  • The present study examines whether pitch range variation in North Kyungsang Korean shows a categorical or continuous function. Specifically, the study focuses on data imitated by adults and children in the North Kyungsang region. To investigate pitch range variation, the log-transformed f0 intervals of the produced speech were measured and statistically analyzed. The results are as follows. First, both the adults' and the children's imitations were more categorical than continuous, especially for the HL-LH patterns. For the other pitch accent patterns, such as HH-HL and HH-LH, the curves were continuous or flat for most speakers. Second, the children's imitations were poorer than the adults'. That is, the children's imitative responses appeared as continuous or flat curves rather than categorical ones. For the children, the HL-LH pattern showed a categorical function at the midpoint of the curves, though the shifts were not as distinctive as in the adults' data. This implies that the imitative responses of children follow the perceptual and productive trace of adults' speech behavior.

A Syllabic Segmentation Method for the Korean Continuous Speech (우리말 연속음성의 음절 분할법)

  • 한학용;고시영;허강인
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.70-75
    • /
    • 2001
  • This paper proposes a syllabic segmentation method for Korean continuous speech. The method consists of three major steps: (1) labeling vowel, consonant, and silence units and forming a token sequence from the speech data using time-domain segmental parameters (pitch, energy, zero-crossing rate (ZCR), and PVR); (2) scanning the token sequence against the structure of the Korean syllable using a parser designed as a finite state automaton; and (3) re-segmenting the syllable parts which contain two or more syllables using pseudo-syllable nucleus information. In a capability evaluation of the proposed method, segmentation results for continuous words and sentence units were 73.5% and 85.9%, respectively.
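Step (2) can be sketched as a greedy finite-state scan over a consonant/vowel token sequence. The two-symbol alphabet and the simple (C)V(C) coda rule below are simplifying assumptions for illustration, not the paper's actual parser.

```python
def parse_syllables(tokens):
    """Greedy finite-state scan of a C/V token sequence into
    Korean-style (C)V(C) syllables. A coda consonant is taken only
    when the following token is not a vowel; otherwise that consonant
    is the onset of the next syllable."""
    syllables, i = [], 0
    while i < len(tokens):
        start = i
        if tokens[i] == 'C':                      # optional onset
            i += 1
        if i < len(tokens) and tokens[i] == 'V':  # obligatory nucleus
            i += 1
        else:
            raise ValueError('no vowel nucleus at position %d' % start)
        if (i < len(tokens) and tokens[i] == 'C'
                and (i + 1 == len(tokens) or tokens[i + 1] != 'V')):
            i += 1                                # optional coda
        syllables.append(''.join(tokens[start:i]))
    return syllables

print(parse_syllables(list('CVCVC')))  # ['CV', 'CVC']
```

The middle consonant of `CVCVC` is assigned as the onset of the second syllable, mirroring how a syllable parser resolves ambiguous consonants by looking ahead for a nucleus.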


A Study on SNR Estimation of Continuous Speech Signal (연속음성신호의 SNR 추정기법에 관한 연구)

  • Song, Young-Hwan;Park, Hyung-Woo;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.4
    • /
    • pp.383-391
    • /
    • 2009
  • In speech signal processing, a speech signal corrupted by noise should be enhanced to improve its quality. Noise estimation methods usually need to be flexible under variable environments: the noise profile is renewed during silence regions to avoid the influence of speech properties, so voice regions must be found in a preprocessing step before noise estimation. However, if the received signal has no silence region, that method cannot be applied. In this paper, we propose an SNR estimation method for continuous speech signals. The waveform in the stationary region of voiced speech is highly correlated from one pitch period to the next, so we can estimate the SNR from the correlation of neighboring waveforms after dividing a frame into pitch periods. For unvoiced speech, the vocal tract characteristic is reflected in the noise, so we can estimate the SNR using the spectral distance between the spectrum of the received signal and the estimated vocal tract spectrum. Lastly, since the energy of a speech signal is mostly distributed in voiced regions, we can estimate the SNR from the ratio of voiced-region energy to unvoiced-region energy.
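The voiced-region idea can be sketched as follows. Adjacent pitch periods of clean voiced speech are nearly identical, so their normalized correlation r is close to 1; one common assumption (not necessarily the authors' exact formula) treats r as the periodic power fraction and 1 - r as noise, giving SNR = 10*log10(r / (1 - r)).

```python
import math

def frame_correlation(a, b):
    """Normalized cross-correlation between two adjacent pitch periods."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def snr_from_correlation(r, eps=1e-6):
    """Map a period-to-period correlation r to an SNR estimate in dB,
    treating r as the speech power fraction and 1 - r as noise."""
    r = min(max(r, eps), 1.0 - eps)
    return 10.0 * math.log10(r / (1.0 - r))

period_a = [0.1, 0.9, 0.4, -0.2]
period_b = [0.1, 0.9, 0.4, -0.2]          # identical: noise-free case
r = frame_correlation(period_a, period_b)  # r == 1.0
print(snr_from_correlation(r))             # very high SNR (clamped by eps)
```

With r = 0.5 the formula yields 0 dB: the periodic and aperiodic components are assumed equally strong.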

Research about auto-segmentation via SVM (SVM을 이용한 자동 음소분할에 관한 연구)

  • 권호민;한학용;김창근;허강인
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2220-2223
    • /
    • 2003
  • In this paper we use Support Vector Machines (SVMs), a recently proposed learning method and one kind of artificial neural network, to divide continuous speech into phonemes (initial, medial, and final sounds) and then perform continuous speech recognition on the result. Phoneme decision boundaries are determined by an algorithm that takes the most frequent label within a short interval. Recognition is performed with a Continuous Hidden Markov Model (CHMM), and the result is compared with phoneme boundaries obtained by manual inspection. The experiments confirm that the proposed SVM method is more effective for initial sounds than Gaussian Mixture Models (GMMs).
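The "most frequent label within a short interval" decision can be sketched as a majority vote over frame-level classifier outputs, with boundaries placed where the smoothed label changes. The window size and the toy labels below are illustrative assumptions.

```python
from collections import Counter

def smooth_labels(labels, win=3):
    """Replace each frame label by the most frequent label in a short
    window, standing in for the majority-vote decision over SVM outputs."""
    half = win // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out

def boundaries(labels):
    """Phoneme boundaries: frame indices where the label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# Frame-level labels with one spurious misclassification at index 2.
frames = ['a', 'a', 'b', 'a', 'a', 'b', 'b', 'b']
print(boundaries(smooth_labels(frames)))  # [5]
```

The isolated 'b' at index 2 is voted away, leaving a single boundary at the true label change.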


The Effects of Syllable Boundary Ambiguity on Spoken Word Recognition in Korean Continuous Speech

  • Kang, Jinwon;Kim, Sunmi;Nam, Kichun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.11
    • /
    • pp.2800-2812
    • /
    • 2012
  • The purpose of this study was to examine the cost of syllable-word boundary misalignment on word segmentation in Korean continuous speech. Previous studies have demonstrated the important role of syllabification in speech segmentation. The current study investigated whether the resyllabification process affects word recognition in Korean continuous speech. In Experiment I, under the misalignment condition, participants were presented with stimuli in which a word-final consonant became the onset of the next syllable (e.g., /k/ in belsak ingan becomes the onset of the first syllable of ingan 'human'). In the alignment condition, they heard stimuli in which a word-final vowel was also the final segment of the syllable (e.g., /eo/ in heulmeo ingan is the end of both the syllable and the word). The results showed that word recognition was faster and more accurate in the alignment condition. Experiment II aimed to confirm that the results of Experiment I were attributable to the resyllabification process, by comparing only the target words from each condition. The results of Experiment II supported the findings of Experiment I. Therefore, based on the current study, we confirmed that Korean, a syllable-timed language, incurs a misalignment cost from resyllabification.

On the Development of a Large-Vocabulary Continuous Speech Recognition System for the Korean Language (대용량 한국어 연속음성인식 시스템 개발)

  • Choi, In-Jeong;Kwon, Oh-Wook;Park, Jong-Ryeal;Park, Yong-Kyu;Kim, Do-Yeong;Jeong, Ho-Young;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.5
    • /
    • pp.44-50
    • /
    • 1995
  • This paper describes a large-vocabulary continuous speech recognition system for the Korean language using continuous hidden Markov models. To improve the performance of the system, we study the selection of speech modeling units, inter-word modeling, the search algorithm, and grammars. We use triphones as the basic speech modeling units; generalized triphones and function-word-dependent phones are used to improve the trainability of the speech units and to reduce errors in function words. Silence between words is optionally inserted using a silence model and a null transition. A word-pair grammar and a bigram model based on word classes are used. We also implement a search algorithm to find the N-best candidate sentences. A postprocessor reorders the N-best sentences using a word-triple grammar, selects the most likely sentence as the final recognition result, and finally corrects trivial errors related to postpositions. In recognition tests on a 3,000-word continuous speech database, the system attained 93.1% word recognition accuracy and 73.8% sentence recognition accuracy using the word-triple grammar in postprocessing.
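N-best reordering with a word-level grammar can be sketched as below. The abstract's postprocessor uses a word-triple grammar; for brevity this sketch scores candidates with hypothetical bigram log-probabilities, and all words and probabilities are invented for illustration.

```python
# Hypothetical bigram log-probabilities for a toy grammar;
# <s> and </s> mark sentence start and end.
bigram_logp = {
    ('<s>', 'open'): -0.2, ('open', 'door'): -0.3, ('door', '</s>'): -0.1,
    ('<s>', 'oven'): -1.5, ('oven', 'door'): -1.0,
}

def sentence_score(words, lm=bigram_logp, unseen=-5.0):
    """Sum bigram log-probabilities over a candidate sentence;
    unseen word pairs receive a flat back-off penalty."""
    pairs = zip(['<s>'] + words, words + ['</s>'])
    return sum(lm.get(p, unseen) for p in pairs)

# Reorder an N-best list by language-model score and keep the best.
nbest = [['oven', 'door'], ['open', 'door']]
best = max(nbest, key=sentence_score)
print(best)  # ['open', 'door']
```

The acoustically top-ranked hypothesis can thus be overturned by the grammar score, which is the role the postprocessor plays in the described system.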


Speaker Verification System with Hybrid Model Improved by Adapted Continuous Wavelet Transform

  • Kim, Hyoungsoo;Yang, Sung-il;Younghun Kwon;Kyungjoon Cha
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.3E
    • /
    • pp.30-36
    • /
    • 1999
  • In this paper, we develop a hybrid speaker recognition system [1] enhanced by a pre-recognizer and a post-recognizer. The pre-recognizer consists of general speech recognition systems, and the post-recognizer is a pitch detection system using an adapted continuous wavelet transform (ACWT) to improve the performance of the hybrid speaker recognition system. Two schemes for designing the ACWT are considered: one searches a basis library covering the whole band of the speech fundamental frequency (speech pitch); the other determines which basis is best, using an information cost functional as the criterion. The ACWT is robust enough to classify the pitch of speech well even when the speech signal is badly corrupted by environmental noise.


A Study on the Phonemic Analysis for Korean Speech Segmentation (한국어 음소분리에 관한 연구)

  • Lee, Sou-Kil;Song, Jeong-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.4E
    • /
    • pp.134-139
    • /
    • 2004
  • It is generally known that accurate segmentation is necessary for both individual words and continuous utterances in speech recognition. Techniques are now being developed to classify voiced and unvoiced sounds and to distinguish plosives from fricatives, but a method for accurate recognition of the phonemes has not yet been scientifically established. Therefore, in this study we analyze the Korean language using the classification of 'Hunminjeongeum' and contemporary phonetics; using the frequency band, Mel band, and Mel cepstrum, we extract notable features of the phonemes from Korean speech and segment the speech into phoneme units in order to normalize them. Finally, through analysis and verification, we intend to establish a phonemic segmentation system applicable to both individual words and continuous utterances.
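The Mel band mentioned above is conventionally obtained with the mapping mel(f) = 2595 * log10(1 + f/700); it is an assumption here that this standard formula is the one intended by the abstract.

```python
import math

def hz_to_mel(f):
    """Common mel-scale mapping: warps linear frequency in Hz so that
    equal mel steps approximate equal perceived pitch steps."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

# By construction, 1000 Hz maps to approximately 1000 mel.
print(hz_to_mel(1000.0))
```

Features such as the Mel cepstrum are computed on filter banks spaced uniformly on this warped axis rather than on raw Hz.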

A Study on Realization of Continuous Speech Recognition System of Speaker Adaptation (화자적응화 연속음성 인식 시스템의 구현에 관한 연구)

  • 김상범;김수훈;허강인;고시영
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.3
    • /
    • pp.10-16
    • /
    • 1999
  • In this paper, we have studied a speaker-adaptive continuous speech recognition system using MAPE (Maximum A Posteriori Probability Estimation), which can adapt with only a small amount of adaptation speech data. Speaker adaptation is performed by MAPE after concatenation training, in which sentence-unit HMMs are built by linking syllable-unit HMMs, and Viterbi segmentation automatically classifies the adaptation speech data into syllable-unit segments without hand labeling. For car-control speech, the recognition rate of the adapted HMM was 77.18%, approximately a 6% improvement over the unadapted HMM (in the case of O(n)DP).
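The core of MAP adaptation can be sketched for a single Gaussian mean: the prior (speaker-independent) mean acts like tau pseudo-observations and is pulled toward the adaptation data. The scalar setting and the value of tau below are illustrative assumptions, not the paper's exact configuration.

```python
def map_adapt_mean(prior_mean, data, tau=2.0):
    """MAP estimate of a Gaussian mean: the prior mean, weighted by
    tau pseudo-counts, is interpolated with the adaptation samples.
    With little data the estimate stays near the prior; with much
    data it approaches the sample mean."""
    n = len(data)
    return (tau * prior_mean + sum(data)) / (tau + n)

# Prior mean 0.0, four adaptation samples at 1.0:
# (2*0.0 + 4.0) / (2 + 4) = 2/3, partway toward the sample mean.
print(map_adapt_mean(0.0, [1.0, 1.0, 1.0, 1.0]))
```

This interpolation is what lets MAP estimation work with "only a small amount of adaptation speech data": unadapted parameters are simply retained wherever no data falls.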
