• Title/Summary/Keyword: Speech improvement

Single-Channel Non-Causal Speech Enhancement to Suppress Reverberation and Background Noise

  • Song, Myung-Suk;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.8
    • /
    • pp.487-506
    • /
    • 2012
  • This paper proposes a speech enhancement algorithm that improves speech intelligibility by suppressing both reverberation and background noise. The algorithm adopts a non-causal single-channel minimum variance distortionless response (MVDR) filter to exploit the additional information contained in the noisy-reverberant signals of subsequent frames. The noisy-reverberant signals are decomposed into a desired-signal component and an interference component that is uncorrelated with the desired signal. The filter equation is then derived from the MVDR criterion so as to minimize the residual interference without introducing speech distortion. The estimation of the correlation parameter, which plays an important role in determining the overall performance of the system, is derived mathematically from a general statistical reverberation model, and practical methods for estimating the sub-parameters required by the correlation parameter are also developed. The efficiency of the proposed algorithm is verified by performance evaluation; the results show significant improvement in all studied conditions, with the largest gains in severely noisy and strongly reverberant environments.
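For illustration only, a stacked-frame MVDR gain for one frequency bin can be sketched as below. This is not the paper's derivation: `Phi_v` (interference covariance) and `gamma` (the correlation-parameter vector) are assumed inputs that estimators such as those the paper develops would supply, and all names are hypothetical.

```python
import numpy as np

def mvdr_noncausal_gain(Phi_v, gamma):
    """MVDR filter over a stack of L consecutive STFT frames (one bin).

    Phi_v : (L, L) interference covariance across the frame stack.
    gamma : (L,) correlation vector relating the desired signal to the
            stacked noisy-reverberant observations.
    """
    num = np.linalg.solve(Phi_v, gamma)   # Phi_v^{-1} gamma
    return num / (gamma.conj() @ num)     # enforces w^H gamma = 1 (distortionless)

def enhance_bin(w, y_stack):
    """Apply the filter to the stacked noisy frames: s_hat = w^H y."""
    return w.conj() @ y_stack
```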

Multimodal Emotion Recognition using Face Image and Speech (얼굴영상과 음성을 이용한 멀티모달 감정인식)

  • Lee, Hyeon Gu;Kim, Dong Ju
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.1
    • /
    • pp.29-40
    • /
    • 2012
  • A challenging research issue of growing importance in human-computer interaction is endowing a machine with emotional intelligence. Emotion recognition technology therefore plays an important role in this research area, enabling more natural, human-like communication between humans and computers. In this paper, we propose a multimodal emotion recognition system that uses face and speech to improve recognition performance. For face-based emotion recognition, a distance measure is computed by applying 2D-PCA to MCS-LBP images with a nearest-neighbor classifier; for speech-based emotion recognition, a likelihood measure is obtained from a Gaussian mixture model built on pitch and mel-frequency cepstral coefficient features. The individual matching scores obtained from face and speech are combined by weighted summation, and the fused score is used to classify the emotion. Experimental results show that the proposed method improves recognition accuracy by about 11.25% to 19.75% compared with the uni-modal approaches, confirming that the fusion achieves a significant performance improvement.
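A minimal sketch of the weighted-summation fusion step might look as follows; the min-max normalization and the weight value are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def minmax(x):
    """Min-max normalize scores to [0, 1] so modalities are comparable."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def fuse_and_classify(face_dist, speech_loglik, w_face=0.5):
    """Weighted-summation fusion over per-emotion scores (illustrative).

    face_dist     : distances from the face classifier (smaller is better).
    speech_loglik : GMM log-likelihoods from the speech classifier.
    w_face        : hypothetical fusion weight, tuned in practice.
    """
    face_score = 1.0 - minmax(face_dist)      # invert distances into similarities
    speech_score = minmax(speech_loglik)
    fused = w_face * face_score + (1.0 - w_face) * speech_score
    return int(np.argmax(fused))              # index of the predicted emotion
```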

Speaker Adaptation Using Linear Transformation Network in Speech Recognition (선형 변환망을 이용한 화자적응 음성인식)

  • 이기희
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.2
    • /
    • pp.90-97
    • /
    • 2000
  • This paper describes a speaker-adaptive speech recognition system that reliably recognizes the speech of new speakers. In the proposed method, the speech spectrum of a new speaker is adapted to the reference speech spectrum using the parameters of a first-order linear transformation network placed in front of the phoneme-classification neural network. The recognition system is based on a semicontinuous HMM (hidden Markov model) that uses a multilayer perceptron as a fuzzy vector quantizer. Isolated-word recognition experiments show that, with speaker adaptation, the recognition rate improves significantly over the unadapted system.
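The adaptation layer amounts to an affine map from the new speaker's spectrum to the reference space. The sketch below estimates it by least squares from assumed paired adaptation frames; the paper instead trains it as a network layer, so this is a named substitute technique, and all names are hypothetical.

```python
import numpy as np

def fit_linear_transform(X_new, X_ref):
    """Fit adapted = W @ [x; 1] by least squares.

    X_new, X_ref : (n_frames, n_dims) paired spectra from the new and
    reference speakers. Returns W of shape (n_dims + 1, n_dims),
    with the bias folded into the last row.
    """
    X_aug = np.hstack([X_new, np.ones((len(X_new), 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X_aug, X_ref, rcond=None)
    return W

def adapt(x, W):
    """Map one spectrum frame of the new speaker into the reference space."""
    return np.append(x, 1.0) @ W
```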

An Improvement of Speech Hearing Ability for Sensorineural Impaired Listeners (감음성(感音性) 난청인의 언어청력 향상에 관한 연구)

  • Lee, S.M.;Woo, H.C.;Kim, D.W.;Song, C.G.;Lee, Y.M.;Kim, W.K.
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1996 no.05
    • /
    • pp.240-242
    • /
    • 1996
  • In this paper, we propose a hearing-aid processing method suitable for the sensorineural hearing impaired. Because sensorineural hearing-impaired listeners generally have a narrow audible range between the hearing threshold and the discomfort level, the speech spectrum can easily fall outside this range; it must therefore be optimally amplified and compressed into the listener's audible range. Since the level and frequency content of the input speech vary continuously, the signal must be compensated for the listener's frequency-dependent hearing loss, especially in the frequency bands that carry the most information. The input signal is divided into short-time blocks and the spectrum within each block is calculated. From this spectrum, the frequency-gain characteristic is determined, and the number of frequency bands and the target gain to be applied to the input signal are estimated. The signal within each block is then processed by a single digital filter with the calculated frequency-gain characteristic. Monosyllabic speech tests evaluating the performance of the proposed algorithm showed improved scores.
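The per-block, per-band gain stage might be sketched as follows. Note the paper applies a single digital filter designed from the computed frequency-gain characteristic, whereas this sketch applies the gains directly in the frequency domain; the band layout and gain values are hypothetical and would come from the listener's audiogram in practice.

```python
import numpy as np

def compress_block(x, target_gain_db, n_fft=512):
    """Apply per-band gains to one short-time block (illustrative).

    target_gain_db : (n_bands,) gains in dB mapping the speech spectrum
    into the listener's audible range (hypothetical values).
    """
    X = np.fft.rfft(x, n_fft)
    bands = np.array_split(np.arange(len(X)), len(target_gain_db))
    for idx, g_db in zip(bands, target_gain_db):
        X[idx] *= 10 ** (g_db / 20.0)          # linear gain per band
    return np.fft.irfft(X, n_fft)[: len(x)]    # back to the time domain
```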

Coding Method of Variable Threshold Dual Rate ADPCM Speech Considering the Background Noise (배경 잡음환경에서 가변 임계값에 의한 Dual Rate ADPCM 음성 부호화 기법)

  • 한경호
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.17 no.6
    • /
    • pp.154-159
    • /
    • 2003
  • In this paper, we propose a variable-threshold dual-rate ADPCM coding method that adopts two coding rates of the standard ITU-T G.726 ADPCM to improve speech quality at a comparatively low coding rate. The zero-crossing rate (ZCR) is computed for the speech data; in a noisy environment, noise-dominant regions show a higher ZCR and speech-dominant regions a lower ZCR. Data with a higher ZCR are encoded at the low rate to reduce the amount of coded data, while data with a lower ZCR are encoded at the high rate to improve speech quality. Two bits per sample are assigned for the low coding rate of 16 kbps and five bits for the high coding rate of 40 kbps. Simulations show that the proposed variable dual-rate ADPCM technique achieves good speech quality at a low coding rate.
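A minimal sketch of the ZCR-based rate decision is given below; a fixed threshold is used here for simplicity, whereas the paper's threshold is variable, and the threshold value shown is hypothetical.

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate of one frame (crossings per sample)."""
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

def select_bits(frame, threshold=0.3):
    """Pick bits/sample for G.726-style dual-rate coding (illustrative).

    High ZCR -> noise-dominant -> 2 bits (16 kbps at 8 kHz sampling);
    low ZCR  -> speech-dominant -> 5 bits (40 kbps).
    """
    return 2 if zcr(frame) > threshold else 5
```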

A Study of Nasalance Change in Submucosal Type Cleft Palate Patients by Surgery (점막하 구개열 환자의 수술 전후 비음도 변화에 대한 연구)

  • Choi, Ju-Seok;Leem, Dae-Ho;Baek, Jin-A;Kim, Oh-Hwan;Kim, Hyun-Ki;Shin, Hyo-Keun
    • Korean Journal of Cleft Lip And Palate
    • /
    • v.8 no.2
    • /
    • pp.53-62
    • /
    • 2005
  • Submucosal cleft palate is a form of cleft palate. A submucosal cleft may shorten the anteroposterior dimension of the hard palate, the soft palate, or both. This increased distance, along with the lack of muscle connection in the soft palate, usually accounts for the lack of palatopharyngeal function in patients with a submucosal cleft. Resonance disorders found in cleft patients present as hypernasality or hyponasality, and many submucosal cleft palate patients visit our clinic because of hypernasality. In this study, resonance disorders were evaluated with nasalance testing. The experimental group consisted of submucosal cleft palate patients treated with combined therapy, i.e., surgery and speech training. To observe the changes produced by surgery, nasalance testing was carried out once before surgery and three times after surgery, using the Nasometer II; the stimulus list consisted of single vowels and diphthongs. The mean nasalance score of children was significantly lower than that of adults for every vowel, and operation at an early age (under 10 years) gave a better functional result. The mean nasalance score was highest for /i/ and lowest for /a/. Corrective surgery in the selected cases achieved improvement in all cases, and hypernasality was consistently diminished.
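For reference, the nasalance score reported by instruments such as the Nasometer is conventionally defined as the ratio of nasal acoustic energy to total (nasal plus oral) acoustic energy, expressed as a percentage:

```latex
\text{nasalance}\,(\%) = \frac{E_{\text{nasal}}}{E_{\text{nasal}} + E_{\text{oral}}} \times 100
```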

Analysis of AMR Compressed Bit Stream for Insertion of Voice Data in QR Code (QR 코드에 음성 데이터 삽입을 위한 AMR 압축 비트열 분석)

  • Oh, Eun-ju;Cho, Hyun-ji;Jung, Hyeon-ah;Bae, Joung-eun;Yoo, Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.490-492
    • /
    • 2018
  • This paper presents an analysis of AMR speech data as groundwork for studying techniques to embed and transmit AMR voice data, a format widely used in mobile phones. An AMR stream consists of a header and speech data, is transmitted as a bit stream, and supports eight bit-rate modes in total. The header carries the mode information for the speech data, and the length of the speech data depends on the mode. We chose the mode best suited for embedding into a QR code and analyzed it. The goal of the analysis and experiments is to achieve a higher compression ratio for the voice data, so that it can be transmitted more effectively.
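As background on the mode/length relationship described above: in the common AMR-NB storage format (RFC 4867), the file begins with the magic bytes `#!AMR\n`, after which each frame starts with a one-octet header whose 4-bit frame-type field selects one of the eight speech modes and thereby fixes the frame length. A minimal sketch of splitting such a payload into frames (illustrative; SID and NO_DATA frames are not handled):

```python
# Total octets per AMR-NB frame, header octet included, indexed by frame type.
AMR_NB_FRAME_BYTES = {0: 13, 1: 14, 2: 16, 3: 18, 4: 20, 5: 21, 6: 27, 7: 32}

def split_frames(data: bytes):
    """Yield (mode, frame_bytes) tuples from an AMR-NB payload
    (i.e., the bytes after the 6-byte '#!AMR\n' file magic)."""
    i = 0
    while i < len(data):
        mode = (data[i] >> 3) & 0x0F          # FT field of the frame header
        size = AMR_NB_FRAME_BYTES.get(mode)
        if size is None:                      # SID / NO_DATA frames skipped here
            break
        yield mode, data[i : i + size]
        i += size
```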

Performance comparison of various deep neural network architectures using Merlin toolkit for a Korean TTS system (Merlin 툴킷을 이용한 한국어 TTS 시스템의 심층 신경망 구조 성능 비교)

  • Hong, Junyoung;Kwon, Chulhong
    • Phonetics and Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.57-64
    • /
    • 2019
  • In this paper, we construct a Korean text-to-speech system using the Merlin toolkit, an open-source system for speech synthesis. HMM-based statistical parametric speech synthesis is widely used in text-to-speech systems, but the quality of the synthesized speech is known to degrade because of limitations of its context-dependent acoustic modeling scheme. We therefore compare acoustic modeling architectures based on deep neural network techniques, which show excellent performance in various fields. The architectures include a fully connected deep feedforward neural network (DNN), a recurrent neural network (RNN), gated recurrent units (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). Experimental results show that performance improves when sequence modeling is included in the architecture, with LSTM and BLSTM performing best. Including delta and delta-delta components in the acoustic feature parameters was also found to be advantageous.
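A Merlin-style acoustic model of the best-performing kind (BLSTM) might be sketched as below; the layer count and feature dimensions are hypothetical placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BLSTMAcousticModel(nn.Module):
    """Illustrative acoustic model: frame-level linguistic features in,
    acoustic parameters (e.g., spectral features plus lf0, with deltas
    and delta-deltas) out."""
    def __init__(self, n_linguistic=425, n_acoustic=187, hidden=512):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(n_linguistic, hidden), nn.Tanh())
        self.blstm = nn.LSTM(hidden, hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)  # 2x for both directions

    def forward(self, x):                # x: (batch, frames, n_linguistic)
        h = self.ff(x)
        h, _ = self.blstm(h)             # sequence modeling over frames
        return self.out(h)               # (batch, frames, n_acoustic)
```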

Multi-level Skip Connection for Nested U-Net-based Speech Enhancement (중첩 U-Net 기반 음성 향상을 위한 다중 레벨 Skip Connection)

  • Seorim, Hwang;Joon, Byun;Junyeong, Heo;Jaebin, Cha;Youngcheol, Park
    • Journal of Broadcast Engineering
    • /
    • v.27 no.6
    • /
    • pp.840-847
    • /
    • 2022
  • In deep neural network (DNN)-based speech enhancement, the use of global and local information from the input speech is closely related to model performance. Recently, a nested U-Net structure that exploits global and local information in the input at multiple scales has been proposed; applied to speech enhancement, it showed outstanding performance. However, the single skip connection used in nested U-Nets must be modified to suit the nested structure. In this paper, we propose a multi-level skip connection (MLS) to optimize the performance of nested U-Net-based speech enhancement. The proposed MLS yields clear improvements over the standard skip connection on various objective evaluation metrics, showing that MLS can optimize the performance of a nested U-Net-based speech enhancement algorithm. In addition, the final proposed model outperforms other DNN-based speech enhancement models.
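One plausible reading of a multi-level skip connection is sketched below: encoder features from several scales are resampled to the decoder's resolution, concatenated, and fused before entering the decoder stage. This is a sketch of the general idea, not the authors' exact design; the 1x1-convolution fusion is an assumption.

```python
import torch
import torch.nn as nn

class MLSFusion(nn.Module):
    """Illustrative multi-level skip connection for a 1-D (speech) U-Net."""
    def __init__(self, channels, n_levels):
        super().__init__()
        # 1x1 conv fuses the concatenated multi-scale features (assumed design).
        self.fuse = nn.Conv1d(channels * n_levels, channels, kernel_size=1)

    def forward(self, enc_feats, target_len):
        # enc_feats: list of (batch, channels, time_i) encoder tensors
        aligned = [nn.functional.interpolate(f, size=target_len, mode="nearest")
                   for f in enc_feats]          # resample every scale to the decoder's length
        return self.fuse(torch.cat(aligned, dim=1))
```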

Speech Outcomes after Delayed Hard Palate Closure and Synchronous Secondary Alveolar Bone Grafting in Patients with Cleft Lip, Alveolus and Palate

  • Mona Haj;S.N. Hakkesteegt;H.G. Poldermans;H.H.W. de Gier;S.L. Versnel;E.B. Wolvius
    • Archives of Plastic Surgery
    • /
    • v.51 no.4
    • /
    • pp.378-385
    • /
    • 2024
  • Background: The best timing for closure of the hard palate in individuals with cleft lip, alveolus, and palate (CLAP), with respect to optimal speech outcomes and maxillary growth, is still a subject of debate. This study evaluates changes in compensatory articulatory patterns and resonance in patients with unilateral and bilateral CLAP who underwent simultaneous closure of the hard palate and secondary alveolar bone grafting (ABG). Methods: A retrospective study of patients with nonsyndromic unilateral and bilateral CLAP who underwent delayed hard palate closure (DHPC) simultaneously with ABG at 9 to 12 years of age from 2013 to 2018. Articulatory patterns, nasality, degree of hypernasality, facial grimacing, and speech intelligibility were assessed pre- and postoperatively. Results: Forty-eight patients were included. DHPC and ABG were performed at a mean age of 10.5 years. Postoperatively, hypernasal speech was still present in 54% of patients; however, the degree of hypernasality decreased in 67% (p < 0.001). Grimacing decreased in 27% (p = 0.015). Articulation disorders remained present in 85% (p = 0.375). Intelligible speech (grade 1 or 2) was observed in 71% of patients, compared with 35% preoperatively (p < 0.001). Conclusion: This study showed improved resonance and intelligibility following DHPC at a mean age of 10.5 years; however, compensatory articulation errors persisted. Sequential treatments such as speech therapy play a key role in the improvement of speech and may reduce remaining compensatory mechanisms following DHPC.