• Title/Summary/Keyword: Speech signals


Recognition for Noisy Speech by a Nonstationary AR HMM with Gain Adaptation Under Unknown Noise (잡음하에서 이득 적응을 가지는 비정상상태 자기회귀 은닉 마코프 모델에 의한 오염된 음성을 위한 인식)

  • 이기용;서창우;이주헌
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.1
    • /
    • pp.11-18
    • /
    • 2002
  • In this paper, a gain-adapted speech recognition method in noise is developed in the time domain. Noise is assumed to be colored. To cope with the notable nonstationary nature of speech signals such as fricatives, glides, liquids, and transition regions between phones, a nonstationary autoregressive (NAR) hidden Markov model (HMM) is used. The nonstationary AR process is represented by polynomial functions formed as a linear combination of M known basis functions. When only noisy signals are available, the problem of estimating the noise inevitably arises. Using multiple Kalman filters, the noise model and the gain contour of the speech are estimated. The noise estimation of the proposed method can remove noise from noisy speech to yield an enhanced speech signal. Compared to a conventional AR-HMM with noise estimation, the proposed NAR-HMM with noise estimation improves recognition performance by about 2-3%.
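The nonstationary AR representation described above can be sketched in a few lines: each time-varying AR coefficient is a linear combination of M known basis functions. A simple polynomial basis is assumed here purely for illustration; the paper's exact basis choice and the HMM/Kalman machinery are omitted.

```python
import numpy as np

def tv_ar_coeffs(c, t, T):
    """AR coefficients at time t from basis weights c (shape p x M).

    Assumed basis: phi_m(t) = (t/T)**m for m = 0..M-1 (illustrative choice).
    """
    M = c.shape[1]
    phi = np.array([(t / T) ** m for m in range(M)])
    return c @ phi

def synth_nar(c, T, sigma=0.1, seed=0):
    """Generate a nonstationary AR(p) signal driven by white Gaussian noise."""
    rng = np.random.default_rng(seed)
    p = c.shape[0]
    x = np.zeros(T)
    for t in range(p, T):
        a = tv_ar_coeffs(c, t, T)                 # coefficients drift over time
        x[t] = a @ x[t - p:t][::-1] + sigma * rng.standard_normal()
    return x
```

With M = 1 this reduces to the stationary AR model, which is the baseline the paper compares against.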

Variable Threshold Detection with Weighted BPSK/PCM Speech Signal Transmitted over Gaussian Channels (가우시안 채널에 있어 가중치를 부여한 BPSK/PCM 음성신호의 비트검출 한계치 변화에 의한 신호재생)

  • 안승춘;서정욱;이문호
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.5
    • /
    • pp.733-739
    • /
    • 1987
  • In this paper, variable threshold detection of weighted pulse-code-modulation-encoded signals transmitted over Gaussian channels has been investigated. Each bit in the μ-law PCM word is weighted according to its significance in the transmitter. If the detector output falls into the erasure zone, the regenerated sample is replaced by interpolation or prediction. The overall system signal-to-noise ratio for BPSK/PCM speech signals with this technique has been derived. When the input signal level was -17 dB, the gains in overall signal-to-noise ratio compared to weighted PCM and variable threshold detection were 5 dB and 3 dB, respectively. A computer simulation was performed on computer-generated signals and was in reasonable agreement with our theoretical prediction.
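The erasure-zone idea can be illustrated with a small sketch: soft BPSK decisions whose magnitude falls below a threshold are marked as erasures, and the corresponding samples are regenerated by interpolation. The threshold value and the linear interpolation rule here are illustrative assumptions, not the paper's exact design.

```python
def detect_with_erasure(soft_values, threshold=0.5):
    """BPSK hard decision with an erasure zone: low-confidence values
    (|v| < threshold) become None instead of a bit decision."""
    return [None if abs(v) < threshold else (1 if v > 0 else 0)
            for v in soft_values]

def regenerate(samples, erased_indices):
    """Replace erased samples by linear interpolation of their neighbors."""
    out = list(samples)
    for i in erased_indices:
        left = out[i - 1] if i > 0 else out[i + 1]
        right = out[i + 1] if i + 1 < len(out) else out[i - 1]
        out[i] = 0.5 * (left + right)
    return out
```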


Implementation of Sound Source Localization Based on Audio-visual Information for Humanoid Robots (휴머노이드 로봇을 위한 시청각 정보 기반 음원 정위 시스템 구현)

  • Park, Jeong-Ok;Na, Seung-You;Kim, Jin-Young
    • Speech Sciences
    • /
    • v.11 no.4
    • /
    • pp.29-42
    • /
    • 2004
  • This paper presents an implementation of real-time speaker localization using audio-visual information. Four channels of microphone signals are processed to detect the vertical as well as the horizontal speaker position. First, short-time average magnitude difference function (AMDF) values are used to determine whether the microphone signals contain human voice. Then the orientation and distance of the sound source are obtained through interaural time differences. Finally, visual information from a camera provides finer tuning of the angle to the speaker. Experimental results of the real-time localization system show that the performance improves to 99.6%, compared to a rate of 88.8% when only the audio information is used.
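The interaural-time-difference step can be sketched as a cross-correlation between two microphone channels. The microphone spacing, speed of sound, and sign convention below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def itd_angle(left, right, fs, mic_dist=0.2, c=343.0):
    """Estimate source azimuth (degrees) from two microphone signals.

    mic_dist (m), c (speed of sound, m/s), and the sign convention are
    assumptions: a positive angle means the source is closer to the left mic.
    """
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)   # lag of left relative to right
    tau = -lag / fs                            # tau > 0: left mic hears it first
    s = np.clip(tau * c / mic_dist, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```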


An acoustic echo canceler robust to noisy environment (잡음환경에 강건한 음향반향제거기)

  • 박장식;손경식
    • Proceedings of the IEEK Conference
    • /
    • 1998.06a
    • /
    • pp.623-626
    • /
    • 1998
  • The NLMS algorithm is degraded by ambient noise and near-end speech signals. In this paper, a robust acoustic echo cancellation algorithm is proposed. To enhance echo cancellation performance, the step size of the proposed algorithm is normalized by the sum of the powers of the reference and primary signals. A comparison of the excess mean square errors shows that the proposed algorithm improves echo cancellation performance. Experiments using a multimedia personal computer were also carried out; their results show that the proposed algorithm performs better than conventional ones.
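The modified step-size normalization the abstract describes, dividing by the combined power of the reference and primary signals, might look like the following sketch. The filter length and step size are illustrative values.

```python
import numpy as np

def robust_nlms(ref, primary, taps=64, mu=0.5, eps=1e-8):
    """NLMS echo canceller whose step size is normalized by the combined
    power of the reference and primary (microphone) signals."""
    w = np.zeros(taps)                 # adaptive filter estimating the echo path
    e = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = ref[n - taps + 1:n + 1][::-1]   # reference vector, newest sample first
        e[n] = primary[n] - w @ x           # error = mic signal minus echo estimate
        p = primary[n - taps + 1:n + 1]
        norm = x @ x + p @ p + eps          # extra primary-power term in denominator
        w += mu * e[n] * x / norm
    return e, w
```

The extra primary-power term shrinks the update when near-end speech or noise is strong, which is what makes the adaptation robust to double-talk.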


Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

  • Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.141-148
    • /
    • 2010
  • In this paper, voice activity detection (VAD) for dual-channel noisy speech recognition is proposed in which spatial cues are employed. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from the dual-channel input signals, and speech activity intervals are detected through this probability model. In particular, the spatial cues are composed of interaural time differences and interaural level differences of the dual-channel speech signals, and the probability model for speech presence/absence is based on a Gaussian kernel density. In order to evaluate the performance of the proposed VAD method, speech recognition is performed on segments that only include the speech intervals detected by the proposed method. The performance of the proposed method is compared with those of several others, such as an SNR-based method, a direction-of-arrival (DOA) based method, and a phase vector based method. The speech recognition experiments show that the proposed method outperforms the conventional methods, providing relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase vector based methods, respectively.
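A Gaussian-kernel (Parzen) speech presence/absence model over ITD/ILD cues might look like the following sketch. The kernel bandwidth, cue scaling, and decision threshold are assumptions, not the paper's settings.

```python
import numpy as np

def kde_log_likelihood(x, train, bw=0.1):
    """Parzen (Gaussian kernel) log-density of cue vector x = (ITD, ILD)
    under a set of training cue vectors. bw is an assumed bandwidth."""
    d = (x - train) / bw
    k = np.exp(-0.5 * np.sum(d * d, axis=1)) / (2.0 * np.pi * bw * bw)
    return np.log(np.mean(k) + 1e-300)

def speech_present(x, speech_cues, noise_cues, threshold=0.0):
    """Likelihood-ratio decision between speech-presence and absence models."""
    llr = kde_log_likelihood(x, speech_cues) - kde_log_likelihood(x, noise_cues)
    return llr > threshold
```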


Real-Time Implementation of Acoustic Echo Canceller Using TMS320C6711 DSK

  • Heo, Won-Chul;Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.15 no.1
    • /
    • pp.75-83
    • /
    • 2008
  • The interior of an automobile is a very noisy environment, with both stationary cruising noise and reverberated music or speech coming out of the audio system. For robust speech recognition in a car environment, it is necessary to extract the driver's voice command well by removing those background noises. Since we can access the music and speech signals from the audio system in a car, the reverberated music and speech sounds can be removed using an acoustic echo canceller. In this paper, we implement an acoustic echo canceller with a robust double-talk detection algorithm using the TMS320C6711 DSK. We first developed the echo canceller on a PC to verify its cancellation performance, then implemented it on the TMS320C6711 DSK. For processing one speech sample at an 8 kHz sampling rate with 256 filter taps, the implemented system took only 0.035 ms and achieved an ERLE of 20.73 dB.
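ERLE, the figure of merit quoted above, is simply the power ratio (in dB) of the microphone signal to the canceller's residual:

```python
import numpy as np

def erle_db(mic, residual):
    """Echo return loss enhancement: power ratio (dB) of the microphone
    signal to the residual left after echo cancellation."""
    return 10.0 * np.log10(np.sum(mic ** 2) / (np.sum(residual ** 2) + 1e-12))
```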


Fast Speaker Adaptation Based on Eigenspace-based MLLR Using Artificially Distorted Speech in Car Noise Environment (차량 잡음 환경에서 인위적 왜곡 음성을 이용한 Eigenspace-based MLLR에 기반한 고속 화자 적응)

  • Song, Hwa-Jeon;Jeon, Hyung-Bae;Kim, Hyung-Soon
    • Phonetics and Speech Sciences
    • /
    • v.1 no.4
    • /
    • pp.119-125
    • /
    • 2009
  • This paper proposes a fast speaker adaptation method based on eigenspace-based maximum likelihood linear regression (ES-MLLR) that uses artificially distorted speech for a telematics terminal in a car noise environment. The artificially distorted speech is built by adding various car noise signals collected from a driving car to speech signals collected in an idling car. Then, for every environment, a transformation matrix is estimated by ES-MLLR using the artificially distorted speech corresponding to that noise environment. In test mode, an online model is built as a weighted sum of the environment transformation matrices depending on the driving condition. In a 3k-word recognition task on the telematics terminal, we achieve performance superior to ES-MLLR even when the latter uses adaptation data collected under the driving condition.
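The weighted combination of per-environment transforms can be sketched as below. How the weights are estimated from the current driving condition is not reproduced here; uniform or similarity-based weights are assumed.

```python
import numpy as np

def online_transform(env_transforms, weights):
    """Weighted sum of per-environment MLLR transformation matrices,
    normalized so the result is a convex combination."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # sum_k w[k] * env_transforms[k] over the environment axis
    return np.tensordot(w, np.asarray(env_transforms), axes=1)
```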


A Comparison of Effective Feature Vectors for Speech Emotion Recognition (음성신호기반의 감정인식의 특징 벡터 비교)

  • Shin, Bo-Ra;Lee, Soek-Pil
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.10
    • /
    • pp.1364-1369
    • /
    • 2018
  • Speech emotion recognition, which aims to classify a speaker's emotional state from speech signals, is one of the essential tasks for making human-machine interaction (HMI) more natural and realistic. Voice expressions are one of the main information channels in interpersonal communication. However, existing speech emotion recognition technology has not achieved satisfactory performance, probably because of the lack of effective emotion-related features. This paper provides a survey of various features used for speech emotion recognition and discusses which features, or which combinations of features, are valuable and meaningful for emotion classification. The main aim of this paper is to discuss and compare various approaches to feature extraction and to propose a basis for extracting useful features in order to improve SER performance.

Comparison & Analysis of Speech/Music Discrimination Features through Experiments (실험에 의한 음성·음악 분류 특징의 비교 분석)

  • Lee, Kyung-Rok;Ryu, Shi-Woo;Gwark, Jae-Young
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2004.11a
    • /
    • pp.308-313
    • /
    • 2004
  • In this paper, we compared and analyzed speech/music discrimination performance for combinations of feature parameters. Audio signals are classified into three classes (speech, music, and speech with music). Three types of features, Mel-cepstrum, energy, and zero-crossings, were used in the experiments. We then compared and analyzed which feature combinations give the best speech/music discrimination performance. The best result is achieved using Mel-cepstrum, energy, and zero-crossings in a single feature vector (speech: 95.1%, music: 61.9%, speech & music: 55.5%).
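Two of the three features, short-time energy and zero-crossing rate, are straightforward to compute per frame (frame and hop sizes below are illustrative, not the paper's settings):

```python
import numpy as np

def frame_features(x, frame_len=256, hop=128):
    """Short-time energy and zero-crossing rate per frame, two of the
    three features combined with the Mel-cepstrum in the experiments."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        f = x[start:start + frame_len]
        energy = float(np.sum(f * f) / frame_len)              # mean frame power
        zcr = float(np.mean(np.abs(np.diff(np.sign(f))) > 0))  # sign-change rate
        feats.append((energy, zcr))
    return np.array(feats)
```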


Performance Evaluation of Novel AMDF-Based Pitch Detection Scheme

  • Kumar, Sandeep
    • ETRI Journal
    • /
    • v.38 no.3
    • /
    • pp.425-434
    • /
    • 2016
  • A novel average magnitude difference function (AMDF)-based pitch detection scheme (PDS) is proposed to achieve better performance in speech quality. A performance evaluation of the proposed PDS is carried out through both a simulation and a real-time implementation of a speech analysis-synthesis system. The parameters used to compare the performance of the proposed PDS with that of PDSs that are based on either a cepstrum, an autocorrelation function (ACF), an AMDF, or circular AMDF (CAMDF) methods are as follows: percentage gross pitch error (%GPE); a subjective listening test; an objective speech quality assessment; a speech intelligibility test; a synthesized speech waveform; computation time; and memory consumption. The proposed PDS results in lower %GPE and better synthesized speech quality and intelligibility for different speech signals as compared to the cepstrum-, ACF-, AMDF-, and CAMDF-based PDSs. The computational time of the proposed PDS is also less than that for the cepstrum-, ACF-, and CAMDF-based PDSs. Moreover, the total memory consumed by the proposed PDS is less than that for the ACF- and cepstrum-based PDSs.
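For reference, a baseline AMDF pitch detector, the starting point the proposed scheme improves on, can be sketched as follows; the paper's novel modifications are not reproduced here, and the search range is illustrative.

```python
import numpy as np

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Baseline AMDF pitch detector: the lag minimizing the average
    magnitude difference within the search range is the pitch period."""
    lmin, lmax = int(fs / fmax), int(fs / fmin)
    amdf = [np.mean(np.abs(frame[:len(frame) - k] - frame[k:]))
            for k in range(lmin, lmax + 1)]
    period = lmin + int(np.argmin(amdf))
    return fs / period
```

In practice the plain minimum is prone to pitch halving when multiples of the true period also fall in the search range, which is one weakness refined detectors (including CAMDF and the proposed scheme) address.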