• Title/Summary/Keyword: Speech signals


Noisy Speech Enhancement by Restoration of DFT Components Using Neural Network (신경회로망을 이용한 DFT 성분 복원에 의한 음성강조)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.5 / pp.1078-1084 / 2010
  • This paper presents a speech enhancement system that restores the amplitude and phase components of the discrete Fourier transform (DFT) using a neural network trained with the back-propagation algorithm. First, the neural network is trained on the DFT amplitude and phase components of noisy speech signals; the proposed system then enhances speech signals degraded by white noise using this network. Experimental results demonstrate that speech signals degraded by white noise are enhanced by the proposed system, whose inputs are the DFT amplitude and phase components. Spectral distortion measurements confirm that the proposed system is effective against white noise.
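
The restoration pipeline above can be sketched as follows. The abstract does not specify the network architecture, so this is a minimal sketch of only the feature extraction and resynthesis steps, using NumPy's real FFT; the function names are illustrative, not the paper's.

```python
import numpy as np

def dft_features(frame):
    """Split a speech frame into the DFT amplitude and phase components
    that serve as the neural network's inputs."""
    spec = np.fft.rfft(frame)
    return np.abs(spec), np.angle(spec)

def reconstruct(amplitude, phase, n):
    """Rebuild the time-domain frame from (possibly restored) components."""
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=n)
```

Training a network to map noisy amplitude/phase pairs to clean ones and then calling `reconstruct` on its outputs would complete the system.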

A Study on Noise-Robust Methods for Broadcast News Speech Recognition (방송뉴스 인식에서의 잡음 처리 기법에 대한 고찰)

  • Chung Yong-joo
    • MALSORI / no.50 / pp.71-83 / 2004
  • Recently, broadcast news speech recognition has become one of the most attractive research areas. If we can automatically transcribe broadcast news and store its contents in text form instead of as the video or audio signal itself, it becomes much easier to search multimedia databases for what we need. However, the desired speech signals in broadcast news are usually affected by interfering signals such as background noise and/or music. Also, the speech of a reporter speaking over the telephone or with an ill-conditioned microphone is severely distorted by the channel effect. Such interfered or distorted speech may be the main reason for poor performance in broadcast news speech recognition. In this paper, we investigate some methods to cope with these problems and observe performance improvements in noisy broadcast news speech recognition.


Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech (HMM 기반의 한국어 합성음에 대한 PESQ 및 MOS 평가의 상관도 분석)

  • Lin, Cang-Song;Bae, Keun-Sung
    • Phonetics and Speech Sciences / v.2 no.1 / pp.71-75 / 2010
  • The PESQ is an objective speech quality evaluation measure that is known to have a high correlation with subjective speech quality measures such as MOS. To examine whether it could be useful as an objective quality measure for synthetic speech, we carried out both subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ for HMM-based Korean synthetic speech signals, and analyzed the correlation between them. Experimental results show that the PESQ has a correlation of 0.87 with MOS and 0.92 with DMOS. This means that the PESQ holds much promise for evaluating the quality of synthetic Korean speech.

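
The correlation analysis above boils down to a Pearson coefficient between the two score sets. A minimal sketch with hypothetical per-utterance scores (the paper's reported values, 0.87 and 0.92, come from its real listening tests, not from these numbers):

```python
import numpy as np

# Hypothetical PESQ and MOS scores for five utterances.
pesq = np.array([2.1, 2.8, 3.0, 3.4, 3.9])
mos  = np.array([2.0, 3.1, 2.9, 3.6, 4.2])

# Pearson correlation coefficient between the two score sets.
r = np.corrcoef(pesq, mos)[0, 1]
```

A value of `r` close to 1 indicates that the objective measure tracks the subjective ratings well.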

Extraction of Unvoiced Consonant Regions from Fluent Korean Speech in Noisy Environments (잡음환경에서 우리말 연속음성의 무성자음 구간 추출 방법)

  • 박정임;하동경;신옥근
    • The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.286-292 / 2003
  • Voice activity detection (VAD) is a process that separates the speech region of an input speech signal from the silence or noise region. Since unvoiced consonant signals have characteristics very similar to those of noise signals, carrying out VAD without paying special attention to unvoiced consonants may result in serious distortion of the unvoiced consonants or in erroneous noise estimation. In this paper, we propose a method to extract explicitly the boundaries between unvoiced consonants and noise in fluent speech so that more exact VAD can be performed. The proposed method is based on the frequency-domain histogram that Hirsch successfully used for noise estimation, and also on a similarity measure of frequency components between adjacent frames. To evaluate the performance of the proposed method, experiments on unvoiced consonant boundary extraction were performed on seven kinds of noisy speech signals at 10 dB and 15 dB SNR, respectively.
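
The histogram-based noise estimation the method builds on can be sketched as follows. This is a simplified illustration in the spirit of Hirsch's approach, assuming a precomputed magnitude spectrogram; the actual smoothing and update details are not given in the abstract.

```python
import numpy as np

def hirsch_noise_estimate(mag_frames, n_bins=40):
    """Histogram-based noise estimation: for each frequency bin, the most
    frequently observed magnitude across frames is taken as the noise
    level, since noise dominates the frame count in typical speech."""
    mag_frames = np.asarray(mag_frames)          # shape (frames, freq_bins)
    noise = np.empty(mag_frames.shape[1])
    for k in range(mag_frames.shape[1]):
        counts, edges = np.histogram(mag_frames[:, k], bins=n_bins)
        peak = np.argmax(counts)                 # most populated magnitude bin
        noise[k] = 0.5 * (edges[peak] + edges[peak + 1])
    return noise
```

Because speech is present in only a minority of frames, the histogram peak sits at the noise floor rather than at the speech energy.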

A New EGG System Design and Speech Analysis for Quantitative Analysis of Human Glottal Vibration Patterns (성문진동 패턴의 정량적인 해석을 위한 새로운 시스템 설계와 음성분석)

  • 김종찬;이재천;김덕원;오명환;윤대희;차일환
    • Journal of Biomedical Engineering Research / v.20 no.4 / pp.427-433 / 1999
  • The purpose of this study is to develop an improved pitch extraction method that can be used in a variety of speech applications such as high-quality compression and vocoding, and recognition and synthesis of speech. To do so, we develop a new electroglottograph (EGG) measurement system based on four modulation-demodulation type spot electrodes for detecting the EGG signals. The glottal closure instant (GCI) is then determined from the EGG signals on a real-time basis, and the pitch contour is obtained from the GCI information. It turns out that the new pitch contour algorithm (PCA) operates more reliably than the conventional speech-only-based algorithm. In addition, we study speech source models and glottal vibratory patterns for Koreans by measuring and analyzing the diversified vibration patterns of the vocal folds from the EGG signals.

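
Once GCIs have been located in the EGG signal, the pitch contour follows directly: each pitch period is the interval between successive GCIs. A minimal sketch of that last step (the GCI detection itself, which is the hard part, is assumed to be done):

```python
import numpy as np

def pitch_contour_from_gci(gci_times):
    """Given glottal closure instants in seconds, each glottal cycle's
    fundamental frequency is the reciprocal of the interval between
    successive GCIs."""
    gci_times = np.asarray(gci_times, dtype=float)
    periods = np.diff(gci_times)    # pitch periods T0 in seconds
    return 1.0 / periods            # F0 in Hz, one value per cycle
```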

A Study on the Robust Pitch Period Detection Algorithm in Noisy Environments (소음환경에 강인한 피치주기 검출 알고리즘에 관한 연구)

  • Seo Hyun-Soo;Bae Sang-Bum;Kim Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2006.05a / pp.481-484 / 2006
  • Pitch period detection algorithms are applied to various speech signal processing fields such as speech recognition, speaker identification, and speech analysis and synthesis. Many pitch detection algorithms in the time and frequency domains have been studied to date. The AMDF (average magnitude difference function), one such pitch period detection algorithm, chooses the time interval from one valley point to the next as the pitch period. The AMDF is fast to compute, but selecting the valley point that marks the pitch period increases the complexity of the algorithm. To be applicable in the real world, pitch period detection algorithms must be robust against noise such as that generated in subway environments. In this paper, we propose a modified AMDF algorithm that detects the global minimum valley point as the pitch period of a speech signal, and we use speech signals from noisy environments as test signals.

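
The AMDF and the global-minimum valley selection described above can be sketched as follows (frame length and lag range are illustrative):

```python
import numpy as np

def amdf(frame, min_lag, max_lag):
    """Average magnitude difference function over a range of lags."""
    frame = np.asarray(frame, dtype=float)
    lags = np.arange(min_lag, max_lag)
    return lags, np.array([
        np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags
    ])

def pitch_period(frame, min_lag, max_lag):
    """Global-minimum variant: the lag of the deepest AMDF valley is
    taken as the pitch period, avoiding per-valley selection logic."""
    lags, d = amdf(frame, min_lag, max_lag)
    return lags[np.argmin(d)]
```

For a periodic signal the AMDF dips toward zero at multiples of the pitch period, so the deepest valley within a plausible lag range gives the period directly.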

Fatigue Classification Model Based On Machine Learning Using Speech Signals (음성신호를 이용한 기계학습 기반 피로도 분류 모델)

  • Lee, Soo Hwa;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.741-747 / 2022
  • Fatigue lowers an individual's ability and makes it difficult to perform work. As fatigue accumulates, concentration decreases and the possibility of causing a safety accident increases. Awareness of fatigue is subjective, but it is necessary to measure the level of fatigue quantitatively in the actual field. Previous studies proposed measuring the level of fatigue by expert judgment, adding objective indicators such as bio-signal analysis to subjective evaluations such as multidisciplinary fatigue scales. However, this approach makes it difficult to evaluate fatigue in real time in daily life. This paper studies a fatigue classification model that determines the fatigue level of workers in real time using speech data recorded in the field. Machine learning models such as logistic classification, support vector machines, and random forests are trained on the collected speech data. The performance evaluation showed accuracies of 0.677 to 0.758, with logistic classification performing best. The experimental results show that it is possible to classify fatigue levels using speech signals.
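
As an illustration of the logistic-classification step, the sketch below trains a logistic model by plain gradient descent on synthetic two-dimensional features. The paper's real features are extracted from field speech recordings, and its reported accuracies come from that data, not from this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for acoustic features at two fatigue levels.
low  = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # label 0: not fatigued
high = rng.normal([2.0, 2.0], 0.5, size=(50, 2))   # label 1: fatigued
X = np.vstack([low, high])
y = np.concatenate([np.zeros(50), np.ones(50)])

# Logistic classification fitted by plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))          # sigmoid probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)               # gradient step on weights
    b -= 0.1 * np.mean(p - y)                       # gradient step on bias

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = np.mean(pred == y)
```

In practice a library implementation (with regularization and proper train/test splits) would replace this hand-rolled loop.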

Method of a Multi-mode Low Rate Speech Coder Using a Transient Coding at the Rate of 2.4 kbit/s (전이구간 부호화를 이용한 2.4 kbit/s 다중모드 음성 부호화 방법)

  • Ahn Yeong-uk;Kim Jong-hak;Lee Insung;Kwon Oh-ju;Bae Mun-Kwan
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.2 s.302 / pp.131-142 / 2005
  • Low rate speech coders under 4 kbit/s are based on sinusoidal transform coding (STC) or multiband excitation (MBE). Since harmonic coders are not efficient at reconstructing the transient segments of speech signals, such as onsets, offsets, and non-periodic signals, they do not provide natural speech quality. This paper proposes an efficient transient model and a multi-mode low rate coder at 2.4 kbit/s that uses a harmonic model for voiced speech, a stochastic model for unvoiced speech, and a model using aperiodic pulse location tracking (APPT) for transient segments. The APPT utilizes the harmonic model. The proposed method uses different models depending on the characteristics of the LPC residual signals. In addition, it can efficiently combine the time-domain synthesized excitation of CELP coding with the frequency-domain excitation of harmonic coding. The proposed coder shows better speech quality than the 2.4 kbit/s version of the mixed excitation linear prediction (MELP) coder, a U.S. Federal Standard speech coder.

RECOGNIZING SIX EMOTIONAL STATES USING SPEECH SIGNALS

  • Kang, Bong-Seok;Han, Chul-Hee;Youn, Dae-Hee;Lee, Chungyong
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2000.04a / pp.366-369 / 2000
  • This paper examines three algorithms for recognizing a speaker's emotion from speech signals. The target emotions are happiness, sadness, anger, fear, boredom, and the neutral state. MLB (maximum-likelihood Bayes), NN (nearest neighbor), and HMM (hidden Markov model) algorithms are used as the pattern matching techniques. In all cases, pitch and energy are used as the features. The feature vectors for MLB and NN are composed of pitch mean, pitch standard deviation, energy mean, energy standard deviation, etc. For HMM, vectors of delta pitch with delta-delta pitch and delta energy with delta-delta energy are used. We recorded a corpus of emotional speech data and performed a subjective evaluation of the data. The subjective recognition rate was 56% and was compared with the classifiers' recognition rates. The MLB, NN, and HMM classifiers achieved recognition rates of 68.9%, 69.3%, and 89.1%, respectively, for speaker-dependent, context-independent classification.

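
The MLB/NN feature vector described above (statistics of the pitch and energy tracks) can be sketched as:

```python
import numpy as np

def prosodic_features(pitch_track, energy_track):
    """Utterance-level feature vector of the kind used for the MLB and
    NN classifiers: mean and standard deviation of pitch and energy.
    (The abstract lists these among others; this is the core subset.)"""
    return np.array([
        np.mean(pitch_track), np.std(pitch_track),
        np.mean(energy_track), np.std(energy_track),
    ])
```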

A Study on Formant Variation with Drinking and Nondrinking Condition (음주와 비음주 상태의 포어먼트 변화에 관한 연구)

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.4 / pp.805-810 / 2009
  • This paper presents characteristics of formant variation for discriminating between drinking and nondrinking conditions. Simulation experiments based on monosyllables show that the formants F1, F2, and F3 are higher in drinking speech signals than in nondrinking speech signals. The results indicate that formants are effective for distinguishing the drinking condition from the nondrinking condition.
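
The abstract does not say how F1-F3 were extracted; a standard approach, sketched below under that assumption, estimates formants from the angles of the LPC polynomial roots:

```python
import numpy as np

def formants_from_lpc(lpc_coeffs, fs):
    """Formant frequencies from LPC coefficients: the angles of the
    complex roots of the prediction polynomial map to frequencies in Hz.
    Returned sorted ascending, so F1, F2, F3 are the lowest entries."""
    roots = np.roots(lpc_coeffs)
    roots = roots[np.imag(roots) > 0]       # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs)
```

Comparing the returned F1-F3 values between drinking and nondrinking recordings of the same monosyllable would reproduce the kind of analysis the paper reports.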