• Title/Summary/Keyword: speech signals

HMM-Based Automatic Speech Recognition using EMG Signal

  • Lee Ki-Seung
    • Journal of Biomedical Engineering Research / v.27 no.3 / pp.101-109 / 2006
  • It is known that there is a strong relationship between human voices and the movements of the articulatory facial muscles. In this paper, we utilize this knowledge to implement an automatic speech recognition scheme that uses solely surface electromyogram (EMG) signals. The EMG signals were acquired from three articulatory facial muscles. As a preliminary study, 10 Korean digits were used as the recognition vocabulary. Various feature parameters, including filter bank outputs, linear predictive coefficients, and cepstral coefficients, were evaluated to find parameters appropriate for EMG-based speech recognition. The sequence of EMG signals for each word is modelled within a hidden Markov model (HMM) framework. A continuous word recognition approach was investigated; hence, the model for each word is obtained by concatenating subword models, and embedded re-estimation techniques were employed in the training stage. The findings indicate that such a system can recognize speech with an accuracy of up to 90% when mel filter bank outputs are used as the feature parameters (see the sketch below).
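
A minimal sketch of the kind of front end and word modelling described above, assuming mel filter-bank features and Gaussian HMMs; the library choices (librosa, hmmlearn), sampling rate, and all parameter values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch (not the paper's code): mel filter-bank features per
# frame, one Gaussian HMM per word, highest log-likelihood wins.
import numpy as np
import librosa
from hmmlearn import hmm

def mel_features(signal, sr=2000, n_mels=20):
    # Frame the signal and compute log mel filter-bank energies.
    S = librosa.feature.melspectrogram(y=signal, sr=sr, n_fft=256,
                                       hop_length=128, n_mels=n_mels)
    return np.log(S + 1e-10).T          # (n_frames, n_mels)

def train_word_model(feature_sequences, n_states=5):
    # Train one HMM on all utterances of a single word.
    X = np.vstack(feature_sequences)
    lengths = [len(f) for f in feature_sequences]
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(features, word_models):
    # word_models: dict mapping a word label to its trained HMM.
    return max(word_models, key=lambda w: word_models[w].score(features))
```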

A Study on Speech Separation in Cochannel using Sinusoidal Model (Sinusoidal Model을 이용한 Cochannel상에서의 음성분리에 관한 연구)

  • Park, Hyun-Gyu;Shin, Joong-In;Park, Sang-Hee
    • Proceedings of the KIEE Conference / 1997.11a / pp.597-599 / 1997
  • Cochannel speaker separation is employed when speech from two talkers has been summed into one signal and it is desirable to recover one or both of the speech signals from the composite signal. Cochannel speech occurs in many common situations, such as when two AM signals containing speech are transmitted on the same frequency or when two people are speaking simultaneously (e.g., when talking on the telephone). In this paper, a method for separating speech in such situations is proposed. In particular, only voiced sounds among the several sound classes are separated. The similarity between the original and separated signals is then verified by the cross-correlation between them (see the sketch below).
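
A minimal sketch of the similarity check mentioned above, assuming the peak of the normalized cross-correlation is used as the score; the function name and normalization details are assumptions, not from the paper.

```python
# Hypothetical sketch: score the separated signal against the original
# recording with the peak of their normalized cross-correlation.
import numpy as np

def max_normalized_xcorr(original, separated):
    x = (original - original.mean()) / (original.std() + 1e-12)
    y = (separated - separated.mean()) / (separated.std() + 1e-12)
    # Peak of the normalized cross-correlation over all lags.
    c = np.correlate(x, y, mode="full") / min(len(x), len(y))
    return float(c.max())
```

A value near 1.0 would indicate the separated talker closely matches the original recording; values near 0 indicate little similarity.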

An Explicit Voiced Speech Classification by using the Fluctuation of Maximum Magnitudes (최대진폭의 Fluctuation에 의한 유성음구간 Explicit 검출)

  • 배명진
    • Proceedings of the Acoustical Society of Korea Conference / 1987.11a / pp.86-88 / 1987
  • Accurate detection of voiced segments in speech signals is important for robust pitch extraction. This paper describes an explicit detection algorithm for locating the voiced segments in speech signals. The algorithm is based on the fluctuation properties of the maximum magnitudes in each frame of the speech signal. The performance of this detector is evaluated and compared to that obtained from manually classifying 150 recorded digit utterances (see the sketch below).
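
One plausible reading of the idea above, sketched minimally: frames whose maximum magnitude fluctuates little relative to neighbouring frames are flagged as voiced. The frame length, threshold, and decision rule are assumptions; the abstract does not give them.

```python
# Hypothetical sketch: per-frame maximum magnitudes with small frame-to-frame
# fluctuation mark voiced frames.
import numpy as np

def voiced_frame_mask(signal, frame_len=200, rel_threshold=0.2):
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    peaks = np.abs(frames).max(axis=1)                 # maximum magnitude per frame
    fluct = np.abs(np.diff(peaks, prepend=peaks[0]))   # frame-to-frame fluctuation
    # Small relative fluctuation of the peak magnitude -> likely voiced.
    return fluct < rel_threshold * (peaks + 1e-12)
```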

HMM Based Endpoint Detection for Speech Signals

  • Lee Yonghyung;Oh Changhyuck
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.75-76 / 2001
  • An endpoint detection method for speech signals utilizing a hidden Markov model (HMM) is proposed. Experimental results show that the proposed algorithm is quite satisfactory for isolated-word speech recognition (see the sketch below).
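
A minimal sketch of an HMM-based endpoint detector in the spirit of the abstract, assuming a two-state (silence/speech) Gaussian HMM over frame log-energies; the feature choice, frame length, and decision rule are assumptions, since the abstract gives no details.

```python
# Hypothetical sketch: decode a two-state HMM over frame log-energies and take
# the first/last frames assigned to the higher-energy state as the endpoints.
import numpy as np
from hmmlearn import hmm

def detect_endpoints(signal, frame_len=160):
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10).reshape(-1, 1)
    model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=20)
    model.fit(log_energy)
    states = model.predict(log_energy)
    speech_state = int(np.argmax(model.means_.ravel()))  # higher-energy state
    speech = np.where(states == speech_state)[0]
    if len(speech) == 0:
        return None
    # Return (start, end) sample indices of the detected speech region.
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len
```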

Feature Parameter Extraction and Analysis in the Wavelet Domain for Discrimination of Music and Speech (음악과 음성 판별을 위한 웨이브렛 영역에서의 특징 파라미터)

  • Kim, Jung-Min;Bae, Keun-Sung
    • MALSORI / no.61 / pp.63-74 / 2007
  • Discrimination of music and speech in multimedia signals is an important task in audio coding and broadcast monitoring systems. This paper deals with the problem of feature parameter extraction for discriminating music from speech. The wavelet transform is a multi-resolution analysis method that is useful for analyzing the temporal and spectral properties of non-stationary signals such as speech and audio. We propose new feature parameters, extracted from the wavelet-transformed signal, for discrimination of music and speech. First, wavelet coefficients are obtained on a frame-by-frame basis, with the analysis frame size set to 20 ms. A parameter $E_{sum}$ is then defined by summing the magnitude differences between adjacent wavelet coefficients in each scale. The maximum and minimum values of $E_{sum}$ over a 2-second period, which corresponds to the discrimination duration, are used as feature parameters for discriminating music from speech (see the sketch below). To evaluate the proposed feature parameters, discrimination accuracy is measured for various types of music and speech signals. In the experiment, each 2-second segment is classified as music or speech, and about 93% of the music and speech segments were classified correctly.
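
A minimal sketch of the $E_{sum}$ feature described above, assuming PyWavelets; the wavelet family, decomposition level, and sampling rate are assumptions, not taken from the paper.

```python
# Hypothetical sketch of E_sum: per 20 ms frame, sum the absolute differences
# between adjacent wavelet coefficients in every scale, then take the max/min
# of E_sum over each 2-second discrimination window as features.
import numpy as np
import pywt

def e_sum(frame, wavelet="db4", level=4):
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    return sum(np.sum(np.abs(np.diff(c))) for c in coeffs)

def discrimination_features(signal, sr=16000, frame_ms=20, window_s=2):
    frame_len = int(sr * frame_ms / 1000)
    frames_per_win = int(window_s * 1000 / frame_ms)
    e = [e_sum(signal[i:i + frame_len])
         for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # One (max E_sum, min E_sum) pair per 2-second window.
    return [(max(e[s:s + frames_per_win]), min(e[s:s + frames_per_win]))
            for s in range(0, len(e) - frames_per_win + 1, frames_per_win)]
```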

A Study on Speech Separation using Sinusoidal Model and Psychoacoustics Model (정현파 모델과 사이코어쿠스틱스 모델을 이용한 음성 분리에 관한 연구)

  • Hwang, Sun-Il;Han, Doo-Jin;Kwon, Chul-Hyun;Shin, Dae-Kyu;Park, Sang-Hui
    • Proceedings of the KIEE Conference / 2001.07d / pp.2622-2624 / 2001
  • In this paper, speaker separation is employed when speech from two talkers has been summed into one signal and it is desirable to recover one or both of the speech signals from the composite signal. A method for separating the summed speech signals is proposed, and the similarity between the original and separated signals is verified by means of their cross-correlation. A frequency sampling method based on the sinusoidal model is used to separate composite signals containing two voiced speech components (see the sketch below), and a noise masking method based on a psychoacoustic model is used to separate composite signals containing voiced and unvoiced speech.
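
A minimal sketch of the sinusoidal-model step only, assuming simple per-frame spectral peak picking as the sinusoid analysis; the window, FFT size, and peak count are assumptions, and the subsequent assignment of peaks to each talker is not shown.

```python
# Hypothetical sketch: pick spectral peaks in a frame as sinusoid parameters;
# a separator would then assign peaks to each talker.
import numpy as np
from scipy.signal import find_peaks

def sinusoidal_peaks(frame, sr=8000, n_peaks=20):
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    idx, _ = find_peaks(spectrum)
    # Keep the strongest peaks as (frequency in Hz, amplitude) pairs.
    top = idx[np.argsort(spectrum[idx])[::-1][:n_peaks]]
    return sorted(zip(freqs[top], spectrum[top]))
```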

An acoustic Doppler-based silent speech interface technology using generative adversarial networks (생성적 적대 신경망을 이용한 음향 도플러 기반 무 음성 대화기술)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.161-168 / 2021
  • In this paper, a Silent Speech Interface (SSI) technology is proposed in which the Doppler frequency shifts of the reflected signal are used to synthesize speech when a 40 kHz ultrasonic signal is directed at the speaker's mouth region. In an SSI, mapping rules from features derived from non-speech signals to those of audible speech are constructed, and speech is then synthesized from the non-speech signals using these rules. In conventional SSI methods, the mapping rules are built by minimizing the overall error between the estimated and true speech parameters. In the present study, the mapping rules are instead constructed so that the distribution of the estimated parameters is similar to that of the true parameters, by using generative adversarial networks (GANs; see the sketch below). Experimental results using 60 Korean words showed that, both objectively and subjectively, the performance of the proposed method was superior to that of conventional neural network based methods.
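
A minimal sketch of the adversarial mapping idea described above, written with PyTorch and toy layer sizes; the feature dimensions, network shapes, and optimizer settings are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch: a generator maps Doppler-derived features to speech
# parameters; a discriminator learns to tell generated vectors from true ones.
import torch
import torch.nn as nn

DOPPLER_DIM, SPEECH_DIM = 64, 25   # illustrative sizes, not from the paper

G = nn.Sequential(nn.Linear(DOPPLER_DIM, 128), nn.ReLU(),
                  nn.Linear(128, SPEECH_DIM))
D = nn.Sequential(nn.Linear(SPEECH_DIM, 128), nn.ReLU(),
                  nn.Linear(128, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(doppler_feat, true_speech_param):
    # Discriminator: push true speech parameters toward 1, generated toward 0.
    fake = G(doppler_feat)
    real_lbl = torch.ones(len(true_speech_param), 1)
    fake_lbl = torch.zeros(len(fake), 1)
    d_loss = bce(D(true_speech_param), real_lbl) + bce(D(fake.detach()), fake_lbl)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: make generated parameters look real to the discriminator.
    g_loss = bce(D(G(doppler_feat)), torch.ones(len(doppler_feat), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```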

Classification of Signals Segregated using ICA (ICA로 분리한 신호의 분류)

  • Kim, Seon-Il
    • 전자공학회논문지 IE / v.47 no.4 / pp.10-17 / 2010
  • There is no general method for determining which of the channel outputs of Independent Component Analysis (ICA) is the desired signal. Assuming speech signals contaminated by the sound from a car muffler, this paper presents a method for identifying the desired output, based on the expectation that the speech output will show larger correlation coefficients with reference speech signals than the other outputs do. Batch, maximum, and average methods are proposed using the vowels 'ah', 'oh', and 'woo', spoken both by the same person who produced the speech signals and by another person. Voting and summation methods based on the correlation coefficients calculated for each vowel are also added, and the paper identifies the best among the several methods tried (see the sketch below).
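
A minimal sketch of the idea above, roughly in the spirit of the summation variant: separate with FastICA, then pick the output whose total correlation with the reference vowel recordings is largest. The scikit-learn choice and the exact scoring rule are assumptions, not taken from the paper.

```python
# Hypothetical sketch: ICA separation followed by correlation-based selection
# of the speech output.
import numpy as np
from sklearn.decomposition import FastICA

def pick_speech_component(mixtures, reference_vowels):
    # mixtures: array of shape (n_samples, n_channels)
    # reference_vowels: list of 1-D arrays ('ah', 'oh', 'woo' recordings)
    sources = FastICA(n_components=mixtures.shape[1]).fit_transform(mixtures)
    scores = []
    for k in range(sources.shape[1]):
        s = sources[:, k]
        # Sum of |correlation coefficients| against every reference vowel.
        score = sum(abs(np.corrcoef(s[:len(v)], v[:len(s)])[0, 1])
                    for v in reference_vowels)
        scores.append(score)
    # Index of the ICA output most likely to be the speech signal.
    return int(np.argmax(scores)), sources
```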

Distribution of the Slopes of Autocovariances of Speech Signals in Frequency Bands (음성 신호의 주파수 대역별 자기 공분산 기울기 분포)

  • Kim, Seonil
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1076-1082 / 2013
  • Frequency bands that maximize the slopes of the autocovariances of speech signals in the frequency domain are identified in order to improve the separability of speech signals from background noise. A speech signal is divided into blocks of sampled data, and each block is transformed to the frequency domain using the Fast Fourier Transform (FFT). The autocovariance coefficients within a given frequency band of each block are then fitted with a linear equation by linear regression (see the sketch below). The slope of this linear equation, called the slope of the autocovariance, varies from band to band according to the characteristics of the speech signal. Using 200 files of speech from a male speaker, the slopes of the autocovariances are analyzed and compared from band to band.
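
A minimal sketch of the per-band slope feature described above; the band edges, number of lags, block size, and sampling rate are assumptions, not the paper's settings.

```python
# Hypothetical sketch: per block, take the FFT magnitudes inside one frequency
# band, compute their autocovariance over a few lags, and fit a line through it;
# the fitted slope is the per-band feature.
import numpy as np

def autocovariance_slope(block, sr=8000, band=(1000, 2000), n_lags=20):
    spectrum = np.abs(np.fft.rfft(block))
    freqs = np.fft.rfftfreq(len(block), d=1.0 / sr)
    x = spectrum[(freqs >= band[0]) & (freqs < band[1])]
    x = x - x.mean()
    acov = np.array([np.mean(x[:len(x) - k] * x[k:]) for k in range(n_lags)])
    slope, _ = np.polyfit(np.arange(n_lags), acov, 1)   # linear regression fit
    return slope
```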

Robust Speech Detection Using the AURORA Front-End Noise Reduction Algorithm under Telephone Channel Environments (AURORA 잡음 처리 알고리즘을 이용한 전화망 환경에서의 강인한 음성 검출)

  • Suh Youngjoo;Ji Mikyong;Kim Hoi-Rin
    • MALSORI / no.48 / pp.155-173 / 2003
  • This paper proposes a noise reduction based speech detection method for telephone channel environments. We adopt the AURORA front-end noise reduction algorithm, based on a two-stage mel-warped Wiener filter approach, as a preprocessor for a frequency-domain speech detector. The speech detector utilizes mel filter-bank band energies as its feature parameters. The preprocessor first removes the adverse noise components from the incoming noisy speech signals, and the speech detector at the next stage then detects the speech regions in the noise-reduced signals (see the sketch below). Experimental results show that the proposed noise reduction based speech detection method is very effective in improving not only the performance of the speech detector but also that of the subsequent speech recognizer.
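
A minimal sketch of the two stages described above, under strong simplifying assumptions: a single-pass Wiener-style spectral gain stands in for the AURORA two-stage mel-warped Wiener filter, and a crude frame-energy threshold stands in for the mel filter-bank band-energy detector. The noise PSD estimate, gain floor, and threshold are all assumed values.

```python
# Hypothetical sketch: per-frame Wiener-style denoising followed by a simple
# energy-based speech/non-speech decision.
import numpy as np

def wiener_denoise(frame, noise_psd, gain_floor=0.1):
    spec = np.fft.rfft(frame)
    psd = np.abs(spec) ** 2
    gain = np.maximum(1.0 - noise_psd / (psd + 1e-12), gain_floor)
    return np.fft.irfft(gain * spec, n=len(frame))

def is_speech(frame, energy_threshold=1e-3):
    # Crude stand-in for the mel filter-bank band-energy detector.
    return float(np.mean(frame ** 2)) > energy_threshold
```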
