• Title/Summary/Keyword: Speech signals


Implementation of Adaptive Noise Canceller with Instantaneous Gain (순시 이득을 이용한 적응잡음제거기 구현)

  • Lee, Jae-Kyun;Kim, Chun-Sik;Lee, Chae-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.8C / pp.756-763 / 2009
  • The Least Mean Square (LMS) algorithm is often used to restore a signal corrupted by additive noise. A major drawback of this algorithm is that the excess Mean Square Error (EMSE) increases linearly with the speech signal power, so the large EMSE around the optimum significantly degrades performance. Choosing a small step size reduces the EMSE but slows convergence, so the step size must be chosen to balance fast convergence against low EMSE. In this paper, the Instantaneous Gain Control (IGC) algorithm is proposed to handle this situation as it arises in speech signals. Simulations were carried out using a real speech signal combined with Gaussian white noise. The results demonstrate the superiority of the proposed IGC algorithm over the LMS algorithm in rate of convergence, noise reduction, and EMSE.
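
The step-size trade-off described in this abstract can be illustrated with a plain LMS noise canceller. This is a minimal sketch of standard LMS only, not the paper's IGC algorithm, whose update rule the abstract does not give. The sine-wave stand-in for speech, the 2-tap noise path, and the names `lms_noise_canceller`, `num_taps`, and `mu` are all assumptions made for illustration.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=4, mu=0.01):
    """Standard LMS: return e[n] = primary[n] - w^T x[n], the cleaned output."""
    w = [0.0] * num_taps      # adaptive filter weights
    buf = [0.0] * num_taps    # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))         # noise estimate
        e = d - y                                          # cleaned sample
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]   # LMS weight update
        cleaned.append(e)
    return cleaned


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
n = 5000
# Assumed stand-in for speech: a low-frequency sine wave.
speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
# Reference microphone picks up Gaussian white noise only.
ref = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical 2-tap path from the noise source to the primary microphone.
noise = [0.8 * ref[k] + (0.3 * ref[k - 1] if k > 0 else 0.0) for k in range(n)]
primary = [s + v for s, v in zip(speech, noise)]

cleaned = lms_noise_canceller(primary, ref)

# Compare residual noise power after convergence (last 1000 samples).
tail = slice(n - 1000, n)
mse_before = mse(primary[tail], speech[tail])
mse_after = mse(cleaned[tail], speech[tail])
```

In this setup the error signal that drives the weight update contains the speech itself, which is why a louder speech signal perturbs the weights more; lowering `mu` shrinks that perturbation at the cost of slower convergence.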

On the Importance of Tonal Features for Speech Emotion Recognition (음성 감정인식에서의 톤 정보의 중요성 연구)

  • Lee, Jung-In;Kang, Hong-Goo
    • Journal of Broadcast Engineering / v.18 no.5 / pp.713-721 / 2013
  • This paper examines the effectiveness of chroma-based tonal features for speech emotion recognition. Just as the tonality produced by major or minor keys affects the perception of musical mood, speech tonality affects the perception of the emotional state of spoken utterances. To support this assertion, subjective listening tests are carried out using synthesized signals generated from chroma features; the tests show that tonality contributes especially to the perception of negative emotions such as anger and sadness. In automatic emotion recognition tests, the modified chroma-based tonal features yield a noticeable improvement in accuracy when they supplement the conventional log-frequency power coefficient (LFPC)-based spectral features.
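
For readers unfamiliar with chroma features, the basic idea is to fold spectral energy into the 12 pitch classes of the equal-tempered scale. The sketch below shows only that generic folding step, assuming the usual A4 = 440 Hz reference and C = 0 indexing; the paper's modified chroma features are not specified in the abstract, and the names `pitch_class` and `chroma_vector` are hypothetical.

```python
import math


def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to a pitch class 0..11 (C = 0, A = 9)."""
    midi = 69 + 12 * math.log2(freq_hz / ref_a4)   # MIDI note number
    return int(round(midi)) % 12


def chroma_vector(freqs_hz, powers):
    """Fold spectral peak powers into a normalized 12-bin chroma vector."""
    chroma = [0.0] * 12
    for f, p in zip(freqs_hz, powers):
        if f > 0:
            chroma[pitch_class(f)] += p
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# Three octaves of A (220, 440, 880 Hz) plus middle C (about 261.63 Hz).
example = chroma_vector([220.0, 440.0, 880.0, 261.63], [1.0, 2.0, 1.0, 4.0])
```

Because octaves collapse onto the same bin, the three A peaks all land in bin 9 and the C peak lands in bin 0, so `example` splits its energy equally between those two bins.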

A New Analysis and a Reduction Method of Computational Complexity for the Lattice Transversal Joint (LTJ) Adaptive Filter (격자 트랜스버설 결합 (LTJ) 적응필터의 새로운 해석과 계산량 감소 방법)

  • 유재하
    • The Journal of the Acoustical Society of Korea / v.21 no.5 / pp.438-445 / 2002
  • In this paper, the need for filter coefficient compensation in the lattice transversal joint (LTJ) adaptive filter is explained in a general and simple way by analyzing the filter as a time-varying transform-domain adaptive filter. A method for reducing the computational complexity of filter coefficient compensation is also proposed, using the property that a speech signal is stationary over a short time period, and its effectiveness is verified through experiments using artificial and real speech signals. The proposed adaptive filter reduces the computational complexity of filter coefficient compensation by 95%, and when the filter is applied to an acoustic echo canceller with 1000 taps, the total complexity is reduced by 82%.

Electroencephalography-based imagined speech recognition using deep long short-term memory network

  • Agarwal, Prabhakar;Kumar, Sandeep
    • ETRI Journal / v.44 no.4 / pp.672-685 / 2022
  • This article proposes a subject-independent application of brain-computer interfacing (BCI). A 32-channel electroencephalography (EEG) device is used to measure imagined speech (SI) of four words (sos, stop, medicine, washroom) and one phrase (come-here) across 13 subjects. A deep long short-term memory (LSTM) network is adopted to recognize these signals in seven EEG frequency bands individually in nine major regions of the brain. The results show a maximum accuracy of 73.56% and a network prediction time (NPT) of 0.14 s, which are superior to other state-of-the-art techniques in the literature. The analysis reveals that the alpha band recognizes SI better than other EEG frequency bands. To reinforce these findings, the LSTM network is compared with models based on the gated recurrent unit (GRU), the convolutional neural network (CNN), and six conventional classifiers. The results show that the LSTM model has 46.86% higher average accuracy in the alpha band and 74.54% lower average NPT than the CNN. The maximum accuracy of the GRU was 8.34% lower than that of the LSTM network. Deep networks performed better than traditional classifiers.

Construction of Customer Appeal Classification Model Based on Speech Recognition

  • Sheng Cao;Yaling Zhang;Shengping Yan;Xiaoxuan Qi;Yuling Li
    • Journal of Information Processing Systems / v.19 no.2 / pp.258-266 / 2023
  • To address poor customer satisfaction and low customer classification accuracy, this paper proposes a customer classification model based on speech recognition. First, the paper analyzes the temporal characteristics of customer demand data, identifies the factors influencing customer demand behavior, and determines the feature extraction process for customer voice signals. Then, emotional association rules for customer demands are designed, and the classification model of customer demands is constructed through cluster analysis. Next, the Euclidean distance method is used to preprocess customer behavior data, and the fuzzy clustering characteristics of customer demands are obtained by the fuzzy clustering method. Finally, a customer demand classification model based on speech recognition is completed on the basis of the naive Bayesian algorithm. Experimental results show that the proposed method improves the accuracy of customer demand classification to more than 80% and improves customer satisfaction to more than 90%. It addresses the poor customer satisfaction and low classification accuracy of existing classification methods and therefore has practical application value.

The Vowel Length as a Function of the Articulatory Force of the Following Consonants in Korean

  • Kim, Dae-Won
    • Speech Sciences / v.9 no.3 / pp.143-153 / 2002
  • This study was designed to determine (1) the effects of the following stop consonant on the vowel length in isolated bi-syllabic words, (2) the mechanism which renders vowels longer in duration before lax stops than before tense stops, (3) whether the aspiratory interval belongs to the vowel portion or to the preceding consonantal portion, and (4) the influence of the preceding consonants upon the duration of the following vowel. Measurements were made of five timing variables on acoustic signals as three native Korean speakers uttered isolated bi-syllabic /VCV/ words in which the vowel was identical, /$\alpha$/, and the C slot was filled with bilabial stops. Findings: (1) the vowel length before the lax stops was significantly longer than before the tense stops, while the difference in the vowel duration between the tense stops was insignificant or negligible, (2) the vowel length varied as a function of the articulatory force of the following consonants, regardless of the phonological unit of syllable, (3) the aspiratory interval is interpreted as a portion of the preceding consonant, and (4) the effects of the preceding consonants on the final vowel length were not rule-governed.


Enhanced Spectral Envelope Coding Scheme Using Inter-frame Correlation for G.729.1 (G.729.1 코더에서 프레임 간의 상호상관 관계를 이용한 개선된 스펙트럼 포락 코딩 방법)

  • Cho, Keun-Seok;Sung, Jong-Mo;Hahn, Min-Soo;Kim, Young-Il;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.1 no.4 / pp.97-103 / 2009
  • This paper describes a new algorithm for encoding the spectral envelope in the time domain alias cancellation (TDAC) part of G.729.1. The TDAC part encodes the spectral envelope and modified discrete cosine transform (MDCT) coefficients of the weighted code-excited linear predictive (CELP) coding error in the lower band and of the higher-band input signal. To reduce the number of bits allocated to spectral envelope coding, a new algorithm using sub-band correlation between adjacent frames is proposed. In addition, to improve the quality of the decoded signals, two bit allocation strategies that reuse the bits saved by the proposed algorithm are proposed. The performance of the proposed algorithm is evaluated in terms of objective quality and bit reduction rates. Experimental results show that the proposed algorithm significantly improves sound quality.


Beamforming Optimization Using Filterbank-based Frost Algorithm (필터뱅크 기반 프로스트 알고리즘을 이용한 빔포밍 최적화)

  • Park, Ji-Hoon;Lee, Sung-Joo;Hong, Jeong-Pyo;Jeong, Sang-Bae;Hahn, Min-Soo
    • MALSORI / no.66 / pp.73-86 / 2008
  • Beamforming is a spatial filtering technique that uses microphone arrays to extract only the desired signals from noisy environments. Fixed beamforming is a simple concept and easy to implement, but it does not perform well in real noisy conditions. Among adaptive beamforming methods, the Frost algorithm is a good candidate. It uses the concept of the linearly constrained minimum variance (LCMV) algorithm. The difference between the Frost and the LCMV algorithms is the error correction scheme, which is very effective in terms of performance. In this paper, a quadrature mirror filter (QMF)-based filterbank is used as the pre-processing stage of the Frost beamformer, and the filter length and the learning rate of each band are optimized to improve performance. Performance is measured by the signal-to-noise ratio (SNR) and the Bark spectral distortion (BSD).


Acoustic Echo Canceller using Adaptive IIR Filters with Prewhitening Method and Variable Step-Size LMS Algorithm

  • Cho, Ju Pil;Hwng, Tae Jin;Baik, Heung Ki
    • The Journal of the Acoustical Society of Korea / v.16 no.2E / pp.14-20 / 1997
  • Future teleconferencing systems will need to control acoustic echo properly to allow convenient communication. Conventional acoustic echo cancellation algorithms involve large adaptive filters that identify the impulse response of the echo path. The use of adaptive IIR filters appears to be a reasonable way to reduce computational complexity. Effective cancellation of the acoustic echo present in teleconferencing systems requires adaptive filters with a rapid convergence speed. One of the main problems of acoustic echo cancellation techniques is that the convergence properties degrade for a highly correlated input such as a speech signal. Introducing linear prediction filters into the structure of the acoustic echo canceller is one approach to decorrelating the speech signal, and a variable step-size LMS algorithm improves the convergence speed at the cost of a small increase in computational complexity. In this paper, these two methods are applied to the acoustic echo canceller (AEC) and are shown to perform better than the conventional AEC.
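
The decorrelation step mentioned in this abstract can be illustrated with a first-order linear predictor. This is a minimal sketch under stated assumptions, not the paper's filter structure: the synthetic AR(1) signal is only a stand-in for correlated speech, the predictor order is fixed at one, and the names `autocorr` and `prewhiten_first_order` are hypothetical.

```python
import random


def autocorr(x, lag):
    """Unnormalized sample autocorrelation of x at the given lag."""
    return sum(x[k] * x[k - lag] for k in range(lag, len(x)))


def prewhiten_first_order(x):
    """Fit a first-order linear predictor and return (coefficient, residual)."""
    a = autocorr(x, 1) / autocorr(x, 0)               # optimal 1-tap predictor
    residual = [x[k] - a * x[k - 1] for k in range(1, len(x))]
    return a, residual


random.seed(1)
# Assumed stand-in for speech: a strongly correlated AR(1) process.
x = [0.0]
for _ in range(4999):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))

a, residual = prewhiten_first_order(x)

# Normalized lag-1 correlation before and after prewhitening.
rho_before = autocorr(x, 1) / autocorr(x, 0)
rho_after = autocorr(residual, 1) / autocorr(residual, 0)
```

The prediction residual is far less correlated from sample to sample than the input, which is the property that helps LMS-type adaptive filters converge faster.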


Sound Source Localization using Acoustically Shadowed Microphones (가려진 마이크로폰을 이용한 음원 위치 추적)

  • Lee, Hyeop-Woo;Yook, Dong-Suk
    • Speech Sciences / v.15 no.3 / pp.17-28 / 2008
  • In many practical applications of robots, finding the location of an incoming sound is an important issue for the development of efficient human-robot interfaces. Most sound source localization algorithms use only those microphones that are acoustically visible from the sound source, or do not take into account the effect of sound diffraction, which degrades the sound source localization performance. This paper proposes a new sound source localization method that can utilize microphones that are acoustically shadowed from the sound source. The experimental results show that use of the acoustically shadowed microphones, which receive higher signal-to-noise ratio signals than the others and are closer to the sound source, improves the performance of sound source localization.
