• Title/Summary/Keyword: Speech signals


On an Adaptation of Announcement Sound Level in White Noise Environment (백색소음 환경에서 음성안내레벨 적응에 관한 연구)

  • Yun, Jong-Jin;Bae, Myung-Jin
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.1
    • /
    • pp.112-118
    • /
    • 2012
  • In daily life, much information is broadcast through voice information systems. If surrounding noise mixes with the announcement, its clarity degrades so badly that it becomes hard to understand. Surrounding noise is not uniform but highly irregular and constantly changing, which makes it difficult to control the output level accordingly. This paper proposes a method that adapts the level of the voice announcement to the surrounding noise in a white-noise environment. The noise level is measured by subtracting the stored output voice signal from the noise-degraded voice signal, and the resulting noise estimate is used to estimate the SNR; the output level of the voice signal is then adjusted to match the noise level. The proposed adaptive voice information system improves listeners' speech perception and uses the amplifier's energy efficiently.
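The level-adaptation idea described in this abstract can be sketched as follows; the target SNR and all names are illustrative assumptions, not values from the paper:

```python
import numpy as np

def adapt_announce_level(degraded, stored_clean, target_snr_db=15.0):
    """Estimate the surrounding noise by subtracting the stored (known)
    announcement from the microphone signal, estimate the SNR from that
    residual, and scale the next announcement so the SNR reaches a target
    (the target value is illustrative, not from the paper)."""
    noise = degraded - stored_clean                 # residual = surrounding noise
    noise_power = np.mean(noise ** 2) + 1e-12
    speech_power = np.mean(stored_clean ** 2)
    snr_db = 10.0 * np.log10(speech_power / noise_power)
    gain_db = max(0.0, target_snr_db - snr_db)      # raise level only if SNR too low
    gain = 10.0 ** (gain_db / 20.0)
    return gain * stored_clean, snr_db
```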

Design of Wideband Speech Coder Using the MLT Residual Signal (MLT 여기신호를 이용한 광대역 음성 부호화기 설계)

  • Oh Yeon-Seon;Shin Jae-Hyun;Lee In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.5
    • /
    • pp.248-254
    • /
    • 2005
  • In this paper, we propose the structure of a split-band wideband speech coder and its highband coder for improved tone quality. The lowband and highband obtained by the band-splitting method are encoded independently, using G.729E and an MLT (Modulated Lapped Transform) residual model, respectively. In the highband coder, which operates at a low bit rate of 4 kbps, the MLT residual signals are classified as voiced or unvoiced. Because the transformed MLT residual of a voiced signal is periodic, with peaks recurring at intervals determined by the lowband pitch period, voiced signals are coded by an MLT peak-picking method based on that pitch period. Unvoiced signals are coded by applying an MLT weighted by the linear-prediction spectral response and then vector-quantized. The performance of the proposed 15.8 kbps wideband speech coder was verified through subjective listening tests.
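The peak-picking idea for voiced frames can be toy-sketched on a generic transform-domain vector standing in for the MLT residual; the spacing and width parameters are illustrative:

```python
import numpy as np

def peak_pick(coeffs, spacing, width=1):
    """Keep only coefficients within +/- width of the periodic peaks at
    multiples of the pitch-derived spacing; everything else is zeroed
    before quantization."""
    kept = np.zeros_like(coeffs)
    for k in range(spacing, len(coeffs), spacing):
        lo, hi = max(0, k - width), min(len(coeffs), k + width + 1)
        kept[lo:hi] = coeffs[lo:hi]
    return kept
```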

Active Noise Cancelling Headphone using Adaptive Beam-forming Techniques (적응 빔 포밍 기법을 적용한 능동 소음 제거 헤드폰)

  • Moon, Sung-Kyu;Nam, Hyun-Do
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.59 no.9
    • /
    • pp.1699-1701
    • /
    • 2010
  • Active Noise Cancelling (ANC) headphones are now commercially available, but they attenuate not only noise but also information-bearing signals such as speech. In this paper, we propose an ANC headphone that uses adaptive beamforming to cancel all signals except those arriving from the wearer's look direction, enabling workers in noisy environments to talk with their coworkers. Computer simulations demonstrate the effectiveness of the proposed algorithm.
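The look-direction-preserving cancellation can be sketched with a Griffiths-Jim-style two-microphone structure and an NLMS filter; this is an illustrative stand-in, not the paper's exact algorithm:

```python
import numpy as np

def two_mic_beamformer(x_left, x_right, taps=16, mu=0.1):
    """The sum channel passes the look-direction (broadside) signal, the
    difference channel blocks it; an NLMS filter then subtracts whatever
    in the sum channel correlates with the blocked noise reference."""
    fixed = 0.5 * (x_left + x_right)   # fixed beam toward the look direction
    ref = x_left - x_right             # look-direction signal is blocked here
    w = np.zeros(taps)
    buf = np.zeros(taps)
    out = np.zeros_like(fixed)
    for n in range(len(fixed)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        e = fixed[n] - w @ buf                    # enhanced output sample
        w += mu * e * buf / (buf @ buf + 1e-9)    # NLMS update
        out[n] = e
    return out
```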

Two-Microphone Binary Mask Speech Enhancement in Diffuse and Directional Noise Fields

  • Abdipour, Roohollah;Akbari, Ahmad;Rahmani, Mohsen
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.772-782
    • /
    • 2014
  • Two-microphone binary mask speech enhancement (2mBMSE) has been of particular interest in recent literature and has shown promising results. Current 2mBMSE systems rely on spatial cues of speech and noise sources. Although these cues are helpful for directional noise sources, they lose their efficiency in diffuse noise fields. We propose a new system that is effective in both directional and diffuse noise conditions. The system exploits two features. The first determines whether a given time-frequency (T-F) unit of the input spectrum is dominated by a diffuse or directional source. A diffuse signal is certainly a noise signal, but a directional signal could correspond to a noise or speech source. The second feature discriminates between T-F units dominated by speech or directional noise signals. Speech enhancement is performed using a binary mask, calculated based on the proposed features. In both directional and diffuse noise fields, the proposed system segregates speech T-F units with hit rates above 85%. It outperforms previous solutions in terms of signal-to-noise ratio and perceptual evaluation of speech quality improvement, especially in diffuse noise conditions.
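The two-feature decision in this abstract can be sketched per time-frequency unit; the concrete feature choice (inter-microphone coherence as the diffuse/directional cue) and the thresholds are illustrative assumptions, not the paper's:

```python
import numpy as np

def estimate_binary_mask(coherence, speech_score, coh_thresh=0.7, spk_thresh=0.5):
    """A T-F unit is kept only if it looks directional (feature 1: high
    inter-microphone coherence; diffuse units are treated as noise) AND
    feature 2 classifies it as speech rather than directional noise."""
    directional = np.asarray(coherence) > coh_thresh
    is_speech = np.asarray(speech_score) > spk_thresh
    return (directional & is_speech).astype(float)   # 1 keeps the unit, 0 discards
```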

A study on the interactive speech recognition mobile robot (대화형 음성인식 이동로봇에 관한 연구)

  • 이재영;윤석현;홍광석
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.11
    • /
    • pp.97-105
    • /
    • 1996
  • This paper presents the implementation of a speech-recognition mobile robot that uses interactive speech recognition techniques. The speech command is uttered as a sentence of connected words and transmitted through a wireless microphone system. LPC cepstrum and short-time energy features are computed from the received signal on a DSP board and transferred to a notebook PC, where a DP-matching recognizer processes them. The recognition result is passed to a motor control unit, which outputs pulse signals corresponding to the recognized command and drives the stepping motors. A grammar network is applied to reduce the recognizer's search time, so real-time recognition is achieved. Misrecognized commands are corrected interactively through conversation with the robot, so the user can move the robot in the desired direction.
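The DP-matching step above can be sketched as a standard dynamic-time-warping distance; the Euclidean local cost and symmetric path are illustrative, not necessarily the paper's exact configuration:

```python
import numpy as np

def dp_match(a, b):
    """DTW distance between two feature sequences a, b of shape
    (frames, dims); the command template giving the smallest distance
    is the recognized one."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```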

Statistical Extraction of Speech Features Using Independent Component Analysis and Its Application to Speaker Identification

  • Jang, Gil-Jin;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4E
    • /
    • pp.156-163
    • /
    • 2002
  • We apply independent component analysis (ICA) to the problem of finding efficient features for representing the speech signals of a given speaker, extracting an optimal basis. Speech segments are assumed to be generated by a linear combination of basis functions; the distribution of a speaker's speech segments is thus modeled by adapting the basis functions so that each source component is statistically independent. The learned basis functions are oriented and localized in both space and frequency, bearing a resemblance to Gabor wavelets. These features are speaker-dependent characteristics, and to assess their efficiency we performed speaker-identification experiments and compared the results with a conventional Fourier basis. The results show that the proposed method is more efficient than conventional Fourier-based features, yielding a higher speaker-identification rate.
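The basis-learning step can be sketched with a minimal symmetric FastICA (tanh nonlinearity) on whitened segments; this reproduces the modeling idea only, not the paper's training details:

```python
import numpy as np

def learn_ica_basis(segments, n_iter=200, seed=0):
    """segments: (dims, samples) array. Learn an unmixing matrix so the
    components of each segment are statistically independent; its rows
    act as the learned basis/feature extractors."""
    X = segments - segments.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(cov)
    V = E @ np.diag(1.0 / np.sqrt(d + 1e-12)) @ E.T   # whitening matrix
    Z = V @ X
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[0], X.shape[0]))
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt                                        # start from an orthogonal W
    for _ in range(n_iter):
        WZ = W @ Z
        g = np.tanh(WZ)                               # nonlinearity
        g_prime = 1.0 - g ** 2
        W = (g @ Z.T) / Z.shape[1] - np.diag(g_prime.mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt                                    # symmetric decorrelation
    return W @ V   # features = (W @ V) @ centered_segment
```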

Analysis of Speech Signals According to the Various Emotional Contents (정서정보의 변화에 따른 음성신호의 특성분석에 관한 연구)

  • Jo, Cheol-Woo;Jo, Eun-Kyung;Min, Kyung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.3
    • /
    • pp.33-37
    • /
    • 1997
  • This paper describes experimental results on emotional speech materials analysed by various signal-processing methods. Speech materials carrying emotional information were collected from actors. The analysis focuses on variations in pitch and duration, from which the characteristics of emotional speech can be observed. The materials from this experiment provide a valuable resource for analysing emotional speech.
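Pitch variation of the kind analysed here is commonly tracked with a frame-wise autocorrelation estimator; the search range and sampling rate below are illustrative, not the paper's settings:

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return the pitch (Hz) of one voiced frame as the lag of the
    autocorrelation peak inside the plausible pitch-period range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)      # lag range for fmin..fmax
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```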

Speech Recognition by Neural Net Pattern Recognition Equations with Self-organization

  • Kim, Sung-Ill;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2E
    • /
    • pp.49-55
    • /
    • 2003
  • We apply modified neural-net pattern-recognition equations to speech recognition. The proposed method involves a dynamic self-organization process that has previously proved successful in modeling depth perception in stereoscopic vision; this study shows that the process is also useful for recognizing human speech. In the processing, input vocal signals are first compared with standard models to measure similarities, which are then fed into the self-organization process of the neural-net equations. Competitive and cooperative interactions among neighboring input similarities leave only one winner neuron. In a comparative study, the proposed neural network outperformed a conventional HMM speech recognizer under the same conditions.
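The competitive process can be toy-sketched as a winner-take-all iteration; the inhibition constant and update rule are illustrative, not the paper's equations:

```python
import numpy as np

def winner_take_all(similarities, eps=0.1, n_iter=100):
    """Each neuron keeps its own similarity input but is inhibited by the
    summed activity of its competitors; repeated updates drive all but
    the best-matching neuron to zero."""
    a = np.array(similarities, dtype=float)
    for _ in range(n_iter):
        a = np.maximum(0.0, a - eps * (a.sum() - a))   # lateral inhibition
    return int(np.argmax(a)), a
```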

A Multimodal Emotion Recognition Using the Facial Image and Speech Signal

  • Go, Hyoun-Joo;Kim, Yong-Tae;Chun, Myung-Geun
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.1
    • /
    • pp.1-6
    • /
    • 2005
  • In this paper, we propose an emotion recognition method using facial images and speech signals. Six basic emotions are investigated: happiness, sadness, anger, surprise, fear, and dislike. Facial expression recognition is performed using multi-resolution analysis based on the discrete wavelet transform, with feature vectors obtained through ICA (Independent Component Analysis). The emotion recognition from speech, on the other hand, runs the recognition algorithm independently for each wavelet subband, and the final result is obtained from a multi-decision-making scheme. After merging the facial and speech emotion recognition results, we obtain better performance than previous methods.
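The final merging step can be sketched as a simple late fusion of per-modality scores; the equal weighting is a hypothetical choice, not the paper's scheme:

```python
import numpy as np

def fuse_decisions(face_scores, speech_scores, w_face=0.5):
    """Combine per-emotion confidence scores from the two classifiers and
    return the index of the emotion with the highest fused score."""
    fused = (w_face * np.asarray(face_scores)
             + (1.0 - w_face) * np.asarray(speech_scores))
    return int(np.argmax(fused))
```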

Improved Leakage Signal Blocking Methods for Two Channel Generalized Sidelobe Canceller

  • Kim, Ki-Hyeon;Ko, Han-Seok
    • Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.117-128
    • /
    • 2006
  • The two-channel Generalized Sidelobe Canceller (GSC) scheme suffers from the presence of a leakage signal in the reference channel. The leakage is caused by dissimilar impulse responses between the microphones and by the different paths from the speech source to each microphone. Such leakage is detrimental to the GSC's speech enhancement, since the desired reference signal becomes corrupted. To suppress the leakage, two matrix-injection methods are proposed: in the first, a simple gain-compensation matrix is used; in the second, a projection matrix that reduces the error between the actual and ideal primary and reference signals. This paper describes the performance degradation resulting from leakage and proposes effective methods to resolve the problem. Representative experiments on speech and noise recorded in an actual automobile environment demonstrate the effectiveness of the proposed methods.
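The gain-compensation idea of the first method can be sketched for the scalar case; the least-squares fit and variable names are illustrative:

```python
import numpy as np

def gain_compensated_reference(primary, secondary, speech_frames):
    """Fit a least-squares gain on speech-dominated frames so the desired
    signal cancels when forming the noise reference, reducing leakage."""
    s1 = primary[speech_frames]
    s2 = secondary[speech_frames]
    g = (s1 @ s2) / (s1 @ s1 + 1e-12)    # gain matching the two microphones
    return secondary - g * primary        # leakage-reduced noise reference
```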
