• Title/Summary/Keyword: Speech signals

Non-Intrusive Speech Intelligibility Estimation Using Autoencoder Features with Background Noise Information

  • Jeong, Yue Ri;Choi, Seung Ho
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.220-225 / 2020
  • This paper investigates non-intrusive speech intelligibility estimation in noisy environments when the bottleneck feature of an autoencoder is used as the input to a neural network. The bottleneck feature-based method suffers severe performance degradation when the noise environment changes. To overcome this problem, we propose a novel non-intrusive speech intelligibility estimation method that adds noise environment information, along with the bottleneck feature, to the input of a long short-term memory (LSTM) neural network. The network's output is a short-time objective intelligibility (STOI) score, a standard intrusive measure of speech intelligibility that requires a reference speech signal. In experiments across various noise environments, the proposed method showed improved performance when the noise environment was the same. In particular, performance improved significantly over conventional methods in different environments. Therefore, we conclude that the proposed method can be used successfully for non-intrusive speech intelligibility estimation in various noise environments.
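
As a rough illustration of the input construction described in this abstract, the sketch below appends a noise-environment code to each bottleneck feature frame before it would be passed to an LSTM. The one-hot encoding, the function name `build_lstm_input`, and the toy dimensions are assumptions made for illustration; the abstract does not say how the noise information is represented.

```python
from typing import List


def build_lstm_input(bottleneck_frames: List[List[float]],
                     noise_id: int,
                     num_noise_types: int) -> List[List[float]]:
    """Append a one-hot noise-environment vector to every feature frame.

    The one-hot encoding is an assumption; the source only states that noise
    environment information is added alongside the bottleneck feature.
    """
    if not 0 <= noise_id < num_noise_types:
        raise ValueError("noise_id out of range")
    one_hot = [1.0 if i == noise_id else 0.0 for i in range(num_noise_types)]
    return [list(frame) + one_hot for frame in bottleneck_frames]


# Two frames of a hypothetical 3-dimensional bottleneck feature,
# recorded in noise environment 1 out of 4 assumed environments.
frames = [[0.2, -0.1, 0.5], [0.3, 0.0, 0.4]]
augmented = build_lstm_input(frames, noise_id=1, num_noise_types=4)
```

Each augmented frame has the original feature dimension plus one slot per assumed noise environment, so a sequence model sees the same environment code at every time step.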

Performance Evaluation of Cochlear Implants Speech Processing Strategy Using Neural Spike Train Decoding (Neural Spike Train Decoding에 기반한 인공와우 어음처리방식 성능평가)

  • Kim, Doo-Hee;Kim, Jin-Ho;Kim, Kyung-Hwan
    • Journal of Biomedical Engineering Research / v.28 no.2 / pp.271-279 / 2007
  • We suggest a novel method for evaluating cochlear implant (CI) speech processing strategies based on neural spike train decoding. From the formant trajectories of input speech and the auditory nerve responses to the electrical pulse trains generated by a specific CI speech processing strategy, an optimal linear decoding filter was obtained and used to estimate the formant trajectory of incoming speech. The performance of a strategy is evaluated by comparing the true and estimated formant trajectories. We compared a newly developed strategy, which mimics the auditory periphery more closely using a nonlinear time-varying filter, with a conventional linear-filter-based strategy. The formant trajectories could be estimated more accurately with the nonlinear time-varying strategy. The advantage was more prominent when the background noise level was high and the spectral characteristics of the background noise were close to those of speech signals. This confirms the superiority observed with other evaluation methods, such as acoustic simulation and spectral analysis.

Hoarse Speech Analysis Using Dissymmetric Four-Mass Model of Vocal Cords (비대칭 4 질량 성대 모델에 의한 쉰목소리 분석)

  • Jiang, Gan-Yi;Chen, Hui-Fang;Choi, Tae-Young
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.94-101 / 1995
  • In this paper, a new vocal cord model, called the four-mass model, is proposed to describe the mechanism of hoarse speech. Pathological changes in the vocal cords cause hoarse speech, and the glottal waveform reflects the motion of the vocal cords. Based on these facts, we assume that diseased vocal cords are dissymmetric and can be represented by a four-mass model. The glottal waveforms and model parameters of normal and hoarse speech signals are analyzed, and some relations between the model parameters and the hoarse pathology are discussed. Experimental results show that the new method can reveal relations between the acoustic features of hoarse speech and the underlying pathology, and that it can be used to diagnose laryngeal diseases and to improve the tone quality of hoarse speech.

A New Hearing Aid Algorithm for Speech Discrimination using ICA and Multi-band Loudness Compensation

  • Lee Sangmin;Won Jong Ho;Park Hyung Min;Hong Sung Hwa;Kim In Young;Kim Sun I.
    • Journal of Biomedical Engineering Research / v.26 no.3 / pp.177-184 / 2005
  • In this paper, we propose a new hearing aid algorithm that improves the signal-to-noise ratio (SNR) of noisy speech signals and speech perception. The proposed algorithm is a multi-band loudness compensation method based on independent component analysis (ICA). It was compared with a conventional spectral subtraction algorithm on a behind-the-ear hearing aid. The proposed algorithm successfully separated a target speech signal from background noise and from a mixture of speech signals. The algorithms were compared in terms of SNR: the average SNR improvement was 16.64 dB for the ICA-based algorithm and 8.67 dB for the spectral subtraction algorithm. From the clinical tests, we concluded that the proposed algorithm would help hearing aid users hear target speech clearly in noisy conditions.
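
For readers unfamiliar with the metric, the sketch below shows how an SNR improvement in dB can be computed when the speech and the residual noise are available as separate sample sequences. That separation, the helper names, and the toy signals are assumptions for illustration; the abstract does not describe how the authors measured SNR.

```python
import math
from typing import Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Toy data (hypothetical): the same speech with noise before and after processing.
speech = [1.0, -1.0, 1.0, -1.0]
noise_before = [0.5, -0.5, 0.5, -0.5]
noise_after = [0.1, -0.1, 0.1, -0.1]

# SNR improvement is the difference of the two SNR values in dB.
improvement_db = snr_db(speech, noise_after) - snr_db(speech, noise_before)
```

Because improvements are differences of dB values, the two averages reported in the abstract can be compared directly: 16.64 dB minus 8.67 dB is a 7.97 dB advantage for the ICA-based algorithm.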

Speech Enhancement based on Smoothed Global Soft Decision (Smoothed Global Soft Decision에 근거한 음성 향상 기법)

  • Jo, Q-Haing;Park, Yun-Sik;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.6 / pp.118-123 / 2007
  • In this paper, we propose an improved global soft decision for speech enhancement in noisy environments. An examination of statistical model-based speech enhancement shows that the global soft decision has a fundamental drawback in the offset regions of speech signals. To overcome this drawback, we apply a new speech enhancement method based on a smoothed global likelihood ratio to the global soft decision. The performance of the proposed method is evaluated by subjective tests under various environments, and it yields better results than the previously reported speech enhancement method.

Hands-free Speech Recognition based on Echo Canceller and MAP Estimation (에코제거기와 MAP 추정에 기초한 핸즈프리 음성 인식)

  • Sung-ill Kim;Wee-jae Shin
    • Journal of the Institute of Convergence Signal Processing / v.4 no.3 / pp.15-20 / 2003
  • In applications such as teleconferencing or telecommunication systems that use a distant-talking hands-free microphone, the near-end speech signals to be transmitted are disturbed by ambient noise and by an echo caused by the coupling between the microphone and the loudspeaker. Furthermore, environmental noise, including channel distortion and additive noise, is assumed to affect the original input speech. In this paper, a new approach using an echo canceller and maximum a posteriori (MAP) estimation is introduced to improve the accuracy of hands-free speech recognition. The proposed system was shown to be effective for hands-free speech recognition in ambient noise environments that include echo. The experimental results also showed that the combination of the echo canceller and the MAP environmental adaptation technique adapted well to echo and noise environments.

Separation of Corrupted Speech Signals using Canonical Correlation Analysis (정준 상관 분석을 이용한 잡음 섞인 음성 신호의 분리)

  • Kim, Seon-Il
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.05a / pp.164-167 / 2012
  • Technology for separating voice signals from the exhaust noise of a car is highly practical for realizing voice-based interfaces between humans and machines. Voice signals contaminated by car exhaust noise were separated by canonical correlation analysis (CCA) in an environment that guarantees neither independence between the signals nor prior information. Rearrangement of the input signals is important in CCA. CCA was studied, and the source signals were separated by CCA after rearranging each signal. The technique can be applied to a wide variety of signals, since CCA can also be used on signals that are not independent.

Wideband Speech Coding Algorithm with Application of Wavelet Transform (웨이브렛 변환을 적용한 광대역 음성부호화 알고리즘)

  • 이승원;배건성
    • The Journal of the Acoustical Society of Korea / v.21 no.5 / pp.462-470 / 2002
  • Wideband speech, characterized by a bandwidth of 50-7000 Hz, sounds more natural and intelligible and is less tiring to listen to than narrowband speech, characterized by a bandwidth of 300-3400 Hz. Wideband speech coders, however, have not been as successful as narrowband speech coders because of their higher bit rate. In this paper, we propose a new wideband speech coder that combines the European standard narrowband speech coder, GSM-EFR, with a transform coder using the discrete wavelet transform. The proposed coder operates as follows: input speech is first split into two subbands of equal bandwidth, and each subband signal is coded and decoded by its own subband coder. GSM-EFR is adopted as the lower subband coder, and a coder operating on wavelet-transformed speech is designed as the upper subband coder. The total bit rate of the proposed coder is 18.9 kbps (12.2 kbps for the lower band coder and 6.7 kbps for the upper band coder), and informal listening tests have shown that the proposed coder has speech quality comparable to that of G.722 at 56 kbps.
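
To make the idea of an equal-bandwidth two-band split concrete, the sketch below uses a single-level Haar wavelet transform, the simplest filter pair that splits a signal into a low and a high subband at half the sample rate and reconstructs it exactly. The Haar filters and all function names are assumptions chosen for brevity; the abstract does not state which analysis filters or wavelet the proposed coder uses.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_analysis(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split an even-length signal into low and high subbands at half rate."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_synthesis(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Rebuild the full-rate signal from the two subbands."""
    out: List[float] = []
    for a, d in zip(low, high):
        out.append((a + d) / SQRT2)  # even-indexed sample
        out.append((a - d) / SQRT2)  # odd-indexed sample
    return out


# Toy signal (hypothetical samples, not speech data).
x = [1.0, 3.0, 2.0, 2.0, 5.0, -1.0, 0.0, 4.0]
low, high = haar_analysis(x)
y = haar_synthesis(low, high)  # equals x up to floating-point rounding
```

In a subband coder, each half-rate band would be quantized by its own coder between analysis and synthesis; here the bands are passed through unchanged to show that the split itself loses nothing.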

On Improving the Effects of Varying the Window Length on Speech Energy Computation (음성 에너지계산에서 창함수-길이 변화영향의 개선에 관한 연구)

  • Bae, Myung-Jin;Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea / v.9 no.2 / pp.34-41 / 1990
  • The energy parameter is widely used in the pre-processing of speech signals because it represents phoneme characteristics well. However, the energy parameter is affected by the window length used during extraction. In this paper, the effects of window length are studied in detail, and we propose a new energy extraction algorithm that reduces them. The energy contours obtained with this algorithm represent the characteristics of speech phonemes well. The algorithm requires only one subtraction, one addition, and two comparison operations per speech sample.
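
The dependence on window length can be seen in a plain short-time energy computation. The sketch below is a standard running-sum formulation with a rectangular window, not the algorithm proposed in the paper (whose comparison steps are not described in the abstract); the function name and sample values are assumptions.

```python
from typing import List, Sequence


def short_time_energy(x: Sequence[float], window: int) -> List[float]:
    """Energy of the most recent `window` samples at every time index.

    Uses a running sum: add the newest squared sample and subtract the one
    that just left the window, instead of re-summing the whole window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    energy = 0.0
    contour: List[float] = []
    for n, sample in enumerate(x):
        energy += sample * sample
        if n >= window:
            energy -= x[n - window] * x[n - window]
        contour.append(energy)
    return contour


# The same toy samples give different contours for different window lengths.
samples = [1.0, 2.0, 3.0, 4.0]
short_contour = short_time_energy(samples, window=2)
long_contour = short_time_energy(samples, window=4)
```

With these samples the two contours agree only until the shorter window starts dropping old samples, which is the kind of length dependence the abstract refers to.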
