• Title/Summary/Keyword: Noise speech data

Predictability effects on speech perception in noise (SPIN) in Korean (한국어 소음속말인지에 나타나는 예측성 효과)

  • Lee, Sun-Young
    • Korean Journal of Cognitive Science / v.27 no.1 / pp.129-157 / 2016
  • This study investigates speech perception in noise (SPIN) in Korean. A new Korean SPIN test was developed in a format similar to that of the English SPIN test. Predictability effects, noise effects, and their interactions were examined in order to verify previous findings based on English. Data collected from 14 Korean adults with the new test confirmed those findings. First, the participants' overall performance was better in low-noise conditions than in high-noise conditions. Second, highly predictable words tended to be perceived more accurately than less predictable words, especially in high-noise conditions. The results suggest that listeners actively used both acoustic and contextual information in speech perception: when the acoustic properties of the speech sound were degraded by noise, listeners took advantage of the linguistic context in processing it. The findings conform to those of previous studies based on the English SPIN test. In addition, a possible effect of target-word frequency was found, calling for further investigation in this field of research in Korean. Implications of the results are also discussed. (Cyber Hankuk University of Foreign Studies)

Evaluation of Speech Privacy on the Seat-design in High-speed Train Passenger Cars (KTX 의자 설계에 따른 객실 Speech Privacy 평가)

  • Jang, Hyung Suk;Kim, Jae Hyeon;Jeon, Jin Yong
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.24 no.2 / pp.146-153 / 2014
  • This study investigates the effects of seat-design elements such as seating arrangement, shape, and height on speech privacy in high-speed trains. For the evaluation of speech privacy, acoustic simulation software was used to reproduce room acoustical conditions in passenger cars on the basis of in-situ measurement data. The influences of speech source directivity and source height on privacy distance ($r_P$) were investigated, and it was found that $r_P$ determined using an omni-directional source was relatively shorter than that determined using a directional source. It was also found that $r_P$ decreased when the source height was lower than the height of the seat-back because the seat-back blocked the propagation of speech from the sound source. The effect of seating arrangement was not significant when comparing the vis-a-vis seating and one-side seating arrangements. In addition, among the alternative seat-designs, the seats that block the space between the seats and cover the space near the ear were found to show significantly enhanced speech privacy in high-speed train passenger cars.

Robust Speech Enhancement Using HMM and $H_\infty$ Filter (HMM과 $H_\infty$필터를 이용한 강인한 음성 향상)

  • 이기용;김준일
    • The Journal of the Acoustical Society of Korea / v.23 no.7 / pp.540-547 / 2004
  • Speech enhancement algorithms based on Kalman/Wiener filters require a priori knowledge of the noise and minimize the variance of the estimation error between the clean and estimated speech signals, so a small error in the estimated noise statistics may lead to a large estimation error. The $H_\infty$ filter, by contrast, requires no assumptions about or a priori knowledge of the noise statistics; it searches for the best estimate among all candidate estimates by applying a least upper bound, and is consequently more robust to variation in the noise statistics than the Kalman/Wiener filter. In this paper, we propose a speech enhancement method using an HMM and multiple $H_\infty$ filters. First, the HMM parameters are estimated from the training data. Second, the speech is filtered with multiple $H_\infty$ filters. Finally, the estimate of the clean speech is obtained as the weighted sum of the filtered outputs. Experimental results show an SNR improvement of about 1 dB to 2 dB, with a slight increase in computation, compared with the Kalman filter method.
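
The final step in the abstract above (a weighted sum of the filtered outputs) can be sketched as follows. This is only an illustration under assumptions: the function name `combine_filter_outputs` is hypothetical, the filter outputs are taken as given, and the weights stand in for whatever per-filter weights the HMM supplies, which the abstract does not specify.

```python
def combine_filter_outputs(filtered, weights):
    """Weighted sum of K filter outputs for one frame.

    filtered: list of K equal-length lists, one per filter (assumed given).
    weights:  list of K non-negative weights (assumed to come from the HMM);
              they are normalized here so that they sum to 1.
    """
    if len(filtered) != len(weights):
        raise ValueError("need exactly one weight per filter output")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]
    n = len(filtered[0])
    # Sample-by-sample convex combination of the K filtered signals.
    return [sum(norm[k] * filtered[k][t] for k in range(len(filtered)))
            for t in range(n)]


# Two hypothetical filter outputs, the second weighted three times as heavily.
estimate = combine_filter_outputs([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

Normalizing the weights keeps the combined estimate on the same scale as the individual filter outputs.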

The relevancy between physical index and subjective appraisal of class (강의실내의 물리지표와 주관적평가와의 상관관계)

  • Lee, Chai-Bong;Kim, Yong-Man
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2002.11a / pp.374.1-374 / 2002
  • The ultimate purpose of this research is to establish optimum standards for the acoustic environment using not only physical characteristics but also subjective appraisals. First, the basic physical data needed to establish standards for the acoustic environment in campus buildings were measured. TSP was used to measure sound levels, reverberation times, clearness indexes, and the speech transmission index. (omitted)

Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter (훈련데이터 기반의 temporal filter를 적용한 4연숫자 전화음성 인식)

  • Jung, Sung-Yun;Bae, Keun-Sung
    • MALSORI / no.53 / pp.93-102 / 2005
  • The performance of a speech recognition system generally degrades in a telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task, telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral-domain feature vectors using principal component analysis. According to the experimental results, the proposed temporal filtering method performed slightly better than the previous ones.
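
One common way to derive a temporal filter by principal component analysis is sketched below. It is a sketch under stated assumptions, not the method of the paper: it assumes the filter taps are the first principal component of fixed-length windows cut from the time trajectory of a single cepstral coefficient, and it uses power iteration in place of a full eigendecomposition. All function names and the synthetic trajectory are hypothetical.

```python
import math


def first_principal_component(rows, iters=200):
    """Leading eigenvector of the covariance of `rows`, by power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # starting vector; assumed not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def temporal_filter_from_trajectory(traj, length):
    """Cut overlapping windows from one feature trajectory and run PCA on them."""
    windows = [traj[i:i + length] for i in range(len(traj) - length + 1)]
    return first_principal_component(windows)


def apply_fir(traj, taps):
    """Filter the trajectory with the derived taps (valid region only)."""
    n_taps = len(taps)
    return [sum(taps[k] * traj[i + k] for k in range(n_taps))
            for i in range(len(traj) - n_taps + 1)]


# Synthetic, slowly varying trajectory standing in for one cepstral coefficient.
traj = [math.sin(0.1 * i) for i in range(200)]
taps = temporal_filter_from_trajectory(traj, 3)
smoothed = apply_fir(traj, taps)
```

On a slowly varying trajectory such as this one, the leading component has taps of equal sign, so the derived filter behaves as a smoother.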

An Experimental Study of Korean Dialectal Speech (한국어 방언 음성의 실험적 연구)

  • Kim, Hyun-Gi;Choi, Young-Sook;Kim, Deok-Su
    • Speech Sciences / v.13 no.3 / pp.49-65 / 2006
  • Recent theories of digital speech signal processing have drastically expanded the boundary of communication between humans and machines. The aim of this study is to collect dialectal speech in Korea on a large scale and to establish a digital speech database that supports further research on Korean dialects and the creation of a value-added network. A total of 528 informants from across the country participated. Acoustic characteristics of vowels and consonants were analyzed using the power spectrum and spectrogram of CSL. Test words were presented on picture cards and letter cards and contained each vowel and each consonant in word-initial position. Formants were plotted on a vowel chart, and the transitions of diphthongs were compared across dialects. Spectral times, VOT, VD, and TD were measured on a spectrogram for stop consonants, and fricative frequency, intensity, and lateral formants (LF1, LF2, LF3) for fricative consonants. Nasal formants (NF1, NF2, NF3) were analyzed for the different nasalities of nasal consonants. The acoustic characteristics of dialectal speech showed that younger speakers did not distinguish between close-mid /e/ and open-mid $/\epsilon/$. The diphthongs /we/ and /wj/ were realized as simple vowels or as diphthongs depending on the dialect. The sibilant /s/ showed aspiration preceding the fricative noise. The lateral /l/ was realized as the variant /r/ in Kyungsang dialectal speech. The duration of nasal consonants was longest in Chungchong dialectal speech.

Speech Enhancement using RNN Phoneme based VAD (음소기반의 순환 신경망 음성 검출기를 이용한 음성 향상)

  • Lee, Kang;Kang, Sang-Ick;Kwon, Jang-woo;Lee, Samgmin
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.5 / pp.85-89 / 2017
  • In this paper, we apply high-performance hardware and a machine learning algorithm to build an advanced VAD algorithm for speech enhancement. Since speech is made up of a series of phonemes, a recurrent neural network (RNN), which takes previous data into account, is a suitable way to build a speech model. It is impossible to learn every noise in the real world, so our algorithm is built on phoneme-based training. We detect voice-present frames in a noisy speech signal and enhance the speech signal. The phoneme-based RNN model performs well on speech signals, which are highly correlated across frames. To verify the performance of the proposed algorithm, we compare the VAD results with labeled data, and the speech enhancement results in various noise environments with those of a previous speech enhancement algorithm.

Design of Speech Enhancement U-Net for Embedded Computing (임베디드 연산을 위한 잡음에서 음성추출 U-Net 설계)

  • Kim, Hyun-Don
    • IEMEK Journal of Embedded Systems and Applications / v.15 no.5 / pp.227-234 / 2020
  • In this paper, we propose a wav-U-Net to improve speech enhancement in heavily noisy environments; it implements three principal techniques. First, as input data, we use 128 modified Mel-scale filter banks instead of 512 frequency bins, which reduces the computational burden. The Mel scale aims to mimic the non-linear perception of sound by the human ear by being more discriminative at lower frequencies and less discriminative at higher frequencies. Because our proposed network focuses on speech signals, the Mel scale is a suitable feature in terms of both performance and computing power. Second, we add a simple ResNet as a pre-processing stage that helps the proposed network make the estimated speech signals clear and suppress high-frequency noise. Finally, the proposed U-Net model performs well regardless of the kind of noise. In particular, despite using a single channel, it copes well with non-stationary noises whose frequency properties change dynamically, and it can estimate speech from noisy speech signals even in extremely noisy environments where the noise is much louder than the speech (SNR below 0 dB). The performance of the proposed wav-U-Net improved by about 200% on SDR and 460% on NSDR compared to the conventional Jansson's wav-U-Net. In addition, the processing time of our wav-U-Net with 128 modified Mel-scale filter banks was about 2.7 times faster than that of the common wav-U-Net with 512 frequency bins as input values.
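
The Mel-scale property described in the abstract above (finer resolution at low frequencies, coarser at high frequencies) can be illustrated with a short sketch. It assumes the widely used conversion mel = 2595 log10(1 + f/700) and a 16 kHz sampling rate (8 kHz upper edge); the paper's "modified" Mel scale is not specified in the abstract, and the function names here are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_min_hz, f_max_hz):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced in mel."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# 128 bands up to 8 kHz (assumed Nyquist frequency for 16 kHz audio).
edges = mel_band_edges(128, 0.0, 8000.0)
lowest_width = edges[1] - edges[0]     # spacing near 0 Hz
highest_width = edges[-1] - edges[-2]  # spacing near 8 kHz
```

Equal steps in mel translate into narrow bands near 0 Hz and wide bands near the upper edge, which is how a 128-band representation can stand in for 512 linear bins while keeping detail in the low-frequency range.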

Telephone Speech Recognition with Data-Driven Selective Temporal Filtering based on Principal Component Analysis

  • Jung Sun Gyun;Son Jong Mok;Bae Keun Sung
    • Proceedings of the IEEK Conference / 2004.08c / pp.764-767 / 2004
  • The performance of a speech recognition system generally degrades in a telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task, telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral-domain feature vectors using principal component analysis.

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.276-280 / 2016
  • A fixed-rate speech coder based on the filter bank and the non-uniform sampling technique is proposed. The non-uniform sampling is achieved by detecting inflection points (IPs). A speech block is band-pass filtered by the filter bank, the subband signals are processed by the IP detector, and the detected IP patterns are compared with the entries of the IP database. For each subband signal, the address of the closest member of the database and the energy of the IP pattern are transmitted through the channel. In the receiver, the decoder recovers the subband signals using the received addresses and the energy information, and reconstructs the speech via filter bank summation. As a result, the coder has a fixed data rate, unlike existing speech coders based on non-uniform sampling. Computer simulation confirms the usefulness of the proposed technique. The signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of uniformly sampled pulse code modulation (PCM) at data rates below 20 kbps.
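
A minimal sketch of inflection point detection on a sampled signal follows. It assumes an IP is a place where the discrete second difference changes sign; the abstract does not give the paper's exact definition, and the function name `inflection_points` is hypothetical.

```python
def inflection_points(x):
    """Indices at which the discrete second difference of x changes sign.

    Each reported index is the first sample where the new sign is observed;
    samples with a zero second difference are skipped.
    """
    points = []
    prev_sign = 0
    for i in range(1, len(x) - 1):
        d2 = x[i + 1] - 2 * x[i] + x[i - 1]
        sign = (d2 > 0) - (d2 < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                points.append(i)
            prev_sign = sign
    return points


# A sampled cubic bends from concave to convex around its middle sample.
cubic = [t ** 3 for t in range(-3, 4)]
cubic_ips = inflection_points(cubic)

# A parabola never changes curvature, so no inflection points are reported.
parabola_ips = inflection_points([t ** 2 for t in range(5)])
```

Keeping only the samples at such indices gives a non-uniform sampling of the signal, since the number of retained samples depends on how often the curvature changes rather than on a fixed rate.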