Search | Korea Science

Design of Speech Enhancement U-Net for Embedded Computing (임베디드 연산을 위한 잡음에서 음성추출 U-Net 설계)

Kim, Hyun-Don
- IEMEK Journal of Embedded Systems and Applications / v.15 no.5 / pp.227-234 / 2020
In this paper, we propose a wav-U-Net to improve speech enhancement in heavily noisy environments; it implements three principal techniques. First, as input data, we use 128 modified Mel-scale filter banks instead of 512 frequency bins, which reduces the computational burden. The Mel scale aims to mimic the non-linear human perception of sound by being more discriminative at lower frequencies and less discriminative at higher frequencies. Because our proposed network focuses on speech signals, the Mel scale is a suitable feature in terms of both performance and computing power. Second, we add a simple ResNet as pre-processing, which helps the proposed network produce clearer estimated speech signals and suppress high-frequency noise. Finally, the proposed U-Net model shows significant performance regardless of the kind of noise. In particular, despite using a single channel, we confirmed that it copes well with non-stationary noise whose frequency properties change dynamically, and that it can estimate speech from noisy speech signals even in extremely noisy environments where the noise is much louder than the speech (SNR below 0 dB). The performance of our proposed wav-U-Net improved by about 200% in SDR and 460% in NSDR compared to the conventional Jansson's wav-U-Net. We also confirmed that our wav-U-Net with 128 modified Mel-scale filter banks ran about 2.7 times faster than the common wav-U-Net with 512 frequency bins as input values.
https://doi.org/10.14372/IEMEK.2020.15.5.227
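The reduction from 512 frequency bins to 128 Mel bands described in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: the "modified" Mel scale is not specified in the abstract, so the standard HTK-style Mel formula and triangular filters are assumed here, and the function names and the 16 kHz sample rate are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # Standard HTK-style Mel formula (assumption: the paper's "modified"
    # Mel scale may differ from this).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_bands, n_bins, sample_rate):
    """Return n_bands triangular filters, each a list of n_bins weights."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # n_bands + 2 edge frequencies, equally spaced on the Mel axis.
    edges_hz = [mel_to_hz(mel_max * i / (n_bands + 1)) for i in range(n_bands + 2)]
    # Centre frequency of each linear spectrum bin, from 0 Hz to Nyquist.
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def apply_bank(bank, power_spectrum):
    # Project one frame of n_bins power values onto the Mel bands.
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]

# Example: 512 linear bins reduced to 128 Mel-band energies per frame.
bank = mel_filter_bank(n_bands=128, n_bins=512, sample_rate=16000)
frame = [1.0] * 512            # placeholder power spectrum
mel_frame = apply_bank(bank, frame)
```

Under these assumptions each frame fed to the network has a quarter as many input values. The bands are narrow at low frequencies and wide at high frequencies, which matches the Mel-scale behaviour the abstract describes.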

Implementation of Environmental Noise Remover for Speech Signals (배경 잡음을 제거하는 음성 신호 잡음 제거기의 구현)

Kim, Seon-Il;Yang, Seong-Ryong
- 전자공학회논문지 IE / v.49 no.2 / pp.24-29 / 2012
The exhaust sounds of automobiles are independent sound sources that have nothing to do with voices. We have no information about the sources of the voices and the exhaust sounds. Accordingly, Independent Component Analysis, one of the Blind Source Separation methods, was used to separate the two source signals from the mixed signals. Maximum Likelihood Estimation was applied to the signals received through a stereo microphone to separate the two source signals by maximizing their independence. Since there is no clue as to whether a separated signal is speech or not, slope coefficients were calculated from the autocovariances of the signals in the frequency domain. A noise remover for speech signals was implemented by coupling the two algorithms.

Speech Enhancement Using Multiple Kalman Filter (다중칼만필터를 이용한 음성향상)

이기용
- Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.225-230 / 1998
In this paper, a Kalman filter approach for enhancing speech signals degraded by statistically independent additive nonstationary noise is developed. The autoregressive hidden Markov model is used to model the statistical characteristics of both the clean speech signal and the nonstationary noise process. In this case, the speech enhancement comprises a weighted sum of conditional mean estimators for the composite states of the speech and noise models, where the weights are equal to the posterior probabilities of the composite states, given the noisy speech. The conditional mean estimators use a smoothing approach based on two Kalman filters with Markovian switching coefficients, where one of the filters propagates in the forward-time direction with one frame. The proposed method is tested on noisy speech signals degraded by Gaussian colored noise or nonstationary noise at various input signal-to-noise ratios. An approximate improvement of 4.7-5.2 dB in SNR is achieved at input SNRs of 10 and 15 dB. Also, in a comparison of the conventional and proposed methods, an improvement of about 0.3 dB in SNR is obtained with our proposed method.

Two-step a priori SNR Estimation in the Log-mel Domain Considering Phase Information (위상 정보를 고려한 로그멜 영역에서의 2단계 선험 SNR 추정)

Lee, Yun-Kyung;Kwon, Oh-Wook
- Phonetics and Speech Sciences / v.3 no.1 / pp.87-94 / 2011
The decision directed (DD) approach is widely used to determine a priori SNR from noisy speech signals. In conventional speech enhancement systems with a DD approach, a priori SNR is estimated by using only the magnitude components and consequently follows a posteriori SNR with one frame delay. We propose a phase-dependent two-step a priori SNR estimator based on the minimum mean square error (MMSE) in the log-mel spectral domain so that we can consider both magnitude and phase information, and it can overcome the performance degradation caused by one frame delay. From the experimental results, the proposed estimator is shown to improve the output SNR of enhanced speech signals by 2.3 dB compared to the conventional DD approach-based system.
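For context, the conventional decision-directed (DD) baseline mentioned in the abstract above can be sketched as follows. This shows only the standard magnitude-only DD recursion with a Wiener gain, not the proposed phase-dependent two-step estimator in the log-mel domain. The function name, the smoothing factor alpha = 0.98, and the floor xi_min are illustrative assumptions.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Conventional DD a priori SNR estimate followed by a Wiener gain.

    noisy_power: list of frames, each a list of |Y(k)|^2 values.
    noise_power: list of noise power estimates lambda_d(k), one per bin.
    Returns one gain per bin per frame.
    """
    n_bins = len(noise_power)
    prev_clean_power = [0.0] * n_bins   # |S_hat(k)|^2 from the previous frame
    gains = []
    for frame in noisy_power:
        frame_gains = []
        for k in range(n_bins):
            gamma = frame[k] / noise_power[k]              # a posteriori SNR
            # DD rule: mostly the previous frame's clean-speech estimate,
            # plus a small share of the current instantaneous SNR.
            xi = (alpha * prev_clean_power[k] / noise_power[k]
                  + (1.0 - alpha) * max(gamma - 1.0, 0.0))
            xi = max(xi, xi_min)                           # a priori SNR floor
            g = xi / (1.0 + xi)                            # Wiener gain
            frame_gains.append(g)
            prev_clean_power[k] = g * g * frame[k]
        gains.append(frame_gains)
    return gains

# A bin that stays at 20 dB a posteriori SNR: the gain lags in the
# first frame and only approaches its steady value afterwards.
gains = decision_directed_gains([[100.0]] * 10, [1.0])
```

Because the estimate leans on the previous frame's output, the a priori SNR trails the a posteriori SNR by one frame. That is the delay the abstract says the two-step estimator is designed to overcome.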

An Introduction to Energy-Based Blind Separating Algorithm for Speech Signals

Mahdikhani, Mahdi;Kahaei, Mohammad Hossein
- ETRI Journal / v.36 no.1 / pp.175-178 / 2014
We introduce the Energy-Based Blind Separating (EBS) algorithm for extremely fast separation of mixed speech signals without loss of quality, which is performed in two stages: iterative-form separation and closed-form separation. This algorithm significantly improves the separation speed simply due to incorporating only some specific frequency bins into computations. Simulation results show that, on average, the proposed algorithm is 43 times faster than the independent component analysis (ICA) for speech signals, while preserving the separation quality. Also, it outperforms the fast independent component analysis (FastICA), the joint approximate diagonalization of eigenmatrices (JADE), and the second-order blind identification (SOBI) algorithm in terms of separation quality.
https://doi.org/10.4218/etrij.14.0213.0130

Performance improvement of adaptive noise canceller with the colored noise (유색잡음에 대한 적응잡음제거기의 성능향상)

박장식;조성환;손경식
- The Journal of Korean Institute of Communications and Information Sciences / v.22 no.10 / pp.2339-2347 / 1997
The performance of an adaptive noise canceller (ANC) using the LMS algorithm is degraded by the gradient noise caused by target speech signals. An adaptive noise canceller with a speech detector was proposed to reduce this performance degradation. The speech detector used an adaptive prediction-error filter adapted by the NLMS algorithm. This paper discusses how to enhance the performance of the adaptive noise canceller for colored noise. The affine projection algorithm, which is known to converge faster than the NLMS algorithm for correlated signals, is used to adapt the adaptive filter and the adaptive prediction-error filter. When voice signals are detected by the speech detector, the coefficients of the adaptive filter are adapted by the sign-error affine projection algorithm, which is modified to reduce the misalignment of the adaptive filter coefficients. Otherwise, they are adapted by the affine projection algorithm. To obtain better performance, the proper step size of the sign-error affine projection algorithm is discussed. Computer simulation results show that the performance of the proposed ANC is better than that of the conventional one.
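The adaptive noise canceller structure discussed in the abstract above can be sketched with the plain NLMS update it takes as a starting point. This is a minimal illustration, not the paper's method: it does not implement the affine projection or sign-error affine projection algorithms, and it has no speech detector. The function name, tap count, and step size are assumptions.

```python
def nlms_noise_canceller(primary, reference, n_taps=8, mu=0.5, eps=1e-8):
    """Plain NLMS adaptive noise canceller.

    primary:   samples of speech plus noise (the signal to be cleaned).
    reference: samples of a noise-only reference correlated with that noise.
    Returns (error signal, final filter weights); the error signal is the
    enhanced output.
    """
    weights = [0.0] * n_taps
    buf = [0.0] * n_taps                  # most recent reference samples
    output = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]              # shift in the newest sample
        y = sum(w * b for w, b in zip(weights, buf))   # noise estimate
        e = d - y                         # primary minus estimated noise
        norm = eps + sum(b * b for b in buf)
        # Normalised LMS update: step size scaled by reference energy.
        weights = [w + mu * e * b / norm for w, b in zip(weights, buf)]
        output.append(e)
    return output, weights
```

When speech is present in `primary`, it appears in the error `e` and perturbs the weight update. That is the gradient-noise problem the abstract describes, and the reason the paper switches update rules depending on the speech detector.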

The Utility of Perturbation, Non-linear dynamic, and Cepstrum measures of dysphonia according to Signal Typing (음성 신호 분류에 따른 장애 음성의 변동률 분석, 비선형 동적 분석, 캡스트럼 분석의 유용성)

Choi, Seong Hee;Choi, Chul-Hee
- Phonetics and Speech Sciences / v.6 no.3 / pp.63-72 / 2014
The current study assessed the utility of the acoustic analyses most commonly used in routine clinical voice assessment, including perturbation, nonlinear dynamic analysis, and spectral/cepstral analysis, based on signal typing of dysphonic voices, and investigated the applicability of these clinical acoustic analysis methods. A total of 70 dysphonic voice samples were classified by signal type using narrowband spectrograms. The traditional parameters of %jitter, %shimmer, and signal-to-noise ratio were calculated for the signals using TF32, together with the correlation dimension (D2) as a nonlinear dynamic parameter. Spectral/cepstral measures including mean CPP, CPP_sd, CPPf0, CPPf0_sd, L/H ratio, and L/H ratio_sd were also calculated with ADSV (Analysis of Dysphonia in Speech and Voice™). Auditory perceptual analysis was performed by two blinded speech-language pathologists using GRBAS. The results showed that the nearly periodic Type 1 signals were all from functional dysphonia, while Type 4 signals comprised neurogenic and organic voice disorders. Only Type 1 voice signals were reliable for perturbation analysis in this study. Significant signal typing-related differences were found in all acoustic and auditory-perceptual measures. SNR, CPP, and L/H ratio values for Type 4 were significantly lower than those of the other voice signals, and significantly higher %jitter and %shimmer were observed in Type 4 voice signals (p<.001). Additionally, as the signal type increased, D2 values significantly increased, and more complex and nonlinear patterns were observed. Nevertheless, D2 could not be obtained for voice signals with a high noise component associated with breathiness. In particular, CPP was more sensitive to the voice qualities 'G', 'R', and 'B' than any other acoustic measure. Thus, spectral and cepstral analyses may be applied to more severely dysphonic voices such as Type 4 signals, and CPP can be a more accurate and predictive acoustic marker for measuring voice quality and severity in dysphonia.
https://doi.org/10.13064/KSSS.2014.6.3.063

Analysis of Eigenvalues of Covariance Matrices of Speech Signals in Frequency Domain for Various Bands (음성 신호의 주파수 영역에서의 주파수 대역별 공분산 행렬의 고유값 분석)

Kim, Seonil
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.293-296 / 2016
Speech signals consist of consonants and vowels, but the duration of vowels is much longer than that of consonants. It can therefore be assumed that the correlations between signal blocks in a speech signal are very high. However, the correlations between signal blocks in different frequency bands can be quite different. Each speech signal is divided into blocks of 128 speech samples, and an FFT is applied to each block. Various frequency ranges of the FFT results are selected, the covariance matrix between the blocks of a speech signal is computed, and finally the eigenvalues of those matrices are obtained. The eigenvalues of the various frequency bands are examined to determine which band yields more reliable results.

Enhanced Adjustment Strategy of Masking Threshold for Speech Signals in Low Bit-Rate Audio Coding (저전송률 오디오 부호화에서 음성 신호의 성능 개선을 위한 마스킹 임계값 적응기법 향상)

Lee, Chang-Heon;Kang, Hong-Goo
- The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.62-68 / 2010
This paper proposes a new masking threshold adjustment strategy to improve the performance for speech signals in low bit-rate audio coding. After determining formant regions, the masking threshold is adjusted by using the energy ratio of each sub-band to the average energy of each formant. More quantization noise is added to the bands that have relatively large energy, but less distortion is allowed in spectral valley regions by allocating more bits, which reflects the concept of perceptual weighting widely used in speech coding. From the results of an objective speech quality measure, we verified that the proposed method improves quality for speech input signals compared to the conventional one.
https://doi.org/10.7776/ASK.2010.29.1.062

Frequency Bin Alignment Using Covariance of Power Ratio of Separated Signals in Multi-channel FD-ICA (다채널 주파수영역 독립성분분석에서 분리된 신호 전력비의 공분산을 이용한 주파수 빈 정렬)

Quan, Xingri;Bae, Keunsung
- Phonetics and Speech Sciences / v.6 no.3 / pp.149-153 / 2014
In frequency-domain ICA (FD-ICA), the frequency bin permutation problem degrades the quality of the separated signals. In this paper, we propose a new algorithm to solve the frequency bin permutation problem using the covariance of the power ratio of separated signals in multi-channel FD-ICA. It makes use of the continuity of the spectrum of speech signals to check whether frequency bin permutation has occurred in a separated signal, using the power ratio of adjacent frequency bins. Experimental results show that the proposed method can fix the frequency bin permutation problem in multi-channel FD-ICA.
https://doi.org/10.13064/KSSS.2014.6.3.149
