• Title/Summary/Keyword: Voice enhancement

An Enhanced Clarity of Husky Voice by Dissonant Frequency Filtering

  • Kang, Sang-Ki;Baek, Seong-Joon
    • Speech Sciences / v.12 no.4 / pp.71-76 / 2005
  • There have been numerous studies on the enhancement of noisy speech signals. In this paper, we propose a new speech enhancement method that combines filtering of a dissonant frequency with a noise suppression algorithm. The simulation results indicate that the proposed method provides a significant gain in voice clarity. Therefore, if the proposed enhancement scheme is used as a pre-filter, the perceptual clarity of a husky voice is greatly enhanced.

Robust speech quality enhancement method against background noise and packet loss at voice-over-IP receiver (배경잡음 및 패킷손실에 강인한 voice-over-IP 수신단 기반 음질향상 기법)

  • Kim, Gee Yeun;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.512-517 / 2018
  • Improving voice quality is a major concern in telecommunications. In this paper, we propose a robust speech quality enhancement method against background noise and packet loss at the VoIP (Voice-over-IP) receiver. The proposed method combines network jitter estimation based on a hybrid Markov chain, adaptive playout scheduling using the estimated jitter, and speech enhancement based on restoration of amplitude and phase to enhance the quality of the speech signal arriving at the VoIP receiver over an IP network. The experimental results show that the proposed method removes the background noise added to the speech signal before encoding at the sender side and provides enhanced speech quality in an unstable network environment.
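
As a rough illustration of jitter-driven playout scheduling, the sketch below uses the running interarrival jitter estimator from RFC 3550 rather than the hybrid Markov chain estimator the abstract describes, which is not specified here. The function names, `base_delay`, and `safety_factor` are hypothetical choices for illustration only.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550 (times in seconds).

    This is a stand-in estimator, not the paper's hybrid Markov chain model.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference in transit time between consecutive packets
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponentially smoothed absolute deviation (gain 1/16 as in RFC 3550)
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def playout_delay(jitter, base_delay=0.02, safety_factor=4.0):
    """Assumed playout rule: a fixed base delay plus a multiple of the jitter."""
    return base_delay + safety_factor * jitter


send = [0.00, 0.02, 0.04, 0.06]            # packets sent every 20 ms
steady = [0.05, 0.07, 0.09, 0.11]          # constant network delay
jittery = [0.05, 0.08, 0.09, 0.12]         # delay varies by +/- 10 ms

steady_jitter = interarrival_jitter(send, steady)
jittery_jitter = interarrival_jitter(send, jittery)
```

A larger jitter estimate yields a longer playout delay, trading latency for fewer late packets.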

Speech Enhancement Based on Voice/Unvoice Classification (유성음/무성음 분리를 이용한 잡음처리)

  • 유창동
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.374-379 / 2002
  • In this paper, a novel method to reduce noise using voice/unvoice classification is proposed. Voice and unvoice are important features of speech, and the proposed method processes noisy speech differently for each voice/unvoice part. Speech is classified into voice/unvoice using the zero-crossing rate and energy, and a modified speech/noise dominant decision is proposed based on the voice/unvoice classification. The proposed method was tested under white noise and airplane noise conditions; in a comparison of segmental SNR and in listening tests on the enhanced speech, the proposed method outperformed the existing method.
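
The zero-crossing rate and energy classification mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumed thresholds and synthetic frames, not the decision rule from the paper; `frame_features`, `classify_frame`, and both threshold values are hypothetical.

```python
import math


def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    # Count sign changes between consecutive samples
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return energy, zcr


def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Label a frame 'voiced' when energy is high and ZCR is low.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    energy, zcr = frame_features(frame)
    return "voiced" if energy > energy_thresh and zcr < zcr_thresh else "unvoiced"


# Synthetic 30 ms frames at an assumed 8 kHz sampling rate
fs = 8000
voiced_like = [0.5 * math.sin(2 * math.pi * 150 * n / fs) for n in range(240)]
unvoiced_like = [0.05 * (-1) ** n for n in range(240)]  # weak, rapidly alternating

labels = (classify_frame(voiced_like), classify_frame(unvoiced_like))
```

Voiced speech tends to have high energy and few zero crossings, while unvoiced speech is weaker and noise-like, which is why the two features are commonly paired.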

Speech Enhancement for Voice commander in Car environment (차량환경에서 음성명령어기 사용을 위한 음성개선방법)

  • 백승권;한민수;남승현;이봉호;함영권
    • Journal of Broadcast Engineering / v.9 no.1 / pp.9-16 / 2004
  • In this paper, we present a speech enhancement method as a pre-processor for a voice commander in a car environment. For friendly and safe use of a voice commander in a running car, non-stationary audio signals such as music and non-candidate speech should be reduced. Our technique is based on two microphones and consists of two parts: Blind Source Separation (BSS) and Kalman filtering. First, BSS operates as a spatial filter to deal with non-stationary signals; then car noise is reduced by Kalman filtering as a temporal filter. The algorithm's performance is tested on speech recognition, and the results show that our two-microphone technique is a good candidate for a voice commander.

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.962-977 / 2019
  • The human voice is a convenient means of information transfer between different parties, such as between people, between people and machines, and between machines. With the development of information and communication technology, the voice can be transferred farther than before. To communicate, the voice is converted to another form, transmitted, and then converted back to sound. In such a communication process, a vocoder is a method of converting and re-converting voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, has been adopted as a standard codec since it provides high-quality sound even though its transmission rate is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), both variable bit rate vocoders, are used for mobile phones in the 3G environment. In the real-time implementation of a vocoder, reduced sound quality is a typical problem. To improve the sound quality, it is important to know the size and shape of the noise. Existing sound quality improvement methods either detect and use voice activity or apply statistical methods to a large amount of data. However, they have the disadvantage that noise cannot be detected when the signal is continuous or when the noise changes greatly. This paper focuses on finding a better way to limit the reduction of sound quality in low bit rate transmission environments. Based on simulation results, this study proposes a preprocessor application that estimates the SNR (Signal to Noise Ratio) using the spectral SNR estimation method. The SNR estimation method adopted the IMBE (Improved Multi-Band Excitation) instead of using the SNR, which is a continuous speech signal. Finally, this application improves the quality of the vocoder by enhancing sound quality adaptively.

Feedback Active Noise Control Based Voice Enhancing Ear-Protection System

  • Moon, Seong-Pil;Chang, Tae-Gyu
    • Journal of Electrical Engineering and Technology / v.12 no.4 / pp.1627-1633 / 2017
  • This paper proposes a voice-enhancing ear-protection system based on feedback active noise control (FBANC). The proposed system selectively suppresses the background noise and preserves the talking voice by controlling the adaptive algorithm with a voice activity period detection module. The noise reduction performance of the proposed noise canceling algorithm is analytically derived for the two key performance-affecting parameters, i.e., electro-acoustic coupling distance and noise bandwidth. The proposed system is also implemented on a floating-point DSP system, and its performance is experimentally tested for comparison with the analytically derived results. The achieved levels of noise reduction for three different noise bandwidths, i.e., 10 Hz, 50 Hz, and 90 Hz, are as high as 17.05 dB, 10.54 dB, and 8.99 dB, respectively. The feasibility of the proposed system is also shown by a peak noise reduction of more than 25 dB, achieved while preserving the voice component in the frequency range of 200-800 Hz.

An Improved VAD Algorithm Employing Speech Enhancement Preprocessing and Threshold Updating (음성 향상 전처리와 문턱값 갱신을 적용한 향상된 음성검출 방법)

  • 이윤창;안상식
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.11C / pp.1161-1168 / 2003
  • In this paper, we propose an improved statistical model-based voice activity detection algorithm and a threshold update method. We first improve the signal-to-noise ratio using a speech enhancement preprocessing algorithm that combines the power subtraction method with a matched filter, then apply the result to the LLR test optimum decision rule to improve performance even in low SNR conditions. We also propose an adaptive threshold update method, which has not been addressed in previous papers. We perform extensive computer simulations to demonstrate the performance improvement of the proposed VAD algorithm employing the proposed speech enhancement preprocessing algorithm and adaptive threshold update method under various background noise environments. Finally, we verify our results by comparison with ITU-T G.729 Annex B.
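
To make the combination of power subtraction and a likelihood ratio test concrete, here is a minimal sketch assuming the common Gaussian statistical model for VAD, in which the per-bin log-likelihood ratio is γξ/(1+ξ) − log(1+ξ) for a posteriori SNR γ and a priori SNR ξ. The matched filter and adaptive threshold update from the abstract are omitted, and the function names, spectral floor, and fixed threshold are hypothetical.

```python
import math


def power_subtraction(noisy_power, noise_power, floor=1e-3):
    """Subtract the noise power estimate per bin, keeping a small spectral floor."""
    return [max(p - n, floor * n) for p, n in zip(noisy_power, noise_power)]


def mean_llr(noisy_power, noise_power):
    """Average per-bin log-likelihood ratio under an assumed Gaussian model."""
    speech_power = power_subtraction(noisy_power, noise_power)
    total = 0.0
    for p, s, n in zip(noisy_power, speech_power, noise_power):
        gamma = p / n  # a posteriori SNR
        xi = s / n     # a priori SNR, estimated from the subtracted spectrum
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(noisy_power)


def is_speech(noisy_power, noise_power, threshold=0.5):
    """Declare speech when the mean LLR exceeds a fixed, illustrative threshold."""
    return mean_llr(noisy_power, noise_power) > threshold


noise_estimate = [1.0] * 8       # assumed noise power per frequency bin
noise_only_frame = [1.0] * 8     # frame containing only noise
speech_frame = [10.0] * 8        # frame with speech roughly 9.5 dB above the noise

decisions = (is_speech(noise_only_frame, noise_estimate),
             is_speech(speech_frame, noise_estimate))
```

In a noise-only frame the subtracted power collapses to the floor and the ratio stays near zero, so the frame is rejected; an adaptive threshold would replace the fixed `threshold` value in a fuller implementation.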

Voice quality distinctions of the three-way stop contrast under prosodic strengthening in Korean

  • Jiyoung Jang;Sahyang Kim;Taehong Cho
    • Phonetics and Speech Sciences / v.16 no.1 / pp.17-24 / 2024
  • The Korean three-way stop contrast (lenis, aspirated, fortis) is currently undergoing a sound change, such that the primary cue distinguishing lenis and aspirated stops is shifting from voice onset time (VOT) to F0. Despite recent discussions of this shift, research on voice quality, traditionally considered an additional cue signaling the contrast, remains sparse. This study investigated the extent to which the associated voice quality [as reflected in the acoustic measurements of H1*-H2*, H1*-A1*, and cepstral peak prominence (CPP)] contributes to the three-way stop contrast, and how the realization is conditioned by prominence- vs. boundary-induced prosodic strengthening amid the ongoing sound change. Results for 12 native Korean speakers indicate that there was a substantial distinction in voice quality among the three stop categories with the breathiness of the vowel being the greatest after the lenis, intermediate after the aspirated, and least after the fortis stops, indicating the role of voice quality in the maintenance of the three-way stop contrast. Furthermore, prosodic strengthening has different effects on the contrast and contributes to the enhancement of the phonological contrast contingent on whether it is induced by prominence or boundary.

Complex nested U-Net-based speech enhancement model using a dual-branch decoder (이중 분기 디코더를 사용하는 복소 중첩 U-Net 기반 음성 향상 모델)

  • Seorim Hwang;Sung Wook Park;Youngcheol Park
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.253-259 / 2024
  • This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The proposed model uses a complex nested U-Net to simultaneously estimate the magnitude and phase components of the speech signal, and its decoder has a dual-branch structure that performs spectral mapping and time-frequency masking in its respective branches. Compared to a single-branch decoder structure, the dual-branch decoder structure allows noise to be removed effectively while minimizing the loss of speech information. The experiment was conducted on the VoiceBank + DEMAND database, commonly used for training speech enhancement models, and the model was evaluated with various objective evaluation metrics. In the experiment, the complex nested U-Net-based speech enhancement model using a dual-branch decoder increased the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 compared to the baseline, and showed higher objective evaluation scores than recently proposed speech enhancement models.

A study on Voice Recognition using Model Adaptation HMM for Mobile Environment (모델적응 HMM을 이용한 모바일환경에서의 음성인식에 관한 연구)

  • Ahn, Jong-Young;Kim, Sang-Bum;Kim, Su-Hoon;Hur, Kang-In
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.3 / pp.175-179 / 2011
  • In this paper, we propose the MA (Model Adaptation) HMM, which uses speech enhancement and feature compensation. Normally, voice reference data does not take real noise data into account. This method does not use estimated noise; instead, we use real-life environmental noise data. We applied this contaminated data to the recognition reference model to make it suitable for noisy environments. MAHMM is combined with surrounding noise when generating the reference pattern. Using MAHMM, we improved the voice recognition rate in a mobile environment.