Search | Korea Science

A Probabilistic Combination Method of Minimum Statistics and Soft Decision for Robust Noise Power Estimation in Speech Enhancement (강인한 음성향상을 위한 Minimum Statistics와 Soft Decision의 확률적 결합의 새로운 잡음전력 추정기법)

Park, Yun-Sik; Chang, Joon-Hyuk
- The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.153-158 / 2007
This paper presents a new approach to noise estimation that improves speech enhancement in non-stationary noisy environments. The proposed method combines two separate noise power estimates, one provided by minimum statistics (MS) for speech presence and one by soft decision (SD) for speech absence, according to the speech absence probability (SAP) in each frequency bin. The performance of the proposed algorithm is evaluated by a subjective test under various noise environments and yields better results than the conventional MS- or SD-based schemes.
https://doi.org/10.7776/ASK.2007.26.4.153
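
The per-bin combination described in the abstract above can be sketched roughly as follows. This is only an illustration under a stated assumption: the abstract does not give the combination rule, so a simple linear weighting by SAP is assumed here, and the function and parameter names are hypothetical.

```python
def combine_noise_estimates(ms_estimate, sd_estimate, sap):
    """Blend two per-bin noise power estimates using speech absence probability.

    Assumption (not stated in the abstract): a linear weighting in which the
    SD estimate dominates when speech is likely absent (sap near 1) and the
    MS estimate dominates when speech is likely present (sap near 0).
    """
    if not (len(ms_estimate) == len(sd_estimate) == len(sap)):
        raise ValueError("all inputs must have one value per frequency bin")
    combined = []
    for ms, sd, q in zip(ms_estimate, sd_estimate, sap):
        if not 0.0 <= q <= 1.0:
            raise ValueError("speech absence probability must lie in [0, 1]")
        # Weight the SD estimate by q and the MS estimate by (1 - q).
        combined.append(q * sd + (1.0 - q) * ms)
    return combined
```

With this weighting, a bin whose SAP is 0 takes the MS estimate unchanged, and a bin whose SAP is 1 takes the SD estimate unchanged.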

A Study on the Improvements of the Speech Quality by using Distribution Characteristics of LSP parameters in the EVRC(Enhanced Variable Rate Codec) (LSP 파라미터의 분포특성을 이용한 EVRC의 음질개선에 관한 연구)

Min, So-Yeon; Na, Deok-Su
- Journal of the Korea Academia-Industrial cooperation Society / v.12 no.12 / pp.5843-5848 / 2011
To improve channel spectrum efficiency and reduce system power consumption, EVRC compresses and transmits the voice signal only when the user speaks. In addition, voice frames are assigned one of three rates (1, 1/2, and 1/8), and each frame is handled differently according to its rate. For example, the input is assumed to be a silence region when the 1/8 rate is used. In this paper, the signal is first separated into voiced speech, unvoiced speech, and silence regions by using the distribution characteristics of LSP parameters. The paper then proposes encoding the voiced speech region at rate 1, the unvoiced speech region at rate 1/2, and the silence region at rate 1/8. In other words, the conventional transmission method is used when EVRC sends a full-rate frame. When EVRC would send a half-rate frame, however, the frame is first classified as voiced or unvoiced. If it is classified as voiced, it is converted to full rate before transmission. If it is classified as silence, EVRC's basic rate is applied. Experimental results based on SNR, ASDM, and transmission bit rate measurements demonstrate that the proposed algorithm improves voice quality.
https://doi.org/10.5762/KAIS.2011.12.12.5843

Speech Reinforcement Based on G.729A Speech Codec Parameter Under Near-End Background Noise Environments (근단 배경 잡음 환경에서 G.729A 음성부호화기 파라미터에 기반한 새로운 음성 강화 기법)

Choi, Jae-Hun; Chang, Joon-Hyuk
- The Journal of the Acoustical Society of Korea / v.28 no.4 / pp.392-400 / 2009
In this paper, we propose an effective speech reinforcement technique based on the ITU-T G.729A CS-ACELP codec for near-end background noise environments. Since the intelligibility of far-end speech for the near-end listener is generally reduced significantly under near-end noise, a far-end speech reinforcement approach is required to counter this phenomenon. In contrast to the conventional speech reinforcement algorithm, we reinforce the excitation signal among the codec parameters received from the far end, based on the G.729A speech codec, under various background noise environments. Specifically, we first estimate the excitation signal of the ambient noise at the near end through the encoder of the G.729A speech codec, and then reinforce the excitation signal of the speech transmitted from the far end. In particular, we propose a novel approach that directly reinforces the excitation signal of the far-end speech in the G.729A decoder. The performance of the proposed algorithm is evaluated by the CCR (Comparison Category Rating) test, a method for subjective determination of transmission quality defined in ITU-T P.800, under various noise environments, and it shows better performance than conventional SNR recovery methods.
https://doi.org/10.7776/ASK.2009.28.4.392

Objective Assessment Model for Refrigerator Noises (냉장고 소음의 객관적 평가 모델)

Park, Jong-Geun; Cho, Youn; Lee, Sang-Wook; Hwang, Dae-Sun; Lee, Chul-Hee
- Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.5 / pp.80-90 / 2009
This paper presents objective methods that predict the perceptual noise levels caused by refrigerators. Eight home refrigerators were chosen, and their noises were recorded in an anechoic chamber and in a real-life apartment. To obtain the perceptual noise levels of the refrigerators, subjective quality assessment tests were performed by 100 evaluators. We then compute five sound quality metrics (SQMs) that reflect psychoacoustic characteristics. Finally, an objective assessment model for refrigerator noises is developed as a linear combination of the SQMs.

Packet Loss Concealment Algorithm Based on Speech Characteristics (음성신호의 특성을 고려한 패킷 손실 은닉 알고리즘)

Yoon Sung-Wan; Kang Hong-Goo; Youn Dae-Hee
- The Journal of Korean Institute of Communications and Information Sciences / v.31 no.7C / pp.691-699 / 2006
Despite in-depth efforts to control the variability of IP networks, quality of service (QoS) is still not guaranteed in them. It is therefore necessary to deal with the audible artifacts caused by packet losses. To overcome the packet loss problem, most speech coding standards have their own embedded packet loss concealment (PLC) algorithms, which adopt extrapolation methods that exploit the dependency between adjacent frames. However, since many low-bit-rate CELP coders use predictive schemes to increase coding efficiency, error propagation occurs even if only a single packet is lost. In this paper, we propose an efficient PLC algorithm that takes the speech characteristics of lost frames into account. To design it, we perform several experiments that investigate the error propagation effect of lost frames in a predictive coder. We then summarize the impact of packet loss according to speech characteristics and analyze the importance of the encoded parameters for each speech class. Based on the experimental results, we propose a new PLC algorithm that focuses mainly on reducing the error propagation time. Experimental results show that its performance is much higher than that of conventional extrapolation methods over various frame erasure rate (FER) conditions. The difference is especially remarkable under high-FER conditions.

Robust Speech Enhancement Based on Soft Decision Employing Spectral Deviation (스펙트럼 변이를 이용한 Soft Decision 기반의 음성향상 기법)

Choi, Jae-Hun; Chang, Joon-Hyuk; Kim, Nam-Soo
- Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.222-228 / 2010
In this paper, we propose a new approach to noise estimation that incorporates spectral deviation into a soft decision scheme to enhance the intelligibility of degraded speech in non-stationary noisy environments. The conventional noise estimation technique based on soft decision estimates and updates the noise power spectrum using a fixed smoothing parameter chosen under the assumption of stationary noise. It is therefore difficult to obtain robust estimates of the noise power spectrum in non-stationary environments, such as a restaurant, where the spectral characteristics of the noise change constantly. In the proposed method, we first classify the environment as stationary or non-stationary noise based on an analysis of the spectral deviation of the noise signal, and then adaptively estimate and update the noise power spectrum according to the classified noise type. The performance of the proposed algorithm is evaluated by ITU-T P.862 perceptual evaluation of speech quality (PESQ) under various ambient noise environments and is better than that of the conventional method.

Design of a Low Bit-rate Speech Coder Based on Mixed Multi-band Excitation Model (혼합 다중대역 여기모델에 기반한 저 전송률 음성 부호화기의 설계)

한우진;오영환
- The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.510-521 / 2002
An MBE (multi-band excitation) coder can achieve high-quality synthetic speech below 4.0 kbps. There are, however, significant differences in fine structure between the original spectrum and the synthetic spectrum. They are mainly due to the exclusive partition of voiced and unvoiced regions in the frequency domain and to the decision procedure based on an experimental threshold. This paper proposes the MMBE (mixed multi-band excitation) speech model to overcome these drawbacks of the MBE coder. In addition, two analysis methods that do not need any threshold-based decision procedure are presented. In the MMBE speech model, voiced and unvoiced components can be mixed over the entire frequency axis. To illustrate the potential of the proposed speech model, we develop a 2.6 kbps MMBE coder and compare it with a 2.9 kbps MBE coder using both objective and subjective methods. The results show that the proposed coder performs better than the MBE coder even at a lower bit rate.

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

Kim, Sang-hun; Park, Jun; Lee, Young-jik
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small-sized speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To make phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break indices are determined from pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain high-quality synthetic speech suitable for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In a subjective evaluation, the new corpus-based Korean TTS system shows better naturalness than the conventional demisyllable-based one.

A New Wideband Speech/Audio Coder Interoperable with ITU-T G.729/G.729E (ITU-T G.729/G.729E와 호환성을 갖는 광대역 음성/오디오 부호화기)

Kim, Kyung-Tae; Lee, Min-Ki; Youn, Dae-Hee
- Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.2 / pp.81-89 / 2008
Wideband speech, characterized by a bandwidth of about 7 kHz (50-7000 Hz), provides a substantial quality improvement in terms of naturalness and intelligibility. Although it requires higher data rates, its applications have extended to audio and video conferencing, high-quality multimedia communications over mobile links or packet-switched transmissions, and digital AM broadcasting. In this paper, we present a new bandwidth-scalable coder for wideband speech and audio signals. The proposed coder splits the 8 kHz signal bandwidth into two narrow bands and applies a different coding scheme to each band. The lower-band signal is coded using the ITU-T G.729/G.729E coder, and the higher-band signal is compressed using a new algorithm based on the gammatone filter bank with an invertible auditory model. Because of the split-band architecture and the completely independent coding schemes for each band, the output speech of the decoder can be selected to be narrowband or wideband according to the channel condition. Subjective tests showed that, for wideband speech and audio signals, the proposed coder at 14.2/18 kbit/s produces quality superior to that of ITU-T G.722.1 at 24 kbit/s, with a shorter algorithmic delay.

BS-PLC(Both Side-Packet Loss Concealment) for CELP Coder (CELP 부호화기를 위한 양방향 패킷 손실 은닉 알고리즘)

Lee In-Sung; Hwang Jeong-Joon; Jeong Gyu-Hyeok
- Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.12 / pp.127-134 / 2005
Robustness to lost packets is one of the most important quality measures for voice over IP (VoIP) networks, and recovering a lost packet from the received information is crucial to achieving it. This paper therefore proposes a method for recovering lost packets from the received information in real-time communication with a CELP coder. The proposed BS-PLC (Both Side Packet Loss Concealment), based on WSOLA (Waveform Similarity Overlap-Add), allows a lost packet to be recovered from both the previous and the next good packets, with the LP parameters and the excitation signal recovered separately. Bursts of packet loss are modeled by the Gilbert model. The proposed scheme is applied to G.729, the codec most widely used in VoIP, and is evaluated through SNR (signal-to-noise ratio) and MOS (Mean Opinion Score) tests. In simulation, the proposed scheme scores 0.3 higher in MOS and 2 dB higher in SNR than the error concealment procedure in the G.729 decoder at an average packet loss rate of 20%.
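
As a rough illustration of the burst-loss model named in the abstract above, the following is a minimal Python sketch of a two-state Gilbert channel. The function names, the transition probabilities, and the choice that every packet sent in the bad state is lost are assumptions for illustration, not the authors' simulation setup.

```python
import random


def simulate_gilbert_losses(n_packets, p_good_to_bad, p_bad_to_good, seed=None):
    """Return a list of booleans, True where a packet is lost.

    Two-state Markov chain: the channel starts in the good state, may switch
    state before each packet, and (by assumption) loses every packet sent
    while it is in the bad state.
    """
    rng = random.Random(seed)
    in_bad_state = False
    lost = []
    for _ in range(n_packets):
        if in_bad_state:
            if rng.random() < p_bad_to_good:
                in_bad_state = False
        else:
            if rng.random() < p_good_to_bad:
                in_bad_state = True
        lost.append(in_bad_state)
    return lost


def stationary_loss_rate(p_good_to_bad, p_bad_to_good):
    """Long-run fraction of packets lost for the chain above."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)
```

Under these assumptions, illustrative values such as `p_good_to_bad=0.05` and `p_bad_to_good=0.2` give a long-run loss rate of 20% with a mean burst length of five packets.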
