• Title/Summary/Keyword: Speech quality measure

Speech Enhancement Based on IMCRA Incorporating noise classification algorithm (잡음 환경 분류 알고리즘을 이용한 IMCRA 기반의 음성 향상 기법)

  • Song, Ji-Hyun;Park, Gyu-Seok;An, Hong-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.12
    • /
    • pp.1920-1925
    • /
    • 2012
  • In this paper, we propose a novel method to improve the performance of improved minima controlled recursive averaging (IMCRA) in non-stationary noise environments. The conventional IMCRA algorithm efficiently estimates the noise power by averaging past spectral power values with a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. Because the minimum of the smoothing parameter is fixed at 0.85, it is difficult to obtain robust noise power estimates in non-stationary noise environments whose spectral characteristics change rapidly, such as babble noise. For this reason, we propose a modified IMCRA that adaptively estimates and updates the noise power according to the noise type classified by a Gaussian mixture model (GMM). The proposed method is evaluated with the perceptual evaluation of speech quality (PESQ) and a composite measure under various environments, and it yields better results than the conventional method.
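
The recursive averaging described in this abstract can be sketched as below. This is a minimal illustration, assuming the commonly cited IMCRA-style update in which the smoothing parameter rises from its floor toward 1 as the speech presence probability rises. The function name `update_noise_power` and the lower floor of 0.5 used for a fast-changing noise class are hypothetical and not taken from the paper; minima tracking, bias compensation and the GMM classifier are omitted.

```python
def update_noise_power(noise_power, noisy_power, speech_prob, alpha_min=0.85):
    """One recursive-averaging step per frequency bin.

    noise_power : previous noise power estimate per bin
    noisy_power : |Y|^2 of the current noisy frame per bin
    speech_prob : speech presence probability per bin, in [0, 1]
    alpha_min   : floor of the smoothing parameter (0.85 in conventional IMCRA)
    """
    updated = []
    for lam, y2, p in zip(noise_power, noisy_power, speech_prob):
        # Smoothing parameter: alpha_min when speech is absent, 1.0 when present.
        alpha = alpha_min + (1.0 - alpha_min) * p
        # Recursive average of the past estimate and the current frame power.
        updated.append(alpha * lam + (1.0 - alpha) * y2)
    return updated


prev = [1.0, 1.0]    # previous noise estimate for two bins
frame = [2.0, 2.0]   # current noisy power
prob = [0.0, 1.0]    # bin 0: speech absent, bin 1: speech present

conventional = update_noise_power(prev, frame, prob)
# Hypothetical lower floor for a rapidly changing noise type such as babble.
fast = update_noise_power(prev, frame, prob, alpha_min=0.5)
print(conventional, fast)
```

With the 0.85 floor, a speech-absent bin moves only 15% of the way toward the new frame power per step, which is why tracking is slow when the noise spectrum changes quickly. A lower floor tracks faster at the cost of a noisier estimate.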

AMR-WB Algebraic Codebook Search Method Using the Re-examination of Pulses Position (펄스위치 재검색 방법을 이용한 AMR-WB 여기 코드북 검색)

  • Hur, Seok;Lee, In-Sung;Jee, Deock-Gu;Yoon, Byung-Sik;Choi, Song-In
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.4
    • /
    • pp.292-302
    • /
    • 2003
  • We propose a new method to reduce the complexity of the excitation codebook search. The excitation pulses preselected by a coarse search can be replaced by pulses that score higher on the quality performance measure. Excitation pulses can be arbitrarily deleted from and inserted among the searched pulses until the required overall performance is achieved. When this excitation pulse search method is used in AMR-WB, the complexity of the excitation codebook search is reduced to half that of the original method, while the output speech quality remains equal to that of the conventional method.

A Comparison of the Voice Differences of Patients with Idiopathic Parkinson's Disease and a Normal-Aging Group (파킨슨병 환자와 정상 노인의 음성비교)

  • Kang, Young-Ae;Kim, Yong-Duk;Ban, Jae-Chun;Seong, Cheol-Jae
    • Phonetics and Speech Sciences
    • /
    • v.1 no.1
    • /
    • pp.99-107
    • /
    • 2009
  • In view of the hypothesis that the effects of Parkinson's disease on voice production can be detected before pharmacological intervention, the voice differences between patients with idiopathic Parkinson's disease and a healthy aging group were analyzed diagnostically, with the long-term objective of establishing early disease-progression biomarkers for clinical purposes. Fifteen patients with idiopathic Parkinson's disease (prior to pharmacological intervention) and a healthy control group of 15 were selected, and every voice was recorded three times using Praat (ver. 5022) with a headset microphone. The relevant parameters (acoustic measures of /a/ phonation, F0-related parameters, MPT-related parameters, articulatory ratio and VOT) were then analyzed by MANOVA. Significant differences were found in the F0-related (low F0, high F0, F0 range) and MPT-related parameters. There were also significant differences in acoustic measurements (intensity, shimmer, HNR, jitter), AMR (/$t{\Lambda}$/, /$k{\Lambda}$/) and VOT (/ta/). The findings indicate that the voice production of patients with idiopathic Parkinson's disease has normal pitch but poor quality. In particular, given their slow articulatory ratios and VOT values, the tongue tip function of the patients was lower than that of the healthy group.

Speech enhancement based on reinforcement learning (강화학습 기반의 음성향상기법)

  • Park, Tae-Jun;Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.05a
    • /
    • pp.335-337
    • /
    • 2018
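  • Speech enhancement removes noise and reverberation from speech. Because a speech signal captured by a microphone is distorted by noise and reverberation, speech enhancement is a core technology for speech signal processing applications such as speech recognition and voice communication. Earlier work relied mainly on statistical model-based speech enhancement, which uses statistical information about the speech and noise signals, but its performance degrades considerably in non-stationary noise environments compared with stationary ones. Recently, deep neural networks (DNNs), a machine learning technique, have been introduced and have shown excellent performance in speech enhancement. DNN-based speech enhancement uses multiple hidden layers and hidden nodes to model the nonlinear relationship between noisy speech and clean speech. We apply reinforcement learning, one way of improving DNN-based speech enhancement, and improve performance over the existing DNN. Reinforcement learning, the technique famously used in Google's AlphaGo, learns over a very large number of cases which action to take under a policy in a given state, in order to receive the highest reward and move to the next state, so that the optimal action can be selected. In this paper, we design the reward based on the composite measure and achieve higher speech recognition performance than the existing technique, whose reward is based on PESQ (Perceptual Evaluation of Speech Quality).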

Performance Comparison of AMR Codec Mode Allocations in Downlink WCDMA System (순방향 WCDMA 채널에서 AMR 음성 코덱 모드 할당방식에 대한 성능 비교)

  • Jeong, S.H.;Hong, J.W.;Lee, S.C.;Lie, C.H.
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.31 no.4
    • /
    • pp.349-357
    • /
    • 2005
  • The Adaptive Multi-Rate (AMR) speech codec is mandatory for voice service in WCDMA systems. By adjusting its service rate, the AMR codec can provide a balanced trade-off between capacity and voice quality. In this paper, three AMR mode allocation schemes on the downlink of a WCDMA system are evaluated. To evaluate user satisfaction efficiently, a new system performance measure and analytic models are proposed. The proposed analytic models can be used to obtain optimal mode allocations while considering both system capacity and voice quality. Numerical examples illustrate how to find optimal parameters for given traffic loads and compare the performance of the three mode allocation schemes.

Quantitative Evaluation of the Performance of Monaural FDSI Beamforming Algorithm using a KEMAR Mannequin (KEMAR 마네킹을 이용한 단이 보청기용 FDSI 빔포밍 알고리즘의 정량적 평가)

  • Cho, Kyeongwon;Nam, Kyoung Won;Han, Jonghee;Lee, Sangmin;Kim, Dongwook;Hong, Sung Hwa;Jang, Dong Pyo;Kim, In Young
    • Journal of Biomedical Engineering Research
    • /
    • v.34 no.1
    • /
    • pp.24-33
    • /
    • 2013
  • To enhance the speech perception of hearing aid users in noisy environments, most hearing aids adopt beamforming algorithms, such as the first-order differential microphone (DM1) and the two-stage directional microphone (DM2) algorithms, that preserve sounds from the direction of the interlocutor and reduce ambient sounds from other directions. However, these conventional algorithms show poor directionality at low frequencies. To enhance the speech perception of hearing aid users in the low frequency range, our group previously proposed a fractional delay subtraction and integration (FDSI) algorithm and estimated its theoretical performance by computer simulation. In this study, we performed a KEMAR test in a non-reverberant room to compare the performance of the DM1, DM2, broadband beamforming (BBF) and proposed FDSI algorithms using several objective indices: signal-to-noise ratio (SNR) improvement, segmental SNR (segSNR) improvement, perceptual evaluation of speech quality (PESQ) and the Itakura-Saito measure (IS). The FDSI algorithm achieved -3.26 to 7.16 dB in SNR improvement, -1.94 to 5.41 dB in segSNR improvement, 1.49 to 2.79 in PESQ and 0.79 to 3.59 in IS, showing the highest SNR and segSNR improvements and the lowest IS among the compared algorithms. We believe that the proposed FDSI algorithm has potential as a beamformer for digital hearing aids.
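
Two of the objective indices named in this abstract, SNR and segmental SNR, can be sketched as below. This is a minimal sketch under stated assumptions: the frame length of 4 samples is a toy value, and clamping each frame to the range -10 dB to 35 dB is a common convention rather than something stated in the paper. The function names are hypothetical. An "improvement" figure is the value after processing minus the value before processing.

```python
import math


def snr_db(clean, processed):
    """Global SNR in dB: clean-signal energy over error energy."""
    signal = sum(s * s for s in clean)
    noise = sum((s - p) ** 2 for s, p in zip(clean, processed))
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(clean, processed, frame_len=4, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs, each clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal = sum(s * s for s in c)
        noise = sum((s - q) ** 2 for s, q in zip(c, p))
        if noise == 0.0:
            value = ceil_db        # perfect frame: use the upper clamp
        elif signal == 0.0:
            value = floor_db       # silent reference frame: use the lower clamp
        else:
            value = 10.0 * math.log10(signal / noise)
            value = max(floor_db, min(ceil_db, value))
        values.append(value)
    if not values:
        raise ValueError("signal is shorter than one frame")
    return sum(values) / len(values)


clean = [1.0] * 8
processed = [0.9] * 4 + [0.0] * 4   # first frame nearly intact, second frame lost

frame_one_snr = snr_db(clean[:4], processed[:4])
seg_snr = segmental_snr_db(clean, processed)
print(frame_one_snr, seg_snr)
```

Averaging in the dB domain per frame makes segmental SNR more sensitive than global SNR to short stretches where the output is badly degraded.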

A Study on the Robustness of a 16Kbps SBC over the Rayleigh fading Channel Error (16Kbps SBC의 Rayleigh 페이딩 채널에러에 대한 강인성 연구)

  • 오수환;이상욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.11 no.4
    • /
    • pp.287-295
    • /
    • 1986
  • In this paper, sub-band coding (SBC) is proposed for coding speech signals in digital mobile radio, and the robustness of SBC speech quality over the Rayleigh fading channel is investigated by computer simulation. First, models of the Rayleigh fading channel and a 16-ary DPSK receiver are presented, and their validity is verified by comparison with theoretical values. Three measures (SNR, an LPC distance measure and a subjective listening test) were used to evaluate the effects of Rayleigh fading channel errors. From the simulation results at BER = $10^{-3}$, $10^{-2}$ and $5\times10^{-2}$, it was found that the speech remained quite intelligible at BER = $10^{-2}$ and that the link is still usable even at BER = $5\times10^{-2}$. Thus it was concluded that the SBC is applicable to digital mobile radio over the Rayleigh fading channel for error rates in the range of $10^{-4}$ to $10^{-2}$ without employing any error correction codes.

A Fast Pitch Searching Algorithm Using Correlation Characteristics in CELP Vocoder (상관관계 특성을 용한 CELP 보코더의 고속 피치검색 알고리듬)

  • Lee, Joo-Hun;Bae, Myung-Jin;Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.2E
    • /
    • pp.20-25
    • /
    • 1994
  • The major drawback of Code Excited Linear Prediction (CELP) vocoders is their large computational requirement. In this paper, a simple method is proposed to reduce the pitch search time in the pitch filter with almost no degradation in quality. Based on the observed regularity of the correlation function of speech, the search range can be restricted to the positive side of the correlation function. This is done by skipping the negative side, whose width is estimated from the previous positive envelope. In addition, the maximum number of available lags is limited by a threshold, $L_T$, which is set empirically to 58. As a result, only a limited number of lags is considered in the pitch search, fewer than half of those in the full search method, and the required computation is greatly reduced. Experimental results show a 51% time reduction with almost no loss of speech quality as measured by segmental SNR.
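
The idea of skipping the negative side of the correlation function can be illustrated with a simplified sketch. It is not the authors' algorithm: it assumes a plain (unnormalized) autocorrelation, jumps over a negative lobe by the width of the positive lobe just traversed, and does not model the $L_T$ threshold. The function names and the synthetic test signal are hypothetical.

```python
import math


def correlation(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def full_search(x, min_lag, max_lag):
    """Evaluate every lag; return (best lag, number of lags evaluated)."""
    best = max(range(min_lag, max_lag + 1), key=lambda lag: correlation(x, lag))
    return best, max_lag - min_lag + 1


def skipping_search(x, min_lag, max_lag):
    """Evaluate positive lobes only, jumping across negative lobes."""
    best_lag, best_val = None, float("-inf")
    evaluated = 0
    positive_width = 0   # width of the positive lobe seen most recently
    lag = min_lag
    while lag <= max_lag:
        c = correlation(x, lag)
        evaluated += 1
        if c > 0:
            positive_width += 1
            if c > best_val:
                best_lag, best_val = lag, c
            lag += 1
        else:
            # Negative lobe: assume it is about as wide as the last positive one.
            lag += max(1, positive_width)
            positive_width = 0
    return best_lag, evaluated


# Synthetic periodic signal with a pitch period of 20 samples.
signal = [math.sin(2.0 * math.pi * n / 20.0) for n in range(200)]

full_lag, full_evals = full_search(signal, 8, 32)
fast_lag, fast_evals = skipping_search(signal, 8, 32)
print(full_lag, full_evals, fast_lag, fast_evals)
```

On this toy signal both searches find the same lag while the skipping search evaluates fewer lags. Real speech is less regular, so the jump width would need the kind of envelope-based estimate the abstract describes.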

Speech Quality Measure in a Mobile Communication System using PLP Cepstral Distance with CMS (심리 음향 겝스트럼 평균 차감법을 이용한 이동 전화망에서의 음질 평가)

  • 윤종진;박상욱;박영철;안동순;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.12B
    • /
    • pp.2046-2051
    • /
    • 2000
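  • In this paper, we propose PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), a new speech quality measure that not only outperforms existing speech quality measures but also performs consistently on speech signals from diverse channel paths. PLP-CMS, which can effectively predict the subjective quality of speech signals in a CDMA PCS mobile telephone environment, uses perceptual linear predictive (PLP) analysis to raise the correlation with subjective quality, and we confirmed that the cepstral mean subtraction (CMS) step gives it consistent performance regardless of the PSTN path.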
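
The cepstral mean subtraction step mentioned in this abstract can be sketched as below. This is a minimal illustration, assuming each frame is already a list of cepstral coefficients; the PLP analysis itself is not shown, and the function name and toy values are hypothetical. A fixed channel adds a constant offset to every frame in the cepstral domain, so removing the per-coefficient mean removes that offset.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of an utterance."""
    count = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# Two frames of toy cepstral coefficients.
original = [[1.0, 2.0], [3.0, 6.0]]
# The same frames after a hypothetical channel adds a constant cepstral offset.
channel_offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, channel_offset)] for frame in original]

normalized_original = cepstral_mean_subtraction(original)
normalized_shifted = cepstral_mean_subtraction(shifted)
print(normalized_original, normalized_shifted)
```

Both inputs normalize to the same coefficients, which is the property that lets a distance measure computed after CMS stay consistent across different channel paths.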

The Effect of An Increase of Closed Quotient on Improvement of Voice Quality after Type I Thyroplasty in Patients with Unilateral Vocal Cord Paralysis (일측 성대마비 환자에서 성대내전술 후 성대접촉율의 증가가 음질 개선에 미치는 영향)

  • Kim, Han-Su;Choi, Seung-Hee;Lim, Jae-Yol;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.15 no.1
    • /
    • pp.16-20
    • /
    • 2004
  • Purpose: To assess perceptual, acoustic and aerodynamic measures of voice quality in patients with unilateral vocal cord paralysis before and after type I thyroplasty. Methods: The clinical records of patients who underwent type I thyroplasty in the Department of Otorhinolaryngology, Yongdong Severance Hospital, from November 2001 to November 2003 were reviewed. All patients underwent a vocal function evaluation, including perceptual, acoustic and aerodynamic measures of voice, preoperatively and on the $60^{th}$ postoperative day. The perceptual and acoustic measures were obtained from recordings of patients reading the 'Sanchak' passage. The perceptual evaluation was performed by 2 speech pathologists using a 4-point rating scale. Acoustic parameters (voice range profile low (RAL), voice range profile high (RAH), average fundamental frequency (AFx), closed quotient, harmonic-to-noise ratio, jitter and shimmer) were measured with Lx Speech Studio. Mean flow rate (MFR), subglottic pressure (Psub) and intensity were measured using the Phonatory Function Analyzer. The maximum phonation time (MPT) was also measured. A paired t-test (p<0.1) was used to compare preoperative and postoperative results, and multiple regression was used to find which parameter was most correlated with the improvement of postoperative voice quality. Results: Among aerodynamic parameters, Psub ($88.11\,mmH_2O \rightarrow 58.7\,mmH_2O$), MPT (7.87 sec $\rightarrow$ 12.53 sec) and MFR (359.8 ml/sec $\rightarrow$ 161.06 ml/sec) improved significantly. AFx (205.5 Hz $\rightarrow$ 163.27 Hz), AQx (23.9% $\rightarrow$ 48.3%), RAL, RAH, jitter and shimmer also improved. In the multiple regression, AFx and AQx were the two parameters most correlated with the improvement of postoperative breathiness, whereas the general grade of voice quality was more correlated with Psub and shimmer. Conclusion: Vocal fold medialization procedures effectively reduce the glottic gap. The increased contact area of the vocal folds improved the aerodynamic parameters and stabilized vocal fold vibration. This effect results in improvement in acoustic parameters (shimmer, jitter, signal-to-noise ratio, voice range profile) and voice quality.
