• Title/Summary/Keyword: speech quality evaluation

Dialog System based on Speech Recognition for the Elderly with Dementia (음성인식에 기초한 치매환자 노인을 위한 대화시스템)

  • Kim, Sung-Il;Kim, Byoung-Chul
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.6 / pp.923-930 / 2002
  • This study aims to develop a dialog system that improves the quality of life of elderly people with dementia. The proposed system consists of three main modules: speech recognition, automatic search of a time-sorted dialog database, and agreeable responses using the recorded voices of caregivers. First, the utterances that dementia patients often make at a nursing home were investigated. Next, the system was organized to recognize these utterances in order to meet the patients' requests or demands. The system then responds with the recorded voices of professional caregivers. To evaluate the system, a comparison study was carried out with and without the system in use. Occupational therapists then evaluated a male subject's reactions to the system by photographing his behavior. The evaluation results showed that the dialog system was more responsive in catering to the needs of the dementia patient than professional caregivers were. Moreover, the proposed system led the patient to talk more than the caregivers did in mutual communication.

Quantitative Evaluation of the Performance of Monaural FDSI Beamforming Algorithm using a KEMAR Mannequin (KEMAR 마네킹을 이용한 단이 보청기용 FDSI 빔포밍 알고리즘의 정량적 평가)

  • Cho, Kyeongwon;Nam, Kyoung Won;Han, Jonghee;Lee, Sangmin;Kim, Dongwook;Hong, Sung Hwa;Jang, Dong Pyo;Kim, In Young
    • Journal of Biomedical Engineering Research / v.34 no.1 / pp.24-33 / 2013
  • To enhance the speech perception of hearing aid users in noisy environments, most hearing aid devices adopt beamforming algorithms, such as the first-order differential microphone (DM1) and the two-stage directional microphone (DM2) algorithms, that maintain sounds from the direction of the interlocutor and reduce ambient sounds from other directions. However, these conventional algorithms exhibit poor directionality in the low-frequency range. Therefore, to enhance the speech perception of hearing aid users in the low-frequency range, our group previously proposed a fractional delay subtraction and integration (FDSI) algorithm and estimated its theoretical performance using computer simulation. In this study, we performed a KEMAR test in a non-reverberant room to compare the performance of the DM1, DM2, broadband beamforming (BBF), and proposed FDSI algorithms using several objective indices: signal-to-noise ratio (SNR) improvement, segmental SNR (segSNR) improvement, perceptual evaluation of speech quality (PESQ), and the Itakura-Saito measure (IS). Experimental results showed that the performance of the FDSI algorithm was -3.26 to 7.16 dB in SNR improvement, -1.94 to 5.41 dB in segSNR improvement, 1.49 to 2.79 in PESQ, and 0.79 to 3.59 in IS, demonstrating that the FDSI algorithm showed the highest improvement in SNR and segSNR and the lowest IS. We believe that the proposed FDSI algorithm has potential as a beamformer for digital hearing aid devices.
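
The SNR and segmental SNR indices named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, frame length, or clamping limits the authors used, so the function names, the 256-sample frame, and the [-10, 35] dB clamp below are assumptions based on one common textbook convention.

```python
import math


def snr_db(clean, processed):
    """Global SNR in dB, treating (clean - processed) as the noise."""
    signal = sum(s * s for s in clean)
    noise = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB.

    frame_len, lo and hi are illustrative assumptions, not values
    taken from the paper.
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        sig = sum(s * s for s in c)
        err = sum((s - q) ** 2 for s, q in zip(c, p))
        if sig == 0.0 or err == 0.0:
            continue  # skip silent or error-free frames
        values.append(min(hi, max(lo, 10.0 * math.log10(sig / err))))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)
```

Under this convention, an "improvement" figure would be the index computed on the beamformer output minus the same index computed on the unprocessed microphone signal.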

Voice Activity Detection Using Modified Power Spectral Deviation Based on Teager Energy (Teager Energy 기반의 수정된 파워 스펙트럼 편차를 이용한 음성 검출)

  • Song, J.H.;Song, Y.R.;Shim, H.M.;Lee, S.M.
    • Journal of rehabilitation welfare engineering & assistive technology / v.8 no.1 / pp.41-46 / 2014
  • In this paper, we propose a novel voice activity detection (VAD) algorithm using feature vectors based on Teager energy (TE). Specifically, the power spectral deviation (PSD), which is used as the VAD feature in the IS-127 noise suppression algorithm, is obtained after the input signal is transformed by the Teager energy operator. In addition, a TE-based likelihood ratio is derived in each frame to modify the PSD for further VAD. The performance of the proposed VAD algorithm is evaluated by objective testing (total error rate, receiver operating characteristics, and perceptual evaluation of speech quality) under various environments, and the proposed method is found to yield better results than conventional VAD algorithms in non-stationary noise environments under 5 dB SNR (total error rate decreased by 2.6%; PESQ score improved by 0.053).
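
As a rough illustration of the Teager energy transform mentioned in this abstract, the sketch below implements the standard discrete-time operator, ψ[n] = x[n]² − x[n−1]·x[n+1]. The abstract does not say which form of the operator the authors apply or how it feeds the PSD feature, so the function name and the time-domain form are assumptions.

```python
def teager_energy(x):
    """Discrete Teager energy for the interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1]; the first and last samples
    have no two-sided neighbours, so the output is 2 samples shorter.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure tone A·cos(ωn + φ), this operator returns the constant A²·sin²(ω), which is why it is often described as tracking both amplitude and frequency.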

Efficacy of laughing voice treatment (SKMVTT) in benign vocal fold lesions (양성성대질환의 웃음 음성치료(SKMVTT))

  • Jung, Dae-Yong;Wi, Joon-Yeol;Kim, Seong-Tae
    • Phonetics and Speech Sciences / v.10 no.4 / pp.155-161 / 2018
  • The purpose of this study was to evaluate the efficacy of a multiple voice therapy technique (SKMVTT®) using laughter for the treatment of various benign vocal fold lesions. To this end, 23 female patients diagnosed through videostroboscopy with vocal nodules, vocal polyps, or muscle tension dysphonia were enrolled in vocal hygiene and SKMVTT®. All of the patients were treated once a week for 4 to 12 sessions. The GRBAS scale was used to confirm the changes in voice quality before and after the treatment. Acoustic analysis was performed to evaluate jitter, shimmer, NHR, fundamental frequency variation (vFo), amplitude variation (vAm), PFR, and dB range. Videostroboscopy was performed to confirm the changes in the laryngeal features before and after the treatment. After SKMVTT®, the perceptual evaluation showed that the G, R, and B scales improved significantly. The acoustic evaluation also showed that jitter, shimmer, NHR, vAm, vFo, PFR, and dB range improved significantly. In the comparison of videostroboscopic findings, the vocal nodules and vocal polyps decreased in size or disappeared after the treatment. In addition, the size of the cuneiform tubercles decreased, the aryepiglottic folds became longer, and the laryngeal findings of supraglottic compression improved after SKMVTT®. These results suggest that SKMVTT® is effective in improving the vocal quality of patients with benign vocal fold lesions. In conclusion, laughter and inspiratory phonation appear to suppress abnormal laryngeal elevation and lower laryngeal height, which seems to improve hyperfunctional phonation.
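
For readers unfamiliar with the jitter and shimmer measures listed in this abstract, here is a minimal sketch of one common "local" definition: the mean absolute difference between consecutive cycles, divided by the mean, in percent. The abstract does not state which analysis software or variant the authors used, so the function name and this particular definition are assumptions.

```python
def local_perturbation_percent(values):
    """Cycle-to-cycle perturbation as a percentage of the mean.

    Pass glottal cycle periods to get local jitter, or per-cycle
    peak amplitudes to get local shimmer (illustrative definition).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value
```

A perfectly periodic voice gives 0% for both measures; larger values indicate more cycle-to-cycle irregularity.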

A Systematic Review on Voice Characteristics and Risk Factors of Voice Disorder of Korea Teachers (우리나라 교사의 음성 특성과 음성장애 위험 요인에 관한 체계적 문헌고찰)

  • Cha, Seulki;Byeon, Haewon
    • Journal of the Korea Convergence Society / v.9 no.8 / pp.149-154 / 2018
  • As the range of professional voice users expands, interest in voice is increasing as well. Because teachers in particular form an occupational group exposed to a high risk of voice disorders, it is necessary to identify the causes of their speech problems and speech disorders. The purpose of this study is to analyze the voice characteristics of teachers and to investigate the causes of their voice disorders. For the period from 2000 to 2018, 414 studies were found using a combined set of the search terms 'profession', 'Teacher', 'Professional Voice User', 'Voice', 'Voice disorders', and 'Risk', and 8 of them were selected for the final analysis. The qualitative evaluation was carried out by modifying a quality checklist for assessing the risk of bias. The study confirmed that voice misuse frequently occurred when teachers used their voices and that this was affected by the environment. These results suggest that improving the environment that contributes to teachers' voice abuse, together with consistent voice education, is necessary.

Noise Cancellation using Microphone Array in Digital Hearing Aids (디지털 보청기에서 마이크로폰 어레이를 이용한 잡음제거)

  • Bang, Dong-Hyeouck;Kil, Se-Kee;Kang, Hyun-Deok;Yoon, Gwang-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.4 / pp.857-866 / 2009
  • In this paper, a noise cancellation method using a microphone array for digital hearing aids is proposed. The microphone array is located around the ear of a dummy. Speech is generated from a forward speaker positioned in front of the dummy, and noise is generated from a backward speaker. The speech and noise mix in the air and enter the microphones. A voice activity detector (VAD) and adaptive noise cancellation (ANC) were used to eliminate noise from the microphone signals. Ten two-syllable words and four sentences were used as speech signals. Babble noise and car interior noise were used as noise signals. The performance of the proposed algorithm was evaluated by signal-to-noise ratio (SNR) and PESQ-MOS (perceptual evaluation of speech quality, mean opinion score). In the babble noise condition, SNR improved by 7.963 ± 1.3620 dB for words and 3.968 ± 0.6659 dB for sentences. In the car interior noise condition, SNR improved by 10.512 ± 2.0665 dB for words and 6.000 ± 1.7642 dB for sentences. PESQ-MOS in babble noise improved by 0.1722 ± 0.0861 for words and 0.083 ± 0.0417 for sentences. PESQ-MOS in car interior noise improved by 0.2661 ± 0.0335 for words and 0.040 ± 0.0201 for sentences. These results verify that the proposed algorithm performs well in noise cancellation with a microphone array for digital hearing aids.
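
The adaptive noise cancellation step named in this abstract can be sketched with a basic LMS filter. The abstract does not specify the adaptation rule, filter length, step size, or how the VAD gates adaptation, so everything below (the function name, the plain LMS update, `num_taps`, `step`, and the primary/reference signal roles) is an assumption for illustration only.

```python
def lms_noise_canceller(primary, reference, num_taps=8, step=0.01):
    """Subtract an adaptively filtered reference from the primary signal.

    primary:   samples containing speech plus noise (assumed role)
    reference: samples containing mostly noise (assumed role)
    Returns the error signal, which serves as the cleaned output.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    output = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate
        # LMS weight update: move weights along the error gradient.
        weights = [w + 2.0 * step * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output
```

In a practical system, a VAD would typically freeze the weight update while speech is present so that the filter adapts only to noise; that gating is omitted here.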

Improved Harmonic-CELP Speech Coder with Dual Bit-Rates(2.4/4.0 kbps) (이중 전송률(2.4/4.0 kbps)을 갖는 개선된 하모닉-CELP 음성부호화기)

  • 김경민;윤성완;최용수;박영철;윤대희;강태익
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.3C / pp.239-247 / 2003
  • This paper presents a dual-rate (2.4/4.0 kbps) Improved Harmonic-CELP (IHC) speech coder based on the Efficient Harmonic-CELP (EHC) coder previously presented by the authors. The proposed IHC employs harmonic coding for voiced segments and CELP for unvoiced segments. In the IHC, an initial voiced/unvoiced (V/UV) estimate is obtained from the pitch gain and energy. The final V/UV mode is then decided using the frame energy contour. A new harmonic estimation method combining peak picking and delta adjustment provides more reliable harmonic estimation than that in the EHC. In addition, a noise mixing scheme in conjunction with an improved band voicing measurement improves the naturalness of the synthesized speech. To demonstrate its performance, the proposed IHC coder was implemented and compared with the 2.0/4.0 kbps HVXC (Harmonic Vector eXcitation Coding) coder standardized by MPEG-4. Subjective evaluation results showed that the proposed IHC coder produces better speech quality than the HVXC, with only 40% of the HVXC's complexity.

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

  • 장효종;최형일
    • Journal of KIISE:Software and Applications / v.30 no.11 / pp.1037-1043 / 2003
  • This paper describes a speech synthesis technique that uses the diphone as the unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and its major problem is discontinuity at the junction between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique that reflects not only formant trajectories but also the distribution characteristics of the spectrum and human acoustic characteristics. That is, the proposed technique decides the amount and extent of smoothing by considering human acoustic characteristics at the junction between unit phonemes, and then performs spectral smoothing using weights calculated along the time axis at the border between two diphones. The proposed technique reduces the discontinuity and minimizes the distortion caused by spectral smoothing. For performance evaluation, we tested five hundred diphones extracted from twenty sentences, using ETRI Voice DB samples and individually self-recorded samples.

The Effect of An Increase of Closed Quotient on Improvement of Voice Quality after Type I Thyroplasty in Patients with Unilateral Vocal Cord Paralysis (일측 성대마비 환자에서 성대내전술 후 성대접촉율의 증가가 음질 개선에 미치는 영향)

  • Kim, Han-Su;Choi, Seung-Hee;Lim, Jae-Yol;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.1 / pp.16-20 / 2004
  • Purpose: To assess perceptual, acoustic, and aerodynamic measures of voice quality in patients with unilateral vocal cord paralysis before and after type I thyroplasty. Methods: The clinical records of patients who underwent type I thyroplasty in the Department of Otorhinolaryngology, Yongdong Severance Hospital, from November 2001 to November 2003 were reviewed. All patients underwent a vocal function evaluation including perceptual, acoustic, and aerodynamic measures of voice preoperatively and on the 60th postoperative day. The perceptual and acoustic measures were obtained from recordings of patients reading the 'Sanchak' passage. The perceptual evaluation was performed by two speech pathologists using a 4-point rating scale. Acoustic parameters (voice range profile low (RAL), voice range profile high (RAH), average fundamental frequency (AFx), closed quotient (AQx), harmonic-to-noise ratio, jitter, and shimmer) were investigated with Lx Speech Studio. Mean flow rate (MFR), subglottic pressure (Psub), and intensity were measured using the Phonatory Function Analyzer. The maximum phonation time (MPT) was also measured. The data were statistically analyzed. A paired t-test (p<0.1) was used to compare preoperative and postoperative results, and a multiple regression test was used to find which parameter was most correlated with improvement of postoperative voice quality. Results: Among the aerodynamic parameters, Psub (88.11 mmH₂O → 58.7 mmH₂O), MPT (7.87 sec → 12.53 sec), and MFR (359.8 ml/sec → 161.06 ml/sec) showed statistically significant improvement. AFx (205.5 Hz → 163.27 Hz), AQx (23.9% → 48.3%), RAL, RAH, jitter, and shimmer were improved. In the multiple regression test, AFx and AQx were the two parameters most correlated with improvement of postoperative breathiness, but the general grade of voice quality was more correlated with Psub and shimmer. Conclusion: Vocal fold medialization procedures effectively reduce the glottic gap. The increased contact area of the two vocal folds improved the aerodynamic parameters and stabilized vocal fold vibration. That effect results in improvement in acoustic parameters (shimmer, jitter, signal-to-noise ratio, voice range profile) and voice quality.

Effects of vowel types and sentence positions in standard passage on auditory and cepstral and spectral measures in patients with voice disorders (모음 유형과 표준문단의 문장 위치가 음성장애 환자의 청지각적 및 켑스트럼 및 스펙트럼 분석에 미치는 효과)

  • Mi-Hyeon Choi;Seong Hee Choi
    • Phonetics and Speech Sciences / v.15 no.4 / pp.81-90 / 2023
  • Auditory perceptual assessment and acoustic analysis are commonly used in clinical practice for voice evaluation. This study aims to explore the effects of speech task context on auditory perceptual assessment and acoustic measures in patients with voice disorders. Sustained vowel phonations (/a/, /e/, /i/, /o/, /u/, /ɯ/, /ʌ/) and connected speech (a standardized paragraph, 'kaeul', and its nine sub-sentences) were obtained from a total of 22 patients with voice disorders. GRBAS ('G', 'R', 'B', 'A', 'S') and CAPE-V ('OS', 'R', 'B', 'S', 'P', 'L') auditory-perceptual assessments were performed by two certified speech-language pathologists specializing in voice disorders using blinded, randomized voice samples. Additionally, spectral and cepstral measures were analyzed using the Analysis of Dysphonia in Speech and Voice (ADSV) model. Voice quality assessed with the GRBAS scale was not significantly affected by vowel type except for 'B', while 'OS', 'R', and 'B' in CAPE-V were affected by vowel type (p<.05). In addition, measurements of CPP and L/H ratio were influenced by vowel type and sentence position. CPP values in the standard paragraph showed significant negative correlations with all vowels, with the highest correlation observed for the /e/ vowel (r=-.739). The CPP of the second sentence had the strongest correlation with all vowels. The speech stimulus may have a greater impact on auditory-perceptual assessment with CAPE-V than with GRBAS, and vowel type and sentence position with consonants influenced the 'B' scale, CPP, and L/H ratio. When using vowels in the voice assessment of patients with voice disorders, it would be beneficial to use not only /a/ but also the vowel /i/, which is acoustically highly correlated with 'breathy'. In addition, the /e/ vowel was highly correlated acoustically with the standardized passage and sub-sentences. Furthermore, given that most dysphonic signals are aperiodic, the second sentence of the 'kaeul' passage, which is the most acoustically correlated with all vowels, can be used with CPP. These results provide clinical evidence of the impact of speech tasks on auditory perceptual and acoustic measures, which may help to provide guidelines for voice evaluation in patients with voice disorders.