• Title/Summary/Keyword: speech quality evaluation


Dialog System Using Multimedia Techniques for the Elderly with Dementia

  • 김성일;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.170-170
    • /
    • 2002
  • The goal of the present research is to improve the quality of life of elderly people with dementia. In this paper, this is realized by developing a dialog system controlled by three modules: a speech recognition engine, a graphical agent, and a database organized by nursing schedule. The system was evaluated in an actual nursing facility environment by introducing it to an older male patient with dementia. A comparison study between the dialog system and professional caregivers was then carried out at the nursing home for 5 days in each case. The evaluation results showed that the dialog system was more responsive in catering to the needs of the dementia patient than the professional caregivers. Moreover, the proposed system led the patient to talk more than the caregivers did.

A Study on the Performance of Companding Algorithms for Digital Hearing Aid Users (디지털 보청기 사용자를 위한 압신 알고리즘의 성능 연구)

  • Hwang, Y.S.;Han, J.H.;Ji, Y.S.;Hong, S.H.;Lee, S.M.;Kim, D.W.;Kim, In-Young;Kim, Sun-I.
    • Journal of Biomedical Engineering Research
    • /
    • v.32 no.3
    • /
    • pp.218-229
    • /
    • 2011
  • Companding algorithms have been used to enhance speech recognition in noise for cochlear implant users, but their efficiency for digital hearing aid users has not yet been validated. The purpose of this study is to evaluate the performance of companding for digital hearing aid users across various hearing loss cases. Using HeLPS, a hearing loss simulator, two different sensorineural hearing loss conditions were simulated: mild, gently sloping hearing loss (HL1) and moderate, steeply sloping hearing loss (HL2). In addition, non-linear compression was simulated to compensate for the hearing loss using National Acoustic Laboratories non-linear version 1 (NAL-NL1) in HeLPS. Four different companding strategies were used, varying the Q values (q1, q2) of the pre-filter (F filter) and post-filter (G filter). First, five IEEE sentences presented with speech-shaped noise at different SNRs (0, 5, 10, 15 dB) were processed by the companding. Second, the processed signals were applied to HeLPS; for comparison, signals not processed by companding were also applied. For the processed signals, the log-likelihood ratio (LLR) and cepstral distance (CEP) were measured to evaluate speech quality. Fourteen normal-hearing listeners also performed a speech reception threshold (SRT) test to evaluate speech intelligibility. The signals processed with both companding and NAL-NL1 performed better than those processed with NAL-NL1 alone in the sensorineural hearing loss conditions. Moreover, higher ratios of Q values yielded better LLR and CEP scores. In the SRT test, the signals processed with companding (SRT = -13.33 dB SPL) showed significantly better speech perception in noise than those processed with NAL-NL1 alone (SRT = -11.56 dB SPL).
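
The LLR measure used in this study can be sketched from standard LPC analysis. The following is a minimal illustration of the idea, not the paper's implementation; the LPC order and the autocorrelation method are common choices assumed here:

```python
import numpy as np

def lpc(frame, order=10):
    # Autocorrelation method: solve the Toeplitz normal equations R a = r
    # for the forward predictor, returning coefficients [1, -a1, ..., -ap].
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def llr(clean, processed, order=10):
    # LLR = log( (a_p R_c a_p) / (a_c R_c a_c) ), where R_c is the clean
    # signal's autocorrelation matrix: how poorly the processed signal's
    # LPC model fits the clean speech. 0 means identical spectral envelopes.
    a_c = lpc(clean, order)
    a_p = lpc(processed, order)
    r = np.correlate(clean, clean, mode="full")[len(clean) - 1:]
    n = order + 1
    Rc = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    return float(np.log((a_p @ Rc @ a_p) / (a_c @ Rc @ a_c)))
```

Because the clean-signal predictor minimizes the quadratic form, the ratio is at least 1 and the LLR is non-negative, reaching 0 for matching envelopes.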

Possibility of Motor Speech Improvement in People With Spinocerebellar Ataxia via Intensive Speech Treatment (집중치료를 통한 소뇌운동실조증 환자의 말운동개선 가능성)

  • Park, Youngmi
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.11
    • /
    • pp.634-642
    • /
    • 2018
  • People with spinocerebellar ataxia (SCA), a hereditary and progressive neurogenic disorder, suffer from ataxic dysarthria due to cerebellar dystrophy. This study was designed to examine whether intensive motor speech treatment yields improvement in progressive ataxic dysarthria and, if so, to investigate the magnitude of the therapeutic effect. SPEAK OUT!® was provided to a 55-year-old female diagnosed with SCA to improve motor speech functions. The magnitude of the therapeutic effect was large for changes in maximum phonation time (MPT) and vocal intensity across speech tasks. A small effect size was found for changes in fundamental frequency; however, a large therapeutic effect was observed for changes in frequency range. In addition, improvement of vocal quality based on jitter, shimmer, and HNR was observed with a large effect size, and the vowel space was expanded, particularly due to F1. Lastly, VHI scores decreased. The intensive motor speech treatment SPEAK OUT!® was effective enough to produce improvement in vocal intensity, frequency range, and vocal quality, to expand the vowel space, and to lower VHI scores. Based on the results of this case study, further efficacy evaluation of SPEAK OUT!® for improving progressive ataxic dysarthria in people with SCA is required.

The Assessment on the Sound Quality of Reduced Frequency Selectivity of Hearing Impaired People (난청인의 주파수 선택도 둔화현상이 음질에 미치는 영향 평가)

  • An, Hong-Sub;Park, Gyu-Seok;Jeon, Yu-Yong;Song, Young-Rok;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.60 no.6
    • /
    • pp.1196-1203
    • /
    • 2011
  • Reduced frequency selectivity is a typical phenomenon of sensorineural hearing loss. In this paper, we compared two methods of modeling the reduced frequency selectivity of hearing-impaired people. The two models were built using an LPC (linear predictive coding) algorithm and a bandwidth control algorithm based on the ERB (equivalent rectangular bandwidth) of the auditory filter, respectively. To compare the effectiveness of the two models, we compared the results of PESQ (perceptual evaluation of speech quality) and LLR (log-likelihood ratio) using 36 two-syllable Korean words. To examine the effect of noise, we mixed white and babble noise with the speech at 0 dB and -3 dB SNR. As a result, the PESQ score of the bandwidth control algorithm was higher than that of the LPC algorithm, while the LLR score of the LPC algorithm was lower than that of the bandwidth control algorithm. This suggests that both the non-linearity and the widened auditory filter characteristics caused by reduced frequency selectivity are better reflected in the bandwidth control algorithm than in the LPC algorithm.
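
The ERB on which the bandwidth control model rests is commonly computed with the Glasberg & Moore formula for normal hearing; a sketch follows. The broadening factor modeling impaired selectivity is a hypothetical illustration, not a value from this paper:

```python
def erb_bandwidth(fc_hz):
    # Glasberg & Moore (1990): equivalent rectangular bandwidth in Hz of
    # the normal-hearing auditory filter centred at fc_hz.
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def impaired_bandwidth(fc_hz, broadening=2.0):
    # Reduced frequency selectivity is often modelled by widening the
    # auditory filter; `broadening` here is illustrative only.
    return broadening * erb_bandwidth(fc_hz)
```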

The Effect on Intervention Program and Auditory-Perceptual Discrimination Feature of Postlingual Cochlear Implant Adults about Pathological Voice (병리적 음성에 대한 언어습득 이후 인공와우이식 성인의 청지각적 변별특성과 중재 프로그램의 효과)

  • Bae, Inho;Kim, Geunhyo;Lee, Yeonwoo;Park, Heejune;Kim, Jindong;Lee, Ilwoo;Kwon, Soonbok
    • Phonetics and Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.9-17
    • /
    • 2015
  • In the present study, we investigated the auditory-perceptual ability of postlingually deafened cochlear implant (CI) adults to recognize voice quality, and proposed a training program to improve within-subject reliability. A prospective case-control study was conducted with 7 postlingually deafened adults who had received CI surgery and 10 normal-hearing controls. The pre- and post-tests and the training program used the parameters of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) with pathological voice samples presented via Alvin. In the pre-post comparison monitoring improvements in listeners' internal reliability through the training program, statistically significant differences were found for both test and group. Internal reliability differed significantly between pre- and post-test in the normal-hearing group, but not in the CI group. The present study found that CI adults showed less ability to perceive voice quality than the normal-hearing group. The training program nevertheless improved the CI adults' judgments of pitch and loudness.

Speech Enhancement Based on IMCRA Incorporating noise classification algorithm (잡음 환경 분류 알고리즘을 이용한 IMCRA 기반의 음성 향상 기법)

  • Song, Ji-Hyun;Park, Gyu-Seok;An, Hong-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.12
    • /
    • pp.1920-1925
    • /
    • 2012
  • In this paper, we propose a novel method to improve the performance of improved minima controlled recursive averaging (IMCRA) in non-stationary noisy environments. The conventional IMCRA algorithm efficiently estimates the noise power by averaging past spectral power values using a smoothing parameter that is adjusted by the speech presence probability in frequency subbands. Since the minimum of the smoothing parameter is fixed at 0.85, it is difficult to obtain robust noise power estimates in non-stationary environments whose spectral characteristics change rapidly, such as babble noise. For this reason, we propose a modified IMCRA that adaptively estimates and updates the noise power according to the noise type classified by a Gaussian mixture model (GMM). The performance of the proposed method is evaluated by perceptual evaluation of speech quality (PESQ) and composite measures under various environments, and better results are obtained compared with the conventional method.
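
The core IMCRA-style recursive averaging described above can be sketched as follows. This is a simplified illustration of the smoothing-parameter idea only, omitting IMCRA's minima tracking and bias compensation:

```python
import numpy as np

def update_noise_psd(noise_psd, power_spectrum, p_speech, alpha_min=0.85):
    # Time-varying smoothing per frequency bin: where speech is likely
    # present (p_speech -> 1) the estimate is frozen (alpha -> 1); where
    # speech is absent (p_speech -> 0) the estimate tracks the current
    # power spectrum at the minimum smoothing rate alpha_min.
    alpha = alpha_min + (1.0 - alpha_min) * np.asarray(p_speech, float)
    return alpha * noise_psd + (1.0 - alpha) * power_spectrum
```

With alpha_min fixed at 0.85, the estimate adapts slowly even in speech absence, which is the limitation the paper's noise-classification extension addresses.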

Real-Time H/W Implementation of RPE-LTP Speech Coder for Digital Mobile Communications (디지틀 이동 통신용 RPE-LTP 음성 부호화기의 실시간 H/W 구현)

  • 김선영;김재공
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.16 no.1
    • /
    • pp.85-100
    • /
    • 1991
  • In digital mobile communication systems, a high-quality, low-bit-rate speech coder is essential for overcoming the limited availability of radio spectrum and enhancing communication services. In this paper, we present the implementation and performance evaluation of a 13 kbps RPE-LTP speech coder. A real-time full-duplex coder with 75% DSP loading was implemented on a single DSP chip, and fixed-point simulations for the hardware implementation were performed. Finally, an analysis of the relative bit importance of each transmitted parameter is presented for channel coding.
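
The 13 kbps figure follows from the GSM full-rate frame structure that RPE-LTP uses. These framing constants come from the GSM 06.10 standard, not from the abstract itself:

```python
# GSM full-rate framing used by RPE-LTP: 160 samples per 20 ms frame at
# 8 kHz, coded into 260 bits, which yields the 13 kbps rate quoted above.
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 160
BITS_PER_FRAME = 260

frame_duration_s = FRAME_SAMPLES / SAMPLE_RATE_HZ   # 0.02 s
bit_rate_bps = BITS_PER_FRAME / frame_duration_s    # 13000 bit/s
```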


The Acoustic Changes of Voice after Uvulopalatopharyngoplasty (구개인두성형술 후 음성의 음향학적 변화)

  • Hong, K.H.;Kim, S.W.;Yoon, H.W.;Cho, Y.S.;Moon, S.H.;Lee, S.H.
    • Speech Sciences
    • /
    • v.8 no.2
    • /
    • pp.23-37
    • /
    • 2001
  • The primary sound produced by the vibration of the vocal folds reaches the velopharyngeal isthmus and is directed both nasally and orally. The proportion of each component is determined by the anatomical and functional status of the soft palate. Oral sounds are composed of oral vowels and consonants, depending on the status of the vocal tract, tongue, palate, and lips. Nasal sounds are composed of nasal consonants and nasal vowels and are further modified by the status of the nasal airway, so anatomical abnormalities in the nasal cavity influence nasal sound. The measurement of the nasal sounds of speech has relied on subjective scoring by listeners. Nasal sounds are described in terms of nasality and nasalization: nasality has generally been assessed perceptually in studies of maxillofacial procedures for cleft palate, sleep apnea, snoring, and nasal disorders, while nasalization is considered an acoustic phenomenon. Snoring and sleep apnea are typical disorders caused by redundant velopharyngeal tissue. Sleep apnea is defined as a cessation of breathing for at least 10 seconds during sleep, and several medical and surgical methods for treating it have been attempted. Uvulopalatopharyngoplasty (UPPP) involves removal of 1.0 to 3.0 cm of soft palate tissue along with redundant oropharyngeal mucosa and lateral tissue from the anterior, and sometimes posterior, faucial pillars. This procedure results in a shortened soft palate, and a possible risk following surgery is velopharyngeal malfunction due to the shortened palate. Few researchers have systematically studied the effects of this surgery on speech production. Some changes in voice quality, such as resonance (nasality), articulation, and phonation, have been reported. In view of these conflicting reports, there remains some uncertainty about the speech status of patients following snoring and sleep apnea surgery.
The study was conducted in two phases: 1) acoustic analysis of oral and nasal sounds, and 2) evaluation of nasality.


A Study on the Perceptual Aspects of an Emotional Voice Using Prosody Transplantation (운율이식을 통해 나타난 감정인지 양상 연구)

  • Yi, So-Pae
    • MALSORI
    • /
    • no.62
    • /
    • pp.19-32
    • /
    • 2007
  • This study investigated the perception of emotional voices by transplanting some or all of the prosodic aspects (pitch, duration, and intensity) of utterances produced with emotional voices onto those produced with normal voices, and vice versa. A listening evaluation by 24 raters revealed that the prosodic effect on the perception of emotion was greater than the segmental and vocal quality effect. The degree of influence of prosody and of segments and vocal quality varied with the type of emotion. For fear, prosodic elements had far greater influence than segmental and vocal quality elements, whereas for happy voices, segmental and vocal elements had as much effect as prosody. The prosodic features contributed to the perception of emotion in differing amounts, in descending order of pitch, duration, and intensity. As for utterance length, emotion was perceived more effectively in long utterances than in short ones.
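
Of the three transplanted prosodic dimensions, intensity is the simplest to illustrate. The sketch below copies one utterance's frame-by-frame RMS contour onto another; the frame size and the use of RMS as the intensity measure are assumptions, and pitch or duration transplantation would require PSOLA-style resynthesis instead:

```python
import numpy as np

def transplant_intensity(source, target, frame=160):
    # Copy the source utterance's frame-by-frame RMS (intensity contour)
    # onto the target, leaving the target's pitch, duration, and
    # segmental content unchanged.
    out = target.astype(float).copy()
    n = min(len(source), len(out)) // frame
    for i in range(n):
        s = source[i * frame:(i + 1) * frame].astype(float)
        t = out[i * frame:(i + 1) * frame]   # view into `out`
        rms_t = np.sqrt(np.mean(t ** 2))
        if rms_t > 0:
            t *= np.sqrt(np.mean(s ** 2)) / rms_t
    return out
```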


Human Laughter Generation using Hybrid Generative Models

  • Mansouri, Nadia;Lachiri, Zied
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1590-1609
    • /
    • 2021
  • Laughter is one of the most important nonverbal sounds that humans generate; it is a means of expressing emotion. The acoustic and contextual features of this specific sound differ from those of speech, and many difficulties arise when modeling it. In this work, we propose an audio laughter generation system based on unsupervised generative models: the autoencoder (AE) and its variants. The procedure combines three main sub-processes: (1) analysis, which consists of extracting the log-magnitude spectrogram from the laughter database; (2) training of the generative models; and (3) the synthesis stage, which involves an intermediate mechanism, the vocoder. To improve synthesis quality, we suggest three hybrid models (LSTM-VAE, GRU-VAE, and CNN-VAE) that combine the representation-learning capacity of the variational autoencoder (VAE) with the temporal modeling ability of long short-term memory (LSTM) recurrent networks and the ability of CNNs to learn invariant features. To assess the performance of the proposed laughter generation process, an objective evaluation (RMSE) and a perceptual audio quality test (listening test) were conducted. According to these evaluation metrics, the GRU-VAE outperforms the other VAE models.
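
The objective RMSE evaluation mentioned above amounts to an element-wise comparison of the reference and generated log-magnitude spectrograms; a minimal sketch:

```python
import numpy as np

def rmse(reference_spec, generated_spec):
    # Root-mean-square error between two spectrograms of the same shape,
    # e.g. reference vs. VAE-generated log-magnitude spectrograms.
    ref = np.asarray(reference_spec, float)
    gen = np.asarray(generated_spec, float)
    return float(np.sqrt(np.mean((ref - gen) ** 2)))
```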