• Title/Summary/Keyword: speech quality evaluation


Evaluation of a signal segregation by FDBM (FDBM의 음원분리 성능평가)

  • Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.12
    • /
    • pp.1793-1802
    • /
    • 2013
  • Various approaches to sound source segregation have been proposed. Among them, the frequency domain binaural model (FDBM) has the advantages of a low computational load and effective howling cancellation. A binaural hearing assistance system based on FDBM has been proposed; it can enhance a desired signal using directivity information. Although FDBM has been evaluated in terms of signal-to-noise ratio (SNR) and the coherence function, the evaluation results do not always agree with human impressions. These evaluation methods provide physical measures and do not take human perception into account. Given that a binaural hearing assistance system is one of the major applications, the quality of the segregated sound should be kept at a sufficient level. In this paper, the signal segregation performance of FDBM is evaluated by three objective methods, namely SNR, coherence, and Perceptual Evaluation of Speech Quality (PESQ), to discuss the characteristics of FDBM in sound source segregation. The simulation results show that FDBM improves the quality of the left and right channel signals to an equivalent level. The results also suggest that PESQ may provide a more useful measure than SNR and coherence of the segregation performance of FDBM. The PESQ results show the effects of the segregation parameters and indicate appropriate parameters under the conditions.
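
The abstract above names SNR as one of the physical measures. As a minimal sketch of what such a measure might look like, the following computes an SNR in dB between a clean reference and a processed signal, treating their difference as noise. The function name `snr_db` and the toy signals are assumptions for illustration; the paper's exact definition is not given here, and the sketch assumes the two signals are time-aligned and equally long.

```python
import math

def snr_db(reference, estimate):
    """SNR in dB, treating (estimate - reference) as the noise.

    Illustrative definition only; assumes time-aligned, equal-length signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf       # identical signals: no measurable noise
    if signal_power == 0:
        return -math.inf      # silent reference: only noise present
    return 10.0 * math.log10(signal_power / noise_power)

# Toy example: a sine reference and a copy with a constant 0.1 offset.
ref = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
est = [r + 0.1 for r in ref]
print(round(snr_db(ref, est), 2))
```

A purely physical measure like this says nothing about how the error sounds, which is the gap the abstract attributes to SNR and coherence and the reason it turns to PESQ.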

Quality Assessment and Predistortion Evaluation of the Multi-channel Audio Codec according to the bitrate changing (압축율 변화에 따른 멀티채널 오디오의 품질 및 Predistortion 의 영향 평가)

  • Cha, Kyung-Hwan;Jang, Dae-Young;Kim, Sung-Han;Kim, Chun-Duck
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2
    • /
    • pp.55-60
    • /
    • 1996
  • This paper describes a subjective assessment of multi-channel audio quality as the bitrate changes, and evaluates the effect of predistortion used to avoid unmasked noise after the matrixing/dematrixing process in the transmission and reproduction of multi-channel audio. The simulation uses perceptual coding with the MPEG-2 Audio Layer II algorithm. We evaluate the quality improvement with and without predistortion at 384, 320, 256, and 128 kbps. In the double-blind subjective assessment, the score on the five-grade impairment scale stays within minus one at bitrates down to 320 kbps, so the audio quality in the 3/2 channel configuration is judged to be perceptible but not annoying. Predistortion improves quality by one grade at 128 kbps, and the speech test material improves more than the music test materials.


The Therapeutic Effects of $SKTCLP^{(R)}$ in Patients with Mutational Dysphonia (생리적 발성 기법의 변성발성장애 치료 적용 효과)

  • Kim, Seong-Tae;Nam, Soon-Yuhl
    • Phonetics and Speech Sciences
    • /
    • v.3 no.2
    • /
    • pp.99-105
    • /
    • 2011
  • Vegetative phonation is typically considered useful in treating patients with mutational dysphonia, but its effect has not yet been studied. This study attempts to identify the effect of $SKTCLP^{(R)}$, which uses throat clearing and laughing, in patients with mutational dysphonia. The study, which was designed by the author, included 26 patients aged from 14 to 32 years (mean: 18.7 years) who had been diagnosed with mutational dysphonia between January 2007 and June 2010. Voice therapy for these patients included $SKTCLP^{(R)}$, ranging from two to seven sessions (mean: 3.8 sessions). Results were evaluated by videostroboscopy, perceptual evaluation on the GRBAS scale, aerodynamic testing, and acoustic analysis before and after therapy. Most patients could phonate at a low pitch from the beginning and sustain a normal-pitch sound in the last session. We found that the glottic gap and the anterior-posterior compression of the superior laryngeal part observed at the first visit were reduced after therapy, and these patients had complete closure of the glottis after treatment. The acoustic and aerodynamic measures after treatment showed significant decreases in Fo, Jitter, Shimmer, SFF, and SPI, and increases in MPT, Psub, and vocal efficiency (p<.05). $SKTCLP^{(R)}$ may be a useful treatment method in managing mutational dysphonia. We suggest that this technique may also be useful in improving voice quality in other functional dysphonias with a glottal chink and in functional aphonia.


Transcoding Algorithm for SMV and G.723.1 Vocoders via Direct Parameter Transformation (SMV와 G.723.1 음성부호화기를 위한 파라미터 직접 변환 방식의 상호부호화 알고리듬)

  • 서성호;장달원;이선일;유창동
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.6
    • /
    • pp.61-70
    • /
    • 2003
  • In this paper, a transcoding algorithm for the Selectable Mode Vocoder (SMV) and the G.723.1 speech coder via direct parameter transformation is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm converts the parameters of one coder to the other without going through the decoding and encoding process. The proposed algorithm is composed of four parts: the parameter decoding, line spectral pair (LSP) conversion, pitch period conversion, excitation conversion and rate selection. The evaluation results show that the proposed algorithm achieves equivalent speech quality to that of tandem transcoding with reduced computational complexity and delay.

The Comparisons of GRBAS Perceptual Judgments according to Levels of Utterances

  • Pyo, Hwa-Young;Sim, Hyun-Sub
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.135-142
    • /
    • 2001
  • The present study investigated which levels of utterance give essential and useful information about a patient's voice, by examining the correlation between each level of utterance (vowels, words, a phrase, and paragraph reading) and the entire utterance including all of the levels. For this purpose, a total of 10 individual utterance samples (5 vowels, 3 words, 1 phrase, 1 paragraph reading) were collected from each of 30 patients with voice disorders, and four experienced voice therapists evaluated them using GRBAS. The results showed that the four therapists agreed highly on the 'G' parameter. The correlation coefficient between each level of utterance and the entire utterance tended to be above 0.70. Judgements of the vowels /$\varepsilon$/ and /o/ correlated highly with the judgement of the entire utterance. Regardless of severity, the judgement of the entire utterance correlated highly with the judgements of the vowel /u/ and the paragraph reading. These results suggest that experienced voice therapists can evaluate a patient's voice quality in the clinic from a single sustained vowel as precisely as from the entire utterance.


Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.92-96
    • /
    • 2004
  • Research on emotions is currently of great interest in speech processing as well as in human-machine interaction. In recent years, more and more research on emotion synthesis or emotion recognition has been carried out for different purposes. Each approach uses its own methods and various parameters measured on the speech signal. In this paper, we propose using a short-time parameter, MFCCs (Mel-Frequency Cepstrum Coefficients), and a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC, ...) and their derivatives are also tested and combined with MFCCs in order to find the best combination. Other models, GMM and HMM (discrete and continuous hidden Markov models), are studied as well in the hope that continuous distributions and the temporal behaviour of this feature set will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds $88\%$ using only MFCCs with the VQ model. This simple but efficient approach gives a result much better than that obtained on the same database in human evaluation, in which listeners judged sentences without being allowed to replay or compare them [8], and the result compares favourably with other approaches.
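
The VQ classification idea in the abstract above can be sketched as follows: train one codebook per emotion with k-means, then label an utterance by the codebook that gives the lowest average distortion over its frames. This is a minimal sketch under stated assumptions, not the authors' implementation: the 2-D random vectors stand in for real MFCC frames, and every name (`train_codebook`, `distortion`, `classify`, `fake_frames`) is hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, size, iters=20, seed=0):
    """Plain k-means: returns `size` centroids (the VQ codebook)."""
    rng = random.Random(seed)
    codebook = rng.sample(frames, size)
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for f in frames:
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k]))
            clusters[nearest].append(f)
        for k, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def classify(frames, codebooks):
    """Pick the label whose codebook represents the frames with least distortion."""
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))

def fake_frames(center, n, rng):
    """Synthetic stand-in for MFCC frames: Gaussian cloud around `center`."""
    return [[rng.gauss(c, 0.3) for c in center] for _ in range(n)]

rng = random.Random(1)
training = {
    "neutral": fake_frames([0.0, 0.0], 200, rng),
    "anger": fake_frames([3.0, 3.0], 200, rng),
}
codebooks = {label: train_codebook(frames, size=4)
             for label, frames in training.items()}
test_utterance = fake_frames([3.0, 3.0], 50, rng)
print(classify(test_utterance, codebooks))
```

Because each codebook is trained per speaker and per emotion, the scheme ignores frame order entirely, which is the limitation the abstract hopes GMM and HMM models might address.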


Study on optimal number of latent source in speech enhancement based Bayesian nonnegative matrix factorization (베이지안 비음수 행렬 인수분해 기반의 음성 강화 기법에서 최적의 latent source 개수에 대한 연구)

  • Lee, Hye In;Seo, Ji Hun;Lee, Young Han;Kim, Je Woo;Lee, Seok Pil
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2015.07a
    • /
    • pp.418-420
    • /
    • 2015
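  • This paper describes how the performance of a speech enhancement method based on Bayesian nonnegative matrix factorization (BNMF) depends on the number of latent sources used for the speech and noise components. BNMF-based speech enhancement decomposes the input signal into a sum of sub-signals and then removes the noise components, and it is known to outperform conventional NMF-based methods. However, it has the drawbacks of a high computational cost and performance that varies with the number of latent sources. To address these drawbacks, this paper conducts experiments to find the optimal number of latent sources for BNMF-based speech enhancement. The experiments varied the type of noise, the type of speech, the numbers of latent sources for speech and noise, and the SNR, and used PESQ (perceptual evaluation of speech quality) as the performance measure. The results show that the number of speech latent sources does not affect performance, whereas performance improves as the number of noise latent sources increases.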


Speech enhancement based on reinforcement learning (강화학습 기반의 음성향상기법)

  • Park, Tae-Jun;Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.05a
    • /
    • pp.335-337
    • /
    • 2018
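  • Speech enhancement is a technique for removing noise or reverberation contained in speech. Because speech signals captured by a microphone are distorted by noise and reverberation, it is a core technology for speech signal processing applications such as speech recognition and speech communication. Previously, statistical-model-based speech enhancement, which exploits statistical information about the speech and noise signals, was mainly used; however, unlike in stationary noise environments, its performance degrades significantly in non-stationary noise environments. Recently, deep neural networks (DNNs), a machine learning technique, have been introduced and have shown excellent performance in speech enhancement. DNN-based speech enhancement uses multiple hidden layers and hidden nodes to model the nonlinear relationship between noisy speech signals and clean, noise-free speech signals. In this work, reinforcement learning, one way of improving such DNN-based speech enhancement, is applied to improve performance over the existing DNN. Reinforcement learning, best known for its use in Google's AlphaGo, is a method that learns, by training over a very large number of cases, to select the optimal action, that is, which action to take under a policy in a given state to move to the next state so as to receive the highest reward. In this paper, the reward is designed based on a composite measure, which improves speech recognition performance compared with an existing technique whose reward is based on PESQ (Perceptual Evaluation of Speech Quality).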

Experimental Results of SSB Modem in Shallow Sea (천해에서 SSB 모뎀의 실험결과 분석)

  • Ju, Hyng-Jun;Han, Jung-Woo;Kim, Ki-Man
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.6
    • /
    • pp.990-998
    • /
    • 2008
  • In this paper we evaluate experimental data obtained using SSB (single-sideband) modulation in the ocean. Current research in underwater communication applies digital modulation, OFDM, and MIMO systems. However, commercial modems use analog modulation techniques in the ocean. We therefore carried out experiments toward developing a high-quality modem suited to the sea characteristics around South Korea. The experiments used SSB analog modulation in shallow water off the Jin-hae shore. The transmitted data were tonal and LFM signals for estimating the underwater channel characteristics, and female Korean speech for speech communication.
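
For readers unfamiliar with SSB, the following is a minimal sketch of upper-sideband modulation using the phasing (analytic-signal) method: the message's negative frequencies are removed, the result is shifted up to the carrier, and the real part is taken. It is not the modem described in the abstract. The helper names (`dft`, `idft`, `analytic_signal`, `ssb_upper`) are hypothetical, frequencies are expressed as DFT bin indices for simplicity, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic_signal(x):
    """Zero the negative-frequency bins and double the positive ones.

    Assumes an even-length real input; bins 0 and n/2 are left unchanged.
    """
    n = len(x)
    X = dft(x)
    for k in range(n):
        if 0 < k < n // 2:
            X[k] *= 2
        elif k > n // 2:
            X[k] = 0
    return idft(X)

def ssb_upper(message, carrier_bin):
    """Upper-sideband SSB: real part of the analytic message shifted to the carrier."""
    n = len(message)
    z = analytic_signal(message)
    return [(z[t] * cmath.exp(2j * math.pi * carrier_bin * t / n)).real
            for t in range(n)]

# Toy example: a tone at bin 3 modulated onto a carrier at bin 16.
n = 64
message = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
s = ssb_upper(message, 16)
spectrum = [abs(v) for v in dft(s)]
peak_bin = max(range(n // 2), key=lambda k: spectrum[k])
print(peak_bin, spectrum[16 - 3] < 1e-6)
```

In this toy case the energy lands only above the carrier (bin 16 + 3) and the lower sideband (bin 16 - 3) is suppressed, which is what halves the occupied bandwidth relative to ordinary AM.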

Perioperative Management of the Voice in Thyroid Cancer (갑상선암 수술과 수술 전후 음성관리)

  • Yoon, So Yeon;Hong, Hyun Jun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.31 no.2
    • /
    • pp.49-55
    • /
    • 2020
  • Evaluating the patient's voice before thyroidectomy is useful for identifying patients with asymptomatic vocal cord paralysis, identifying other voice abnormalities, and determining whether they are related to voice disorders that may occur after surgery. Voice evaluation after thyroid surgery is also helpful in the diagnosis, treatment, rehabilitation, and follow-up of voice disorders that occur without clear nerve damage after thyroidectomy. It further supports rapid recovery through active early rehabilitation for patients who complain of speech impairment without paralysis. In particular, neck exercise can improve adhesion at the surgical site, increase the range of motion of the neck, and relieve subjective neck discomfort. In addition, hearing, voice, and breathing functions should be improved, and voice hygiene education and counseling should be provided. Vocal cord injection is the first treatment option for unilateral vocal cord palsy. Establishing a protocol for voice disorders before and after thyroid surgery and providing appropriate treatment can improve patients' quality of life.