• Title/Abstract/Keywords: Speech improvement


Performance Improvement of Speech Recognition Based on SPLICE in Noisy Environments

  • 김종현;송화진;이종석;김형순
    • 대한음성학회지:말소리 / No. 53 / pp. 103-118 / 2005
  • The performance of a speech recognition system is degraded by mismatch between training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE) was introduced to overcome environmental mismatch using stereo data. In this paper, we propose several methods to improve conventional SPLICE and evaluate them on the Aurora2 task. We generalize SPLICE to compensate for the covariance matrix as well as the mean vector in the feature space, yielding an error rate reduction of 48.93%. We also employ a weighted sum of correction vectors using the posterior probabilities of all Gaussians, which achieves an error rate reduction of 48.62%. Combining the two methods reduces the error rate by 49.61% relative to the Aurora2 baseline system.
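The mean-correction part of SPLICE described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal-covariance Gaussian mixture over the noisy features, all function names are hypothetical, and the covariance compensation proposed in the paper is not covered. With `soft=True` the correction is the posterior-weighted sum over all Gaussians; with `soft=False` only the most likely Gaussian is used.

```python
import numpy as np

def gaussian_posteriors(y, weights, means, variances):
    """Posterior p(k | y_n) under a diagonal-covariance GMM; returns shape (N, K)."""
    diff = y[:, None, :] - means[None, :, :]                     # (N, K, D)
    log_lik = -0.5 * np.sum(diff**2 / variances
                            + np.log(2.0 * np.pi * variances), axis=2)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)              # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def train_correction_vectors(clean, noisy, post):
    """r_k = sum_n p(k|y_n) (x_n - y_n) / sum_n p(k|y_n), estimated from stereo data."""
    return (post.T @ (clean - noisy)) / post.sum(axis=0)[:, None]

def splice_enhance(noisy, post, corrections, soft=True):
    """Add the correction vectors to the noisy features."""
    if soft:
        # Weighted sum of the correction vectors of all Gaussians.
        return noisy + post @ corrections
    # Correction vector of the single most likely Gaussian.
    return noisy + corrections[post.argmax(axis=1)]
```

Here `clean` and `noisy` are time-aligned (stereo) feature matrices of shape (N, D), which is what makes the per-Gaussian correction vectors trainable.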


An Improvement of Korean Speech Recognition Using a Compensation of the Speaking Rate by the Ratio of a Vowel Length

  • 박준배;김태준;최성용;이정현
    • 대한전자공학회: Conference Proceedings / 대한전자공학회 2003 Computer Society Fall Conference Proceedings / pp. 195-198 / 2003
  • The accuracy of an automatic speech recognition system depends on background noise and on speaker variability such as sex, intonation, and speaking rate. In particular, inter-speaker and intra-speaker variation in speaking rate is a serious cause of misrecognition. In this paper, we propose a method that compensates for the speaking rate using the ratio of each vowel's length in a phrase. First, the number of feature vectors in a phrase is estimated from the speaking rate. Second, the estimated number of feature vectors is distributed among the syllables of the phrase according to the ratio of their vowel lengths. Finally, feature vectors are extracted according to the number assigned to each syllable. As a result, the accuracy of automatic speech recognition improved with the proposed speaking-rate compensation method.
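The second step, distributing a fixed number of feature vectors among syllables by vowel-length ratio, might look like the sketch below. The abstract does not say how fractional counts are rounded, so the largest-remainder rounding used here is an assumption, and `allocate_frames` is a hypothetical name.

```python
def allocate_frames(total_frames, vowel_lengths):
    """Split total_frames across syllables in proportion to their vowel lengths."""
    total_length = sum(vowel_lengths)
    if total_frames < 0 or total_length <= 0 or any(v < 0 for v in vowel_lengths):
        raise ValueError("need a non-negative frame count and positive vowel lengths")
    exact = [total_frames * v / total_length for v in vowel_lengths]
    counts = [int(e) for e in exact]          # floor of each proportional share
    leftover = total_frames - sum(counts)
    # Assumed rounding rule: hand the remaining frames to the syllables
    # with the largest fractional parts.
    by_fraction = sorted(range(len(exact)),
                         key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:max(leftover, 0)]:
        counts[i] += 1
    return counts
```

For example, with 12 frames and vowel lengths in the ratio 1:2:3, the three syllables receive 2, 4, and 6 frames.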


Botulinum Toxin Injection for the Treatment of Voice and Speech Disorders

  • 최홍식
    • 음성과학 / Vol. 3 / pp. 5-17 / 1998
  • Botulinum toxin, a neurotoxin derived from Clostridium botulinum, has been injected into target muscles to treat several kinds of voice and speech disorders at the Voice Clinic, Yonsei Institute of Logopedics and Phoniatrics, since December 1995. The diagnostic criteria and injection methods for spasmodic dysphonia, mutational dysphonia, muscle tension dysphonia, dysphonia after total laryngectomy, and stuttering are summarized. Among 144 patients with adductor-type spasmodic dysphonia who received between one and eight injections over 27 months, 90% were rated as showing better than slight improvement. Although the number of injected cases was small, the results for abductor-type spasmodic dysphonia, and for intractable mutational dysphonia or muscle tension dysphonia resistant to voice therapy, suggest that botulinum toxin injection can be another treatment option. Patients who cannot phonate after total laryngectomy and some adult stutterers may also be candidates for botulinum toxin injection.


A Qualitative Study for the Development of the Assessment Model for Korean Resonance Disorders

  • 한진순;심현섭
    • 음성과학 / Vol. 13, No. 4 / pp. 157-173 / 2006
  • Speech-language therapists' clinical experiences offer insight for developing an assessment model for resonance disorders that suits the clinical setting. To investigate their experiences of resonance disorders qualitatively, a semi-structured interview questionnaire was developed based on a review of the literature on assessment procedures. Interviews with four speech therapists were analysed using a qualitative, constant-comparative method, and three main themes were derived: (1) the currently accepted definitions and characteristics of resonance disorders, (2) the status quo of the assessment procedures, and (3) the need for improved assessment procedures. In addition, 15 sub-themes emerged from the three main themes. The themes raised by the therapists provide directions for developing a comprehensive and valid assessment model for resonance disorders in Korea.


Improving LPC Analysis of Noisy Speech by Autocorrelation Subtraction Method

  • 은종관;최기영
    • 한국음향학회지 / Vol. 1, No. 1 / pp. 45-53 / 1982
  • A robust linear predictive coding method that can be used in noisy as well as quiet environments has been studied. In this method, noise autocorrelation coefficients are first obtained and updated during nonspeech periods. Then, the effect of additive noise in the input speech is removed by subtracting the noise autocorrelation coefficients from those of the corrupted speech while computing the linear prediction coefficients. When the signal-to-noise ratio of the input speech ranges from 0 to 10 dB, a performance improvement of about 5 dB can be gained by using this method. The proposed method is computationally very efficient and requires little storage.
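The idea in this abstract can be sketched in a few lines: estimate the noise autocorrelation during nonspeech frames, subtract it from the autocorrelation of the noisy frame, and solve the usual LPC normal equations on the result. The exponential smoothing in `update_noise_autocorr`, the fallback when the subtracted energy is non-positive, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def autocorr(frame, order):
    """Biased autocorrelation R[0..order] of one frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) / n
                     for k in range(order + 1)])

def update_noise_autocorr(noise_r, frame_r, alpha=0.9):
    """Assumed update rule: exponential smoothing over nonspeech frames."""
    return alpha * noise_r + (1.0 - alpha) * frame_r

def levinson_durbin(r):
    """Solve the LPC normal equations; returns (a, error) with a[0] = 1."""
    order = len(r) - 1
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def noise_compensated_lpc(noisy_r, noise_r):
    """LPC from the noisy autocorrelation minus the noise autocorrelation."""
    clean_r = noisy_r - noise_r
    if clean_r[0] <= 0.0:                   # assumed safeguard: skip the subtraction
        clean_r = noisy_r
    return levinson_durbin(clean_r)
```

For white noise, only the zero-lag term of `noise_r` is nonzero, so the subtraction amounts to removing the noise power from `R[0]`.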


Performance Improvement of Microphone Array Speech Recognition Using Features Weighted Mahalanobis Distance

  • ;정현열
    • The Journal of the Acoustical Society of Korea / Vol. 29, No. 1E / pp. 45-53 / 2010
  • In this paper, we present the use of the Features Weighted Mahalanobis Distance (FWMD) to improve the performance of the Likelihood Maximizing Beamforming (Limabeam) algorithm in microphone array speech recognition. The proposed approach replaces the traditional distance measure in a Gaussian classifier with a Mahalanobis distance in which each feature is weighted according to its distance after variance normalization. Using FWMD with the Limabeam algorithm (FWMD-Limabeam), we obtained correct word recognition rates of 90.26% for calibrated Limabeam and 87.23% for unsupervised Limabeam, about 3% and 6% higher, respectively, than those of the original Limabeam. By alternatively implementing an HM-Net speech recognition strategy, we could save memory and reduce computational complexity.
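As a rough illustration of a feature-weighted Mahalanobis distance with diagonal covariance, the sketch below scales each variance-normalized squared difference by a per-feature weight. The abstract does not give the rule used to derive the weights, so they are simply passed in here, and `weighted_mahalanobis_sq` is a hypothetical name.

```python
import numpy as np

def weighted_mahalanobis_sq(x, mean, var, weights):
    """Squared distance sum_i w_i * (x_i - mean_i)^2 / var_i (diagonal covariance)."""
    normalized_sq = (np.asarray(x, dtype=float) - mean) ** 2 / var
    return float(np.sum(np.asarray(weights, dtype=float) * normalized_sq))
```

With all weights equal to 1, this reduces to the ordinary squared Mahalanobis distance for a diagonal covariance matrix.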

Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition

  • 신옥근
    • 한국음향학회지 / Vol. 23, No. 8 / pp. 583-589 / 2004
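  • We propose a vector-quantizer-based speaker normalization method for continuous speech recognition (CSR) that does not use acoustic information such as formants. The method improves on our previously proposed speaker normalization method for a simple digit recognizer: a normalized codebook is obtained by iteratively training the vector quantizer while increasing the codebook size, and this codebook is then used to estimate the warping factor of each test speaker. For codebook generation and warping factor estimation, we experimented with two phoneme sets: a set of vowel phonemes only, and a set of all phonemes including both consonants and vowels. The feature vectors used for training and testing the recognizer were warped with a piecewise linear warping function corresponding to the estimated warping factor. Phoneme recognition experiments with the TIMIT corpus and the HTK toolkit showed that the proposed method performs comparably to formant-based warping.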
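A piecewise linear warping function of the kind mentioned here is commonly written with two segments: frequencies below a knee are scaled by the warping factor, and the segment above the knee is chosen so that the maximum frequency maps to itself. The abstract does not give the exact form used, so the two-segment shape, the default values of `f_max` and `knee`, and the function name below are assumptions.

```python
import numpy as np

def piecewise_linear_warp(freq, alpha, f_max=8000.0, knee=6000.0):
    """Warp frequencies by factor alpha below the knee, keeping f_max fixed."""
    freq = np.asarray(freq, dtype=float)
    if not 0.0 < knee < f_max or alpha * knee >= f_max:
        raise ValueError("need 0 < knee < f_max and alpha * knee < f_max")
    # Slope of the upper segment, chosen so that warp(f_max) == f_max.
    upper_slope = (f_max - alpha * knee) / (f_max - knee)
    return np.where(freq <= knee,
                    alpha * freq,
                    alpha * knee + upper_slope * (freq - knee))
```

With `alpha = 1.0` the function is the identity, which corresponds to an unwarped speaker.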

Feasibility of Revision Cochlear Implant Surgery for Better Speech Comprehension

  • Hwang, Kyurin;Lee, Jae Yong;Oh, Hyeon Seok;Lee, Byung Don;Jung, Jinsei;Choi, Jae Young
    • Journal of Audiology & Otology / Vol. 23, No. 2 / pp. 112-117 / 2019
  • Background and Objectives: The purpose of this study was to evaluate the efficacy of revision cochlear implant (CI) surgery for better speech comprehension in patients with low satisfaction after their first CI surgery. Subjects and Methods: Eight patients who could not upgrade their speech processors because their CI model was too old and who wanted to replace the whole system were included. We compared speech comprehension before and after revision CI surgery using the Categories of Auditory Performance (CAP) score, vowel and consonant confusion tests, Ling 6 sounds, and word and sentence identification tests. Results: The interval between surgeries ranged from 8 to 19 years. The same manufacturer's latest product was used for revision surgery in six of the eight cases. Full insertion of the electrode was possible in most cases (seven of eight). The CAP score (p-value=0.01), vowel confusion test (p-value=0.041), one-syllable word identification test (p-value=0.026), two-syllable identification test (p-value=0.028), and sentence identification test (p-value=0.028) improved significantly. The consonant confusion test (p-value=0.063) and Ling 6 sound test (p-value=0.066) improved, but not significantly. Conclusions: Although our study design has some limitations, we could indirectly identify the effect of revision (upgrade) CI surgery. We conclude that if a patient complains of low functional gain or low satisfaction after the first CI surgery, revision (device upgrade) CI surgery is meaningful even if there is no device failure.

Comparison of Vowel and Text-Based Cepstral Analysis in Dysphonia Evaluation

  • 김태환;최정임;이상혁;진성민
    • 대한후두음성언어의학회지 / Vol. 26, No. 2 / pp. 117-121 / 2015
  • Background: Cepstral analysis, which is obtained from the Fourier transformation of the spectrum, is known to be an effective indicator for analyzing voice disorders. To evaluate voice disorders, phonation of a sustained /a/ vowel or continuous speech has been used, but the former is limited in capturing hoarseness properly. This study aimed to compare the effectiveness of cepstral analysis between the sustained /a/ vowel and continuous speech. Methods: From March 2012 to December 2014, a total of 72 patients were enrolled, 24 each with unilateral vocal cord palsy, vocal nodule, and vocal polyp. All patients rated their voice quality with the Voice Handicap Index (VHI) before and after treatment. Samples of sustained /a/ phonation and of continuous speech (the first sentence of the autumn paragraph) underwent cepstral analysis, and pre-treatment and post-treatment values were compared. Results: Pre- and post-treatment values of CPP-a (cepstral peak prominence in the /a/ vowel) were 13.80 and 13.91 in vocal cord palsy, 16.62 and 17.99 in vocal cord nodule, and 14.19 and 18.50 in vocal cord polyp, respectively. Pre- and post-treatment values of CPP-s (cepstral peak prominence in text-based speech) were 11.11 and 12.09 in vocal cord palsy, 12.11 and 14.09 in vocal cord nodule, and 12.63 and 14.17 in vocal cord polyp. All 72 patients showed subjective improvement in VHI after treatment. CPP-a showed statistically significant improvement only in the vocal polyp group, whereas CPP-s showed statistically significant improvement in all three groups (p<0.05). Conclusion: In cepstral analysis, text-based analysis is more representative of voice disorders than sustained vowel phonation. Therefore, when performing acoustic voice analysis by cepstrum, both sustained /a/ phonation and text-based speech should be analyzed to obtain more accurate results.
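For readers unfamiliar with the measure, cepstral peak prominence (CPP) can be sketched for a single frame as follows. This is a simplified illustration and not the analysis software used in the study: the Hann window, the 60 to 300 Hz search range, fitting the regression line over that range only, and the function name are all assumptions.

```python
import numpy as np

def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=300.0):
    """Return (CPP in dB, quefrency of the cepstral peak in seconds) for one frame."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    # Log-magnitude spectrum in dB, then its inverse transform: the cepstrum.
    log_spectrum = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum)) + 1e-12)
    # Quefrency range (in samples) that corresponds to plausible pitch periods.
    lo = int(np.ceil(sample_rate / f0_max))
    hi = int(np.floor(sample_rate / f0_min))
    quefrency = np.arange(lo, hi + 1) / sample_rate
    segment = cepstrum_db[lo:hi + 1]
    # CPP: height of the peak above a straight line fitted to the cepstrum.
    slope, intercept = np.polyfit(quefrency, segment, 1)
    peak = int(np.argmax(segment))
    cpp = segment[peak] - (slope * quefrency[peak] + intercept)
    return float(cpp), float(quefrency[peak])
```

In practice the value is averaged over many frames of a recording, and a more periodic voice gives a higher CPP.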
