• 제목/요약/키워드: Speech Classification

검색결과 403건 처리시간 0.025초

퍼지추론을 이용한 한국어 자음분류에 관한 연구 (A Study on the Consonant Classification Using Fuzzy Inference)

  • 박경식
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회 1992년도 학술논문발표회 논문집 제11권 1호
    • /
    • pp.71-75
    • /
    • 1992
  • This paper proposes algorithm in order to classify Korean consonant phonemes same as polosives, fricatives affricates into la sounds, glottalized sounds, aspirated sounds. This three kinds of sounds are one of distinctive characters of the Korean language which don't eist in language same as English. This is thesis on classfication of 14 Korean consonants(k, t, p, s, c, k', t', p', s', c', kh, ph, ch) as a previous stage for Korean phone recognition. As feature sets for classification, LPC cepstral analysis. The eperiments are two stages. First, using short-time speech signal analysis and Mahalanobis distance, consonant segments are detected from original speech signal, then the consonants are classified by fuzzy inference. As the results of computer simulations, the classification rate of the speech data was come to 93.75%.

  • PDF

Review of Korean Speech Act Classification: Machine Learning Methods

  • Kim, Hark-Soo;Seon, Choong-Nyoung;Seo, Jung-Yun
    • Journal of Computing Science and Engineering
    • /
    • 제5권4호
    • /
    • pp.288-293
    • /
    • 2011
  • To resolve ambiguities in speech act classification, various machine learning models have been proposed over the past 10 years. In this paper, we review these machine learning models and present the results of experimental comparison of three representative models, namely the decision tree, the support vector machine (SVM), and the maximum entropy model (MEM). In experiments with a goal-oriented dialogue corpus in the schedule management domain, we found that the MEM has lighter hardware requirements, whereas the SVM has better performance characteristics.

비트스트림의 구조 분석을 이용한 음성 부호화 방식 추정 기법 (Blind Classification of Speech Compression Methods using Structural Analysis of Bitstreams)

  • 유훈;박철순;박영미;김종호
    • 한국정보통신학회논문지
    • /
    • 제16권1호
    • /
    • pp.59-64
    • /
    • 2012
  • 본 논문에서는 임의의 음성 압축 비트스트림의 구조를 분석하여 음성 신호의 부호화 방식을 추정 및 분류하는 기법을 제안한다. 저 비트율 전송 및 저장을 위하여 다양한 보코더 방식의 음성 압축 기법이 개발되었는데, 이들은 블록 구조를 반드시 포함하고 있다. 각 부호화 방식을 구분하는데 있어, 본 논문에서는 Measure of Inter-Block Correlation (MIBC)를 이용하여 블록 구조의 유무 및 신호 블록의 길이를 파악하고, 블록 길이가 동일한 부호화 방식의 경우 각 부호화 방식마다 압축 스트림 내의 각 비트 위치별로 상관도 분포가 다르다는 점을 이용하여 해당 부호화 방식을 정확하게 추정하는 기법을 제안한다. 실험 결과 제안한 비트스트림 분석 기법은 다양한 음성 신호의 종류, 음성 신호의 길이 및 잡음 환경에 강인한 검출 능력을 나타냄을 보인다.

A Simple Speech/Non-speech Classifier Using Adaptive Boosting

  • Kwon, Oh-Wook;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • 제22권3E호
    • /
    • pp.124-132
    • /
    • 2003
  • We propose a new method for speech/non-speech classifiers based on concepts of the adaptive boosting (AdaBoost) algorithm in order to detect speech for robust speech recognition. The method uses a combination of simple base classifiers through the AdaBoost algorithm and a set of optimized speech features combined with spectral subtraction. The key benefits of this method are the simple implementation, low computational complexity and the avoidance of the over-fitting problem. We checked the validity of the method by comparing its performance with the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purpose, additional performance improvements were achieved by the adoption of new features including speech band energies and MFCC-based spectral distortion. For the same false alarm rate, the method reduced 20-50% of miss errors.

자유대화의 음향적 특징 및 언어적 특징 기반의 성인과 노인 분류 성능 비교 (Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech)

  • 한승훈;강병옥;동성희
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제12권8호
    • /
    • pp.365-370
    • /
    • 2023
  • 사람은 노화과정에 따라 발화의 호흡, 조음, 높낮이, 주파수, 언어 표현 능력 등이 변화한다. 본 논문에서는 이러한 변화로부터 발생하는 음향적, 언어적 특징을 기반으로 발화 데이터를 성인과 노인 두 그룹으로 분류하는 성능을 비교하고자 한다. 음향적 특징으로는 발화 음성의 주파수 (frequency), 진폭(amplitude), 스펙트럼(spectrum)과 관련된 특징을 사용하였으며, 언어적 특징으로는 자연어처리 분야에서 우수한 성능을 보이고 있는 한국어 대용량 코퍼스 사전학습 모델인 KoBERT를 통해 발화 전사문의 맥락 정보를 담은 은닉상태 벡터 표현을 추출하여 사용하였다. 본 논문에서는 음향적 특징과 언어적 특징을 기반으로 학습된 각 모델의 분류 성능을 확인하였다. 또한, 다운샘플링을 통해 클래스 불균형 문제를 해소한 뒤 성인과 노인 두 클래스에 대한 각 모델의 F1 점수를 확인하였다. 실험 결과로, 음향적 특징을 사용하였을 때보다 언어적 특징을 사용하였을 때 성인과 노인 분류에서 더 높은 성능을 보이는 것으로 나타났으며, 클래스 비율이 동일하더라도 노인에 대한 분류 성능보다 성인에 대한 분류 성능이 높음을 확인하였다.

다층 퍼셉트론 신경회로망을 이용한 후두 질환 음성 식별 (Detection of Laryngeal Pathology in Speech Using Multilayer Perceptron Neural Networks)

  • 강현민;김유신;김형순
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2002년도 11월 학술대회지
    • /
    • pp.115-118
    • /
    • 2002
  • Neural networks have been known to have great discriminative power in pattern classification problems. In this paper, the multilayer perceptron neural networks are employed to automatically detect laryngeal pathology in speech. Also new feature parameters are introduced which can reflect the periodicity of speech and its perturbation. These parameters and cepstral coefficients are used as input of the multilayer perceptron neural networks. According to the experiment using Korean disordered speech database, incorporation of new parameters with cepstral coefficients outperforms the case with only cepstral coefficients.

  • PDF

The Study on Korean Phoneme for Korean Speech Recogintion

  • Hwang, Young-Soo
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2000년도 ITC-CSCC -2
    • /
    • pp.629-632
    • /
    • 2000
  • In this paper, we studied on the phoneme classification for Korean speech recognition. In the case of making large vocabulary speech recognition system, it is better to use phoneme than syllable or word as recognition unit. And, In order to study the difference of speech recognition according to the number of phoneme as recognition unit, we used the speech toolkit of OGI in U.S.A as recognition system. The result showed that the performance of diphthong being unified was better than that of seperated diphthongs, and we required the better result when we used the biphone than when using mono-phone as recognition unit.

  • PDF

Japanese Vowel Sound Classification Using Fuzzy Inference System

  • Phitakwinai, Suwannee;Sawada, Hideyuki;Auephanwiriyakul, Sansanee;Theera-Umpon, Nipon
    • 한국융합학회논문지
    • /
    • 제5권1호
    • /
    • pp.35-41
    • /
    • 2014
  • An automatic speech recognition system is one of the popular research problems. There are many research groups working in this field for different language including Japanese. Japanese vowel recognition is one of important parts in the Japanese speech recognition system. The vowel classification system with the Mamdani fuzzy inference system was developed in this research. We tested our system on the blind test data set collected from one male native Japanese speaker and four male non-native Japanese speakers. All subjects in the blind test data set were not the same subjects in the training data set. We found out that the classification rate from the training data set is 95.0 %. In the speaker-independent experiments, the classification rate from the native speaker is around 70.0 %, whereas that from the non-native speakers is around 80.5 %.

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

  • Yoon, Tae-Jin
    • 말소리와 음성과학
    • /
    • 제7권1호
    • /
    • pp.117-124
    • /
    • 2015
  • This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2342 different sentences, comprising over five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender as an independent factor. The results of acoustic analyses revealed that the most acoustic properties of voiceless sibilants turned out to be different between male and female speakers, but those of voiceless non-sibilants did not show differences. A classification experiment using linear discriminant analysis (LDA) revealed that 85.73% of voiceless fricatives are correctly classified. The sibilants are 88.61% correctly classified, whereas the non-sibilants are only 57.91% correctly classified. The majority of the errors are from the misclassification of /ɵ/ as [f]. The average accuracy of gender classification is 77.67%. Most of the inaccuracy results are from the classification of female speakers in non-sibilants. The results are accounted for by resorting to biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.

후두음성 질환에 대한 인공지능 연구 (Artificial Intelligence for Clinical Research in Voice Disease)

  • 석준걸;권택균
    • 대한후두음성언어의학회지
    • /
    • 제33권3호
    • /
    • pp.142-155
    • /
    • 2022
  • Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can be used as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms, such as machine learning, led by the latest deep learning technology, began with a binary classification that distinguishes normal and pathological voices; consequently, it has contributed in improving the accuracy of multi-classification to classify various types of pathological voices. However, no conclusions that can be applied in the clinical field have yet been achieved. Most studies on pathological speech classification using speech have used the continuous short vowel /ah/, which is relatively easier than using continuous or running speech. However, continuous speech has the potential to derive more accurate results as additional information can be obtained from the change in the voice signal over time. In this review, explanations of terms related to artificial intelligence research, and the latest trends in machine learning and deep learning algorithms are reviewed; furthermore, the latest research results and limitations are introduced to provide future directions for researchers.