• Title/Summary/Keyword: 음향음성학

Search Result 748, Processing Time 0.024 seconds

Endpoint Detection in the Car Noise Environment for Speech Recognition (음성인식을 위한 자동차 소음환경에서의 끝점 검출)

  • 서동권;신원호;양태영;김원구;윤대희
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.1
    • /
    • pp.76-79
    • /
    • 1998
  • 소음이 존재하지 않는 환경에서는 에너지 파라메터만으로도 정확한 끝점 검출을 수 행할 수 있으나 신호대 잡음비가 0dB에 가까운 자동차 환경에서는 끝점 검출이 거의 불가 능하다. 본 논문에서는 자동차 소음 환경에서 음성 구간 검출을 위하여 단구간 영교차율과 2∼4kHz의 주파수 영역 에너지를 사용한 끝점 검출 방법을 제안하였다. 제안된 방법과 기 존의 방법의 성능을 DTW를 이용한 단독음 인식 시스템에 적용하여 인식률로 비교하였으 며 제안된 음성 구간 검출 방법을 적용한 경우가 보다 좋은 인식률을 나타내었다.

  • PDF

A Study on the Vowel Recognition of Korean Speech using Spatio-temporal Method (Spatio-temporal 방법을 이용한 우리말 모음 인식에 관한 연구)

  • 송도선;김선일;김석동;이행세
    • The Journal of the Acoustical Society of Korea
    • /
    • v.12 no.4
    • /
    • pp.57-62
    • /
    • 1993
  • 본 논문은 신경망을 이용한 우리말 모음에 대한 인식 연구이다. 음성을 나누거나. 음소별 인식이나, 시간 신축 방법을 사용하지 않고 모음을 인식하였다. 식나의 변화에 따른 음성의 변화를 정적인 음성으로 취급하였다. 10개로 균등히 나눈 프레임에 각 프레임마다 10차의 PARCOR계수를 추출하였다. 신경망의 구조를 간단히 하기 위해서 단모음과 복모음을 구분하여 학습시켰으며, 출력 노드의 수를 감소시키기 위해 이진 코드 형태로 구성하였다.

  • PDF

A Study of Korean Phonetic and Phonological Properties for Speech Recognition and Synthesis (음성 인식/합성을 위한 국어의 음성-음운론적 특성 연구)

  • Chung, Kook;Koo, Hee-San;Lee, Chan-Do;Kim, Jong-Mi;Han , Sun-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.6
    • /
    • pp.31-44
    • /
    • 1994
  • The paper introduces several studies of various aspects of Korean phonology and phonetics for speech recognition and synthesis. The phonological and phonetic studies presented in this paper are : i) For a study of segmental phonology, we made an annotated list of Korean allophones and their corresponding alphabetic symbols to type into computers. ii) For a study of segmental phonetics, we present some acoustic regulations in Korean consonants according to their phonological environment within a word. iii) For a study of prosodic phonology, we suggest the phonological functions of prosodic features and their acoustic cues. iv) For a study of prosodic phonetics, we present the characteristic patterns of accent and intonation in Korean. v) Finally, we suggest some ways of using this phonological and phonetic knowledge for possible improvement of speech recognition and synthesis.

  • PDF

A Comparison of Speech/Music Discrimination Features for Audio Indexing (오디오 인덱싱을 위한 음성/음악 분류 특징 비교)

  • 이경록;서봉수;김진영
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.10-15
    • /
    • 2001
  • In this paper, we describe the comparison between the combination of features using a speech and music discrimination, which is classifying between speech and music on audio signals. Audio signals are classified into 3classes (speech, music, speech and music) and 2classes (speech, music). Experiments carried out on three types of feature, Mel-cepstrum, energy, zero-crossings, and try to find a best combination between features to speech and music discrimination. We using a Gaussian Mixture Model (GMM) for discrimination algorithm and combine different features into a single vector prior to modeling the data with a GMM. In 3classes, the best result is achieved using Mel-cepstrum, energy and zero-crossings in a single feature vector (speech: 95.1%, music: 61.9%, speech & music: 55.5%). In 2classes, the best result is achieved using Mel-cepstrum, energy and Mel-cepstrum, energy, zero-crossings in a single feature vector (speech: 98.9%, music: 100%).

  • PDF

Voice Activity Detection Method Using Psycho-Acoustic Model Based on Speech Energy Maximization in Noisy Environments (잡음 환경에서 심리음향모델 기반 음성 에너지 최대화를 이용한 음성 검출 방법)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.447-453
    • /
    • 2009
  • This paper introduces the method for detect voices and exact end point at low SNR by maximizing voice energy. Conventional VAD (Voice Activity Detection) algorithm estimates noise level so it tends to detect the end point inaccurately. Moreover, because it uses relatively long analysis range for reflecting temporal change of noise, computing load too high for application. In this paper, the SEM-VAD (Speech Energy Maximization-Voice Activity Detection) method which uses psycho-acoustical bark scale filter banks to maximize voice energy within frames is introduced. Stable threshold values are obtained at various noise environments (SNR 15 dB, 10 dB, 5 dB, 0 dB). At the test for voice detection in car noisy environment, PHR (Pause Hit Rate) was 100%accurate at every noise environment, and FAR (False Alarm Rate) shows 0% at SNR15 dB and 10 dB, 5.6% at SNR5 dB and 9.5% at SNR0 dB.

Utterance display system for speech data acquisition (음성데이터 수집을 위한 발성내용 제시시스팀)

  • 김경태;이용주;정유현
    • The Journal of the Acoustical Society of Korea
    • /
    • v.12 no.1
    • /
    • pp.5-11
    • /
    • 1993
  • 본 논문은 발성자의 자연스러운 음성데이터를 수집하기 위한 발성내용 제시시스팀의 구현에 대하여 기술한다. 대량의 음성정보의 수집 및 처리를 위해서는 이와같은 시스팀이 필수적이다. 왜냐하면, 음성정보처리의 성능 평가는 음성데이터와 발성방법에 따라 죄우되므로 실제의 환경에서 사용되는 자연스러운 음성으로 평가되어야만 객관적인 결과를 얻을 수 있기 때문이다. 따라서 이러한 음성데이터를 효율적으로 수집하기 위한 방법으로써 발성내용 제시시스팀에 관하여 기술하고자 한다. 특히, 본 논문에서는 발성해야 할 데이터를 제시하기 위한 방법으로써 발성내용 제시 시스팀에 관하여 기술하고자 한다. 특히, 본 논문에서는 발성해야 할 데이터를 제시하기 위한 요구사항, 기능, PC에 의한 구현에 대하여 기술한다. 본 시스팀은 음성수집 단계뿐만아니라 수집 후의 편집 작업의 편리성을 고려하여 구현하였으며, 4연속 숫자음 등 96명이 발성한 63,840개의 단어를 수집하는데 적용하였고 수집 과정에서 종래의 리스트를 보고 발성하는 방법에 비해 훨씬 효율적이고 자연스러운 발성을 유도할 수 있었다.

  • PDF

Acoustic model training using self-attention for low-resource speech recognition (저자원 환경의 음성인식을 위한 자기 주의를 활용한 음향 모델 학습)

  • Park, Hosung;Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.483-489
    • /
    • 2020
  • This paper proposes acoustic model training using self-attention for low-resource speech recognition. In low-resource speech recognition, it is difficult for acoustic model to distinguish certain phones. For example, plosive /d/ and /t/, plosive /g/ and /k/ and affricate /z/ and /ch/. In acoustic model training, the self-attention generates attention weights from the deep neural network model. In this study, these weights handle the similar pronunciation error for low-resource speech recognition. When the proposed method was applied to Time Delay Neural Network-Output gate Projected Gated Recurrent Unit (TNDD-OPGRU)-based acoustic model, the proposed model showed a 5.98 % word error rate. It shows absolute improvement of 0.74 % compared with TDNN-OPGRU model.

Acoustic Model Improvement and Performance Evaluation of the Variable Vocabulary Speech Recognition System (가변 어휘 음성 인식기의 음향모델 개선 및 성능분석)

  • 이승훈;김회린
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.8
    • /
    • pp.3-8
    • /
    • 1999
  • Previous variable vocabulary speech recognition systems with context-independent acoustic modeling, could not represent the effect of neighboring phonemes. To solve this problem, we use allophone-based context-dependent acoustic model. This paper describes the method to improve acoustic model of the system effectively. Acoustic model is improved by using allophone clustering technique that uses entropy as a similarity measure and the optimal allophone model is generated by changing the number of allophones. We evaluate performance of the improved system by using Phonetically Optimized Words(POW) DB and PC commands(PC) DB. As a result, the allophone model composed of six hundreds allophones improved the recognition rate by 13% from the original context independent model m POW test DB.

  • PDF

Fundamental Acoustic Investigation of Korean Male 5 Monophthongs (한국 남성의 단모음 [아, 에, 이, 오, 우]에 대한 음향음성학적 기반연구)

  • Choi, Yae-Lin
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.6
    • /
    • pp.373-377
    • /
    • 2010
  • Numerous quantitative and qualitative studies have already been published related to English vowels. However, only minimal amounts of studies based on the acoustic analysis of Korean vowels have been accomplished. The purpose of this study is to obtain sufficient quantitative data based on the acoustic aspects of Korean vowels produced by males between the ages of 20s and 30s. A total of 31 males in their 20s and 30s produced the five fundamental vowels /a, e, i, o, u/ by repeating each of them three times in the standard Korean dialect. Such speech productions were recorded with 'Cool edit' and F1, F2, F3, F4 were extracted through the MATLAB acoustic analysis program. Results indicated that the overall patterns of formants were similar to previous studies, except that the formant levels of F1 and F2 of the vowels produced in this study were generally lower than that in previous studies. Future studies need to focus on obtaining vowel data by considering other factors such as age and other speech materials.

A study on the clinical utility of voiced sentences in acoustic analysis for pathological voice evaluation (장애음성의 음향학적 분석에서 유성음 문장의 임상적 유용성에 관한 연구)

  • Ji-sung Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.4
    • /
    • pp.298-303
    • /
    • 2023
  • This study aimed to investigate the clinical utility of voiced sentence tasks for voice evaluation. To this end, we analyzed the correlation between perturbation-based acoustic measurements [jitter percent (jitter), shimmer percent (shimmer), Noise to Harmonic Ratio (NHR)] using sustained vowel phonation, and cepstrum-based acoustic measurements [Cepstral Peak Prominence (CPP), Low/High spectral ratio (L/H ratio)] using voiced sentences. As a result of analyzing data collected from 65 patients with voice disorders, there was a significant correlation between the CPP and jitter (r = -.624, p = .000), shimmer (r = -.530, p = .000), NHR (r = -.469, p = .000).This suggests that the cepstrum measurement of voiced sentences can be used as an alternative to the analysis limitations of the pathological voice such as not possible perturbation-based acoustic measurement, and result difference according to the analysis section.