Integrated Search | Korea Science

Performance Comparison of Out-Of-Vocabulary Word Rejection Algorithms in Variable Vocabulary Word Recognition

김기태;문광식;김회린;이영직;정재호
- The Journal of the Acoustical Society of Korea / Vol. 20, No. 2 / pp. 27-34 / 2001
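Utterance verification tells the user that an input word lies outside the registered vocabulary and therefore cannot be recognized, which makes it an important technology for designing user-friendly speech recognition systems. This paper proposes an utterance verification algorithm that achieves minimum verification error in a variable vocabulary word recognizer. Using the PBW (Phonetically Balanced Words) 445 DB of the Electronics and Telecommunications Research Institute, we propose an effective utterance verification method that improves out-of-vocabulary (OOV) word rejection in variable vocabulary word recognition. Specifically, we propose an anti-phone model that includes many similar phoneme sets and needs no special training procedure, so that the verification error is minimized. In addition, the phone-level confidence measure, which is based on the ratio of the phone-level null hypothesis to the alternate hypothesis and is normalized by the null hypothesis, showed robust utterance verification performance, and the word-level confidence measure derived from the phone-level confidences discriminated well between in-vocabulary and OOV words. With the newly proposed anti-phone model and utterance verification method, CA (Correctly Accept for Keyword: an in-vocabulary word is correctly accepted) was about 89% and CR (Correctly Reject for OOV: an out-of-vocabulary word is rejected) was about 90%, an improvement in OOV rejection of about 15-21% in terms of ERR (Error Reduction Rate) over the conventional filter-model method.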
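The abstract above does not state the exact confidence formula, so the following is only a minimal sketch of one plausible reading: a phone-level log-likelihood ratio between the recognized phone model (null hypothesis) and an anti-phone model (alternate hypothesis), divided by the magnitude of the null-hypothesis score, then averaged into a word-level score and compared with a threshold. The function names, the averaging rule, and the threshold value are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def phone_confidence(ll_null: float, ll_alt: float) -> float:
    """Log-likelihood ratio for one phone segment, normalized by the null score.

    ll_null: log-likelihood under the recognized phone model (null hypothesis).
    ll_alt:  log-likelihood under the anti-phone model (alternate hypothesis).
    The normalization by |ll_null| is an assumed form, not the paper's formula.
    """
    if ll_null == 0.0:
        raise ValueError("null-hypothesis log-likelihood must be non-zero")
    return (ll_null - ll_alt) / abs(ll_null)


def word_confidence(phones: Sequence[Tuple[float, float]]) -> float:
    """Average the phone-level confidences of a word (assumed combination rule)."""
    if not phones:
        raise ValueError("a word needs at least one phone segment")
    return sum(phone_confidence(n, a) for n, a in phones) / len(phones)


def accept(phones: Sequence[Tuple[float, float]], threshold: float = 0.0) -> bool:
    """Accept the word as in-vocabulary when its confidence reaches the threshold."""
    return word_confidence(phones) >= threshold


# Hypothetical (ll_null, ll_alt) pairs for the phones of one recognized word.
example_word = [(-10.0, -15.0), (-20.0, -18.0)]
print(word_confidence(example_word), accept(example_word))
```

In practice the threshold would be tuned on held-out data to balance the CA and CR rates reported in the abstract.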

Implementation of Automatic Phoneme Labelling System Using Context-dependent Demi-phone Unit and Performance Evaluation

박순철;김태환;김봉완;이용주
- Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 1999 Conference, Vol. 18, No. 2 / pp. 65-70 / 1999
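Databases labeled at the phoneme level are very important in speech research. Because manual phoneme segmentation and labeling take a great deal of time and effort, automatic phoneme segmentation and labeling systems are being actively studied. The authors previously proposed an automatic phoneme segmentation and labeling system whose segmentation unit is a context-dependent demi-phone model that combines the advantages of monophones and triphones [1]. In this paper, we refine the demi-phone unit to improve the performance of that system. The previously proposed demi-phone unit split each phoneme at its midpoint into left and right demi-phones. In this paper, when a phoneme is 120 ms or longer, the first 60 ms and the last 60 ms are assigned to the leading and trailing demi-phones so that the transition regions are well represented, and the remaining stable region is modeled separately. To evaluate the proposed unit, the labeling system was trained on PBW 452-word data uttered by 30 male speakers and tested on data from 4 male speakers not used in training. In the experiments, performance improved over the existing demi-phone unit by 1.65% to 69.09% at 10 ms and by 1.02% to 85.32% at 20 ms.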
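The splitting rule described above can be sketched as follows. This is an illustrative reading only: the abstract does not say how phonemes shorter than 120 ms are handled, so the sketch assumes they fall back to the earlier midpoint split, and the labels `left`, `stable`, and `right` are hypothetical names.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (label, start_ms, end_ms)


def split_phone(start_ms: float, end_ms: float,
                edge_ms: float = 60.0, min_len_ms: float = 120.0) -> List[Segment]:
    """Split one phoneme into demi-phone units.

    Phonemes of at least min_len_ms get edge_ms-long leading and trailing
    units plus a separate stable middle; shorter ones are halved at the
    midpoint (assumed fallback, following the earlier scheme).
    """
    if end_ms <= start_ms:
        raise ValueError("end_ms must be greater than start_ms")
    duration = end_ms - start_ms
    if duration >= min_len_ms:
        segments = [("left", start_ms, start_ms + edge_ms)]
        # The stable region is empty when the phoneme is exactly 2 * edge_ms long.
        if end_ms - edge_ms > start_ms + edge_ms:
            segments.append(("stable", start_ms + edge_ms, end_ms - edge_ms))
        segments.append(("right", end_ms - edge_ms, end_ms))
        return segments
    mid = (start_ms + end_ms) / 2
    return [("left", start_ms, mid), ("right", mid, end_ms)]


print(split_phone(0, 200))  # long phoneme: three units
print(split_phone(0, 100))  # short phoneme: midpoint split
```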

Performance Improvement of Automatic Speech Segmentation and Labeling System

홍성태;김제우;김형순
- Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 35-36 / pp. 175-188 / 1998
A database segmented and labeled down to the phoneme level plays an important role in phonetic research and speech engineering. However, building one usually requires manual segmentation and labeling, which is time-consuming and may lead to inconsistent results. Automatic segmentation and labeling can be introduced to solve these problems. In this paper, we investigate methods to improve the performance of an automatic segmentation and labeling system: use of the Spectral Variation Function (SVF), modification of the silence model, and use of energy variations in the postprocessing stage. The SVF is applied in three ways: (1) as an addition to the feature parameters, (2) in postprocessing of phoneme boundaries, and (3) in restricting the Viterbi path so that the resulting phoneme boundaries fall in frames around SVF peaks. In the postprocessing stage, the positions with the greatest energy variation during the transition between silence and other phonemes were used to modify boundaries. To evaluate the system, we used a 452-word phonetically balanced word (PBW) database for training the phoneme models and a phonetically balanced sentence (PBS) database for testing. In our experiments, 83.1% (6.2% improved) and 95.8% (0.9% improved) of phoneme boundaries were within 20 ms and 40 ms of the manually segmented boundaries, respectively.
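The evaluation figure quoted above, the share of automatic boundaries lying within a tolerance of the manual ones, can be computed roughly as shown below. This is a sketch under the assumption that automatic and manual boundaries are already paired one-to-one (as they are when both follow the same phoneme sequence); the function name and the sample boundary times are made up for illustration.

```python
from typing import Sequence


def boundary_accuracy(auto_ms: Sequence[float], manual_ms: Sequence[float],
                      tol_ms: float) -> float:
    """Fraction of automatic boundaries within tol_ms of their manual counterpart."""
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tol_ms)
    return hits / len(auto_ms)


# Hypothetical boundary times in milliseconds.
auto = [100, 210, 330, 480]
manual = [105, 200, 300, 470]
for tol in (20, 40):
    print(f"within {tol} ms: {boundary_accuracy(auto, manual, tol):.1%}")
```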

The State of the Art and Application of Actuator in Aerospace

윤기준;박호열;장기원
- Journal of the Korean Society of Propulsion Engineers / Vol. 14, No. 6 / pp. 89-102 / 2010
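This paper surveys actuators in the aerospace field and forward-looking actuator technologies applied in various industries, together with their directions of development. In particular, aircraft flight control surfaces have traditionally been driven by mechanical linkages or by hydraulic and pneumatic actuators with a high power-to-weight ratio, but recently there has been a shift to Fly-By-Wire systems, now used in most aircraft, and further to 'More Electric' and 'All Electric' systems. Electro-hydraulic actuators and electric actuators have been under continuous development in recent years because they are superior in efficiency, safety, and cost. In addition, new classes of actuators based on new materials are being developed to improve weight, precision, and response speed. This study therefore aims to present the detailed technologies and development directions of next-generation aerospace actuators and new-concept actuators.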

Implementation of the Automatic Segmentation and Labeling System

성종모;김형순
- The Journal of the Acoustical Society of Korea / Vol. 16, No. 5 / pp. 50-59 / 1997
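In this paper, we implemented an automatic speech segmentation and labeling system that extracts phoneme boundaries automatically, for use in building Korean speech databases. The system is based on existing speech segmentation and labeling techniques, and a graphical user interface was developed in the Hangul Motif environment so that users can check the automatically segmented phoneme boundaries and correct them easily. The system handles speech sampled at 16 kHz, and its labeling units consist of 45 phoneme-like units and one silence unit. Phonemic transcription and orthographic transcription are used to input linguistic information, and hidden Markov models (HMMs) are used for pattern matching. Each phoneme model was trained on a phonetically balanced 445-word database that had been manually segmented at the phoneme level. To evaluate the system, automatic segmentation experiments were run on a sentence database not used in training. As a result, 74.7% of the boundaries were within 20 ms of the manually segmented phoneme boundaries, and 92.8% were within 40 ms.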

Performance Evaluation of an Automatic Distance Speech Recognition System

오유리;윤재삼;박지훈;김민아;김홍국;공동건;명현;방석원
- The Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the IEEK 2007 Summer Conference / pp. 303-304 / 2007
In this paper, we implement an automatic distance speech recognition system for voice-enabled services. We first construct a baseline automatic speech recognition (ASR) system whose acoustic models are trained on utterances recorded with a close-talking microphone. To improve the performance of the baseline ASR on distance speech, the acoustic models are adapted to compensate for the spectral characteristics of different microphones and for the environmental mismatch between close-talking and distance speech. Next, we develop a voice activity detection algorithm for distance speech. We compare the baseline system and the developed ASR system on a PBW (Phonetically Balanced Word) 452 task. The developed ASR system achieves an average word error rate (WER) reduction of 30.6% compared with the baseline ASR system.
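For readers unfamiliar with how a figure such as the 30.6% WER reduction is usually obtained, here is a minimal sketch assuming an isolated-word task (one reference word and one hypothesis per utterance) and a relative reduction measured against the baseline. The abstract does not say whether its figure is relative or absolute, so the relative form is an assumption, and the word lists below are invented.

```python
from typing import Sequence


def word_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Isolated-word WER: share of utterances whose hypothesis differs from the reference."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("need equally long, non-empty reference and hypothesis lists")
    errors = sum(1 for r, h in zip(refs, hyps) if r != h)
    return errors / len(refs)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of the new system with respect to the baseline."""
    if baseline_wer <= 0.0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Invented isolated-word results for a baseline and an adapted recognizer.
refs = ["one", "two", "three", "four", "five"]
baseline_hyps = ["one", "too", "tree", "four", "nine"]
adapted_hyps = ["one", "two", "tree", "four", "five"]
base = word_error_rate(refs, baseline_hyps)
new = word_error_rate(refs, adapted_hyps)
print(f"baseline {base:.1%}, adapted {new:.1%}, reduction {relative_reduction(base, new):.1%}")
```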

The Design of Keyword Spotting System based on Auditory Phonetical Knowledge-Based Phonetic Value Classification

김학진;김순협
- The KIPS Transactions: Part B / Vol. 10B, No. 2 / pp. 169-178 / 2003
This study addresses two topics: the classification of phone-like units (PLU), which is the foundation of Korean large-vocabulary speech recognition, and the effectiveness of Chiljongseong (7 final consonants) and Paljongseong (8 final consonants) of the Korean language. Phone-like units classify phonemes phonetically according to the place and manner of articulation, and about 50 phone-like units are used in Korean speech recognition. In this study, auditory phonetic knowledge was applied to the classification of phone-like units to obtain 45 phone-like units. The vowels 'ㅔ, ㅐ' were classified as the phone-like unit [ee]; 'ㅒ, ㅖ' as [ye]; and 'ㅚ, ㅙ, ㅞ' as [we]. Secondly, the Chiljongseong system of the draft for the unified spelling system, which is currently in use, and the Paljongseonggajokyong of the Korean script Haerye were described. Whether 'ㄷ' and 'ㅅ', among the phonemes used as final consonants in the Korean language, have the same phonetic value has long been debated in the academic world. In this study, the transition stages of Korean consonants were investigated, Chiljongseong and Paljongseonggajokyong were applied to speech recognition, and their effectiveness was verified. The experiment was divided into isolated word recognition and continuous speech recognition, and PBW452 was used to test isolated word recognition. The experiment was conducted on about 50 men and women, divided into 5 groups, who uttered 50 words each. For the continuous speech recognition experiment, intended for use in the implemented stock exchange system, a sentence corpus of 71 stock exchange sentences and a speech corpus of those sentences were collected; 5 men and women each uttered each sentence twice. In the experiments, when Paljongseonggajokyong was used for the final consonants, recognition performance rose by about 1.45% on average, and when phone-like units with both Paljongseonggajokyong and auditory phonetic knowledge were applied, the recognition rate increased by 1.5% to 2.02% on average. In the continuous speech recognition experiment, recognition performance rose by about 1% to 2% on average compared with the existing 49 or 56 phone-like units.
https://doi.org/10.3745/KIPSTB.2003.10B.2.169

A study on recognition improvement of velopharyngeal insufficiency patient's speech using various types of deep neural network

김민석;정재희;정보경;윤기무;배아라;김우일
- The Journal of the Acoustical Society of Korea / Vol. 38, No. 6 / pp. 703-709 / 2019
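To recognize the speech of velopharyngeal insufficiency (VPI) patients effectively, this paper builds hybrid speech recognition systems that combine a convolutional neural network (CNN) or a long short-term memory (LSTM) network with a hidden Markov model (HMM), applies model adaptation techniques, and compares their performance with conventional Gaussian mixture model (GMM-HMM) and fully connected deep neural network (DNN-HMM) recognizers. An initial model is trained on PBW452 word utterances from normal speakers, a prior model for speaker adaptation is generated from simulated VPI speech produced by normal speakers, and additional adaptation training is then carried out with the speech of VPI patients. During speaker adaptation for VPI patients, only some layers are adapted in the CNN-HMM model, and dropout regularization is applied in the LSTM-HMM model; the observed recognition performance is 3.68% higher than that of the conventional fully connected DNN-HMM recognizer. These results show that the LSTM-HMM hybrid recognition approach proposed in this paper is effective for building a recognizer with improved accuracy for VPI patient speech, for which large amounts of data are hard to obtain.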
https://doi.org/10.7776/ASK.2019.38.6.703

Investigation of the listening environment of classrooms for elderly people using speech intelligibility tests

박찬재;김보경;한찬훈
- The Journal of the Acoustical Society of Korea / Vol. 40, No. 1 / pp. 18-30 / 2021
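The ultimate goal of this research is to propose acoustic criteria for learning spaces used by elderly people with impaired hearing. As a preliminary study, this work investigated the listening environment of elderly education facilities currently in operation and the speech perception performance of the elderly people who use them. To this end, physical acoustic performance was measured and a questionnaire survey was conducted at two elderly education facilities in Cheongju. Speech intelligibility was also evaluated with syllable and word tests. The survey showed that the elderly respondents were generally satisfied with the listening environment. Measurements of physical acoustic parameters such as background noise, signal-to-noise ratio, reverberation time, and speech transmission index met the acoustic performance criteria for general classrooms in Korea. In the speech intelligibility evaluation, however, the elderly groups scored far lower than normal-hearing listeners in their twenties, by a margin of 20 points or more, and scores fell as age increased. The acoustic performance criteria currently used for educational facilities for the general population are therefore not suitable for elderly education facilities.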
https://doi.org/10.7776/ASK.2021.40.1.018
