Search | Korea Science

Acoustic Model Improvement and Performance Evaluation of the Variable Vocabulary Speech Recognition System (가변 어휘 음성 인식기의 음향모델 개선 및 성능분석)

이승훈;김회린
- The Journal of the Acoustical Society of Korea
- /
- v.18 no.8
- /
- pp.3-8
- /
- 1999
Previous variable vocabulary speech recognition systems with context-independent acoustic modeling could not represent the effect of neighboring phonemes. To solve this problem, we use an allophone-based context-dependent acoustic model. This paper describes a method for improving the acoustic model of the system effectively. The acoustic model is improved with an allophone clustering technique that uses entropy as a similarity measure, and the optimal allophone model is found by varying the number of allophones. We evaluate the improved system on the Phonetically Optimized Words (POW) DB and the PC commands (PC) DB. As a result, the allophone model composed of six hundred allophones improved the recognition rate by 13% over the original context-independent model on the POW test DB.
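The abstract does not spell out how entropy serves as a similarity measure. One common formulation, shown here only as a sketch, scores a candidate merge of two allophone models by the count-weighted entropy increase of their pooled discrete output distributions; a small increase means the two are similar. The count vectors and the function names `entropy` and `merge_cost` are illustrative assumptions, not the paper's definitions.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a discrete distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def merge_cost(counts_a, counts_b):
    """Count-weighted entropy increase caused by pooling two models.

    Assumption: each allophone model is summarized by a vector of
    codeword counts. A low cost marks a good candidate pair to merge.
    """
    merged = [a + b for a, b in zip(counts_a, counts_b)]
    n_a, n_b = sum(counts_a), sum(counts_b)
    return ((n_a + n_b) * entropy(merged)
            - n_a * entropy(counts_a)
            - n_b * entropy(counts_b))
```

Under this sketch, a greedy clustering would repeatedly merge the pair with the lowest cost until the target number of allophones is reached.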

English Phoneme Recognition using Segmental-Feature HMM (분절 특징 HMM을 이용한 영어 음소 인식)

Yun, Young-Sun
- Journal of KIISE:Software and Applications
- /
- v.29 no.3
- /
- pp.167-179
- /
- 2002
In this paper, we propose a new acoustic model for characterizing segmental features, together with an algorithm based on the general framework of hidden Markov models (HMMs), in order to compensate for the weaknesses of the HMM assumptions. Because a single-frame feature cannot represent the temporal dynamics of speech signals effectively, the segmental features are represented as a trajectory of the observed vector sequence by a polynomial regression function. To apply the segmental features to pattern classification, we adopt the segmental HMM (SHMM), which is known as an effective method for representing the trend of speech signals. The SHMM separates the observation probability of a given state into extra- and intra-segmental variations, which capture the long-term and short-term variabilities, respectively. To account for segmental characteristics in the acoustic model, we present the segmental-feature HMM (SFHMM), obtained by modifying the SHMM. The SFHMM represents the external and internal variations as the observation probability of the trajectory in a given state and the trajectory estimation error for the given segment, respectively. We conducted several experiments on the TIMIT database to establish the effectiveness of the proposed method and the characteristics of the segmental features. From the experimental results, we conclude that the proposed method is valuable, even if its number of parameters is greater than that of the conventional HMM, for its flexible and informative feature representation and its performance improvement.
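As a rough illustration of the trajectory idea, the sketch below fits a first-order polynomial to one scalar feature over a segment and reports the mean squared residual, which plays the role of a trajectory estimation error. The polynomial order, the normalized time axis, and the function names are assumptions for illustration; the abstract does not give the paper's exact formulation.

```python
def fit_linear_trajectory(frames):
    """Least-squares fit of y(t) = b0 + b1 * t over normalized time t in [0, 1].

    Assumption: `frames` holds one scalar feature value per frame.
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    times = [i / (n - 1) for i in range(n)]
    t_mean = sum(times) / n
    y_mean = sum(frames) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, frames))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def trajectory_error(frames, b0, b1):
    """Mean squared residual between the frames and the fitted trajectory."""
    n = len(frames)
    times = [i / (n - 1) for i in range(n)]
    return sum((y - (b0 + b1 * t)) ** 2 for t, y in zip(times, frames)) / n
```

A real feature vector would get one such fit per dimension, and a higher polynomial order would need a general least-squares solver.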

The suppression of noise-induced speech distortions for speech recognition (음성인식을 위한 잡음하의 음성왜곡제거)

Chi, Sang-Mun;Oh, Yung-Hwan
- Journal of the Korean Institute of Telematics and Electronics S
- /
- v.35S no.12
- /
- pp.93-102
- /
- 1998
In noisy environments, human speech production is influenced by the noise (the Lombard effect), and the speech signals are contaminated. These distortions dramatically reduce the performance of speech recognition systems. This paper proposes a method of Lombard effect compensation and noise suppression to improve speech recognition performance in noisy environments. To estimate the intensity of the Lombard effect, which is a nonlinear distortion depending on the ambient noise level, the speaker, and the phonetic unit, we formulate a measure of the Lombard effect level based on the acoustic speech signal and use it to compensate for the Lombard effect. The distortions of speech in noisy environments are cancelled as follows. First, spectral subtraction and band-pass filtering are used to cancel the noise. Second, energy normalization is proposed to cancel the variation in vocal intensity caused by the Lombard effect. Finally, the Lombard effect level controls the transform that converts the Lombard speech cepstrum to the clean speech cepstrum. The proposed method was validated on a 50-word Korean recognition task. Average recognition rates were 82.6%, 95.7%, and 97.6% with the proposed method, compared with 46.3%, 75.5%, and 87.4% without any compensation, at SNRs of 0, 10, and 20 dB, respectively.
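The first two steps can be sketched as follows. This is a minimal illustration, assuming per-bin magnitude spectra and per-frame log energies are already available; power-domain subtraction with a spectral floor and peak-referenced energy normalization are generic textbook choices, not necessarily the variants used in the paper. Band-pass filtering and the cepstral transform are not shown.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract the estimated noise power from each frequency bin.

    `floor` (an assumed parameter) keeps a small fraction of the noisy
    power so that no bin goes negative after subtraction.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = max(y * y - n * n, floor * y * y)
        cleaned.append(math.sqrt(power))
    return cleaned


def normalize_energy(frame_energies_db):
    """Shift frame log energies so the loudest frame sits at 0 dB.

    Assumption: peak referencing stands in for the paper's normalization,
    which the abstract does not specify.
    """
    peak = max(frame_energies_db)
    return [e - peak for e in frame_energies_db]
```

Referencing every frame to the utterance peak removes an overall level shift, which is one simple way to reduce sensitivity to the louder speech produced under the Lombard effect.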

Pronunciation Variation Modeling for Korean Point-of-Interest Data Using Prosodic Information (운율 정보를 이용한 한국어 위치 정보 데이터의 발음 모델링)

Kim, Sun-Hee;Park, Jeon-Gue;Jeon, Je-Hun;Na, Min-Soo;Chung, Min-Hwa
- Annual Conference on Human and Language Technology
- /
- 2006.10e
- /
- pp.51-56
- /
- 2006
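Whereas most studies that use prosodic information for speech recognition rely on the acoustic information of prosody, this study shows that structural prosodic information, such as prosodic words and syllable counts, contributes to improving the recognition rate. The paper aims to evaluate the performance of a speech recognizer when pronunciation modeling uses two kinds of prosodic information, prosodic words and syllable counts. We first generate all possible pronunciations of the point-of-interest data using prosodic words, then present a method that adjusts the number of pronunciation variants according to syllable count, and finally evaluate recognition performance using the pronunciation dictionaries generated by the proposed method. In the experiments, every pronunciation dictionary built using prosodic words improved on the baseline, with the WER reduced by up to 8.4% from the baseline WER of 4.63%. When the number of pronunciation variants was adjusted according to the syllable count of the point-of-interest data, the best recognition performance was obtained overall when the number was limited at 3 syllables, and for words of 6 or more syllables when it was limited at 4 syllables, which shows that adjusting the number of pronunciation variants by syllable count is effective.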
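For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between the reference and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition together with a helper for relative reduction; the function names are illustrative, and the abstract does not say whether its reported reduction is relative or absolute.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline
```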

A Study on PLU (Phone-Likely Unit) for Korean Continuous Speech Recognition (강건한 한국어 연속음성인식을 위한 유사음소단일에 대한 연구)

Seo Jun-Bae;Kim Joo-Gon;Kim Min-Jung;Jung Ho-Youl;Chung Hyun-Yeol
- Proceedings of the Acoustical Society of Korea Conference
- /
- spring
- /
- pp.37-40
- /
- 2004
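This paper studies the number of context-dependent acoustic models that is efficient for Korean continuous speech recognition, comparing and evaluating recognition performance as a function of the number of phone-like units (PLUs). Our laboratory has so far used 48 phonemes as the basic recognition units. In continuous speech recognition, however, context-dependent models are used, and these models already cover the phonemes that account for allophones, so reducing the basic phoneme set can be expected to lower the computational load and improve recognition performance. We therefore compare the existing 48-phoneme set with a set reduced to 39 phonemes in recognition experiments. To this end, we merged databases from various tasks to extend the contexts that were insufficiently covered, and then ran the recognition experiments. The results confirm that recognition performance does not degrade when the number of allophones is reduced, and that for continuous speech the 39-phoneme set improves recognition performance by about $10\%$.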

The Evaluation of Changes Of Acoustic Parameters With Aging by the Multi-Dimensional Acoustic Analysis (다차원음향분석을 이용한 연령변화에 따른 음향지표의 변화)

김형태;김민식;조승호
- Proceedings of the KSLP Conference
- /
- 1996.11a
- /
- pp.77-77
- /
- 1996
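The structure of the vocal folds undergoes histological changes with age. To investigate the resulting aging of the voice, we used the Multi-Dimensional Voice Program (Model 4305, Kay Elemetrics Corp, USA) to measure age-related quantitative changes in multi-dimensional acoustic analysis parameters in 300 subjects (141 male, 159 female) from all age groups who had normal voices and no vocal fold lesions, with the aim of establishing normal reference values of the acoustic parameters by age and the changes of the voice parameters across age groups. (abridged)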

A Study on Newly-coined Cant creations in compliance with Phonological Variation (음운론적 변이에 의한 신조 상말의 생성에 대하여)

Park, Cheol-Ju
- Proceedings of the KSPS conference
- /
- 2007.05a
- /
- pp.87-90
- /
- 2007
We encounter newly coined cant these days. This cant is classified as 'variation cant', which is used in communication language. This study therefore focuses on the aspects and actual usage of cant in communication language.

Problems in the Use of Acoustic Terminology (음향 용어 사용의 문제점)

차일환
- Proceedings of the KSLP Conference
- /
- 2003.11a
- /
- pp.187-189
- /
- 2003
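Title: intensity change at fixed pitch; 음도 (vocal pitch); 비음도 (nasalance) of normal adults; 저항 (impedance) $\rightarrow$ 임피던스. Jitter: the degree of frequency perturbation (cycle-to-cycle frequency perturbation), that is, the phenomenon in which the width, frequency, or phase of a periodic pulse waveform deviates slightly from its normal position because of noise and the like, or the amount of that deviation. Shimmer: the degree of amplitude perturbation. (abridged)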

Distinguishing features and variability of intonation patterns in Korean phonological phrases: The effects of syllable count and segmental content (한국어 음운구 억양 유형의 변별적 특성과 변이 조건에 대한 연구: 음절 수와 분절음 종류의 영향을 중심으로)

Oh, Jeahyuk
- Phonetics and Speech Sciences
- /
- v.14 no.3
- /
- pp.27-40
- /
- 2022
This study identifies the distinguishing features and variability of intonation patterns in Korean phonological phrases. Syllable count and segmental content, which are phonological conditions on the intonation of phonological phrases, were examined. Taking four syllables as the basis, the intonation of a phonological phrase can be set to LHLH as the basic form, and syllable count acts as a condition for variation. The "3 syllables or less condition" changes the intonation from a curved line to a straight line. Variation occurs in pitch bandwidth and fluctuation according to segmental content. The first segment affects the bandwidth in which the phonological phrase is formed, and the following segments affect the pitch fluctuation. If the first segment has [+aspirate], [+tense], [+continuant], the intonation is formed in the high band; otherwise, it is formed in the low band. If the second or a later segment in an intonation realized in the high band has [-aspirate], [-tense], [-continuant], the pitch is lowered to the lowest level of the low bandwidth. In an intonation realized in the low band, [+aspirate], [+tense], [+continuant] is blocked by the second descent of LHLH.
https://doi.org/10.13064/KSSS.2022.14.3.027

Spectrum Leakage Reduction using Time Scaling Window (시간축 스케일링 윈도우를 이용한 스펙트럼 누설 감소)

LEE HeeWon;NA DuckSu;BAE MyungJin
- Proceedings of the Acoustical Society of Korea Conference
- /
- spring
- /
- pp.107-110
- /
- 2000
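Speech signals vary with time, but short-time analysis applies a window under the assumption that their characteristics do not change within a given interval. Applying a window is therefore essential. However, the leakage energy caused by the window used for short-time analysis distorts the spectral information of the speech signal. This paper proposes a method that minimizes the leakage energy arising in spectral analysis. After applying a fixed-size rectangular window to the speech signal, we search from the point at 3/4 of the frame size for the sample that differs least from the first sample, and time-scale the signal up to that point so that it has the same size as the original window. Time scaling is performed by combining interpolation and decimation. After the windowed signal is processed, the inverse of the above procedure is carried out. The SNRseg of the proposed window was on average 7.88 dB lower than that of the rectangular window and 1.65 dB higher than that of the Kaiser window. In addition, the SD of the proposed window was on average $1.73\,\mathrm{dB}^2$ lower than that of the rectangular window.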
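The windowing procedure described in the abstract can be sketched roughly as below. The sketch assumes the frame is a plain list of samples, includes the matched sample in the kept segment, and uses linear interpolation as a simple stand-in for the paper's combination of interpolation and decimation; all function names are illustrative, and the inverse step is not shown.

```python
def find_cut_point(frame):
    """Index, searched from 3/4 of the frame onward, of the sample
    closest in value to the first sample."""
    n = len(frame)
    start = (3 * n) // 4
    best = start
    for i in range(start, n):
        if abs(frame[i] - frame[0]) < abs(frame[best] - frame[0]):
            best = i
    return best


def stretch_to_length(samples, length):
    """Resample `samples` to `length` points by linear interpolation."""
    if len(samples) < 2 or length < 2:
        raise ValueError("need at least two samples and two output points")
    step = (len(samples) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * step
        lo = min(int(pos), len(samples) - 2)  # keep lo + 1 in range
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


def time_scaling_window(frame):
    """Cut the frame at the best-matching sample and stretch it back
    to the original frame length."""
    cut = find_cut_point(frame)
    return stretch_to_length(frame[:cut + 1], len(frame))
```

Because the stretched frame starts and ends at nearly the same value, its periodic extension has a smaller discontinuity at the frame boundary, which is the usual source of spectral leakage with a rectangular window.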

Search Result 248, Processing Time 0.03 seconds
