• Title/Abstract/Keywords: Vocabulary Recognition

Search results: 221 items (processing time: 0.028 s)

Sentence Rejection using Word Spotting Ratio in the Phoneme-based Recognition Network

  • 김형태;하진영
    • Proceedings of the KSPS Conference (대한음성학회:학술대회논문집) / Proceedings of the 2005 Spring Conference of 대한음성학회 / pp. 99-102 / 2005
  • Research efforts have been made toward out-of-vocabulary word rejection to improve the confidence of speech recognition systems. However, little attention has been paid to rejecting non-recognition sentences. With the emergence of pronunciation correction systems based on speech recognition technology, non-recognition sentences must be rejected to give users more accurate and robust results. In this paper, we introduce a standard phoneme-based sentence rejection system that needs no special filler models. Instead, we use the word spotting ratio to decide whether an input sentence is accepted or rejected. Experimental results show that comparable performance, in terms of the average of FRR and FAR, can be achieved using only a standard phoneme-based recognition network.
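
The accept/reject decision described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the definition of the ratio as "spotted expected words divided by expected words", and the threshold value 0.6 are all assumptions made here for the example.

```python
def word_spotting_ratio(expected_words, spotted_words):
    """Fraction of the expected words that the recognizer spotted (assumed definition)."""
    if not expected_words:
        raise ValueError("expected_words must not be empty")
    spotted = set(spotted_words)
    hits = sum(1 for word in expected_words if word in spotted)
    return hits / len(expected_words)


def accept_sentence(expected_words, spotted_words, threshold=0.6):
    """Accept the input sentence when the ratio reaches the (hypothetical) threshold."""
    return word_spotting_ratio(expected_words, spotted_words) >= threshold


def average_error(trials, threshold=0.6):
    """Return (FRR + FAR) / 2 over trials of (expected, spotted, is_target)."""
    false_rejects = false_accepts = targets = impostors = 0
    for expected, spotted, is_target in trials:
        accepted = accept_sentence(expected, spotted, threshold)
        if is_target:
            targets += 1
            false_rejects += not accepted   # target sentence wrongly rejected
        else:
            impostors += 1
            false_accepts += accepted       # non-target sentence wrongly accepted
    frr = false_rejects / targets if targets else 0.0
    far = false_accepts / impostors if impostors else 0.0
    return (frr + far) / 2
```

Sweeping `threshold` over a held-out set and picking the value that minimizes `average_error` is one simple way to tune such a system.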

Isolated Word Recognition Using Hidden Markov Models with Bounded State Duration

  • 이기희;임인칠
    • Journal of the Korean Institute of Telematics and Electronics B (전자공학회논문지B) / Vol. 32B, No. 5 / pp. 756-764 / 1995
  • In this paper, we propose MLP (multilayer perceptron) based HMMs (hidden Markov models) with bounded state duration for isolated word recognition. The minimum and maximum state durations for each state of an HMM are estimated during the training phase and used as parameters that constrain state transitions in the recognition phase. The procedure for estimating these parameters and the recognition algorithm using the proposed HMMs are also described. Speaker-independent isolated word recognition experiments using a vocabulary of 10 city names and 11 digits indicate that the recognition rate can be improved by adjusting the minimum state durations.
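
As a rough illustration of the duration bounds mentioned in this abstract, the sketch below estimates per-state minimum and maximum durations from training alignments and checks a candidate path against them. It assumes frame-level state alignments are already available as lists of state indices; the function names are hypothetical, and the paper applies the bounds inside the recognition algorithm rather than as the after-the-fact check shown here.

```python
from itertools import groupby


def estimate_duration_bounds(alignments):
    """Return {state: (min_duration, max_duration)} from frame-level state paths."""
    bounds = {}
    for path in alignments:
        # groupby yields one run per consecutive stay in the same state.
        for state, run in groupby(path):
            duration = sum(1 for _ in run)
            low, high = bounds.get(state, (duration, duration))
            bounds[state] = (min(low, duration), max(high, duration))
    return bounds


def satisfies_bounds(path, bounds):
    """True if every state run in the path stays within its estimated bounds."""
    for state, run in groupby(path):
        duration = sum(1 for _ in run)
        if state not in bounds:
            return False
        low, high = bounds[state]
        if not low <= duration <= high:
            return False
    return True
```

For example, alignments `[[0, 0, 1, 1, 1, 2], [0, 0, 0, 1, 1, 2, 2]]` give bounds of 2 to 3 frames for states 0 and 1, and 1 to 2 frames for state 2.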

The Basic Study on making mono-phone for Korean Speech Recognition

  • 황영수;송민석
    • Proceedings of the Acoustical Society of Korea Conference (한국음향학회:학술대회논문집) / Proceedings of the Acoustical Society of Korea 2000 Conference, Vol. 19, No. 2 / pp. 45-48 / 2000
  • When building a large vocabulary speech recognition system, it is better to use the segment, rather than the syllable or the word, as the recognition unit. In this paper, we present a basic study on constructing monophones for Korean speech recognition. For the experiments, we use the OGI speech toolkit (U.S.A.). The results show that the recognition rate when a diphthong is treated as a single unit is superior to that when it is treated as two units, i.e. a glide plus a vowel. The recognition rate also differs slightly depending on the number of consonants.
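
The two unit inventories compared in this abstract can be illustrated with a small sketch. The romanized phone symbols and the `GLIDE_VOWEL` table are hypothetical placeholders chosen for this example; they are not the phone set used in the paper.

```python
# Hypothetical mapping from a diphthong symbol to its glide + vowel parts.
GLIDE_VOWEL = {
    "ya": ("y", "a"),
    "yo": ("y", "o"),
    "wa": ("w", "a"),
    "we": ("w", "e"),
}


def to_units(phones, split_diphthongs):
    """Expand a phone sequence into recognition units under either convention."""
    units = []
    for phone in phones:
        if split_diphthongs and phone in GLIDE_VOWEL:
            units.extend(GLIDE_VOWEL[phone])   # two units: glide plus vowel
        else:
            units.append(phone)                # diphthong kept as a single unit
    return units


def unit_inventory(lexicon, split_diphthongs):
    """Collect the set of units needed to cover every pronunciation in the lexicon."""
    return {u for phones in lexicon.values() for u in to_units(phones, split_diphthongs)}
```

Keeping diphthongs whole enlarges the inventory but shortens each pronunciation; splitting them does the opposite, which is the trade-off the experiments compare.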

Automatic Recognition of Korean Broadcast News Using Flexible Vocabulary Recognition Models

  • 유하진
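    • Proceedings of the Acoustical Society of Korea Conference (한국음향학회:학술대회논문집) / Proceedings of the 15th Workshop on Speech Communication and Signal Processing, Acoustical Society of Korea, 1998 (KSCSP 98, Vol. 15, No. 1) / pp. 70-73 / 1998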
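  • This paper describes a Korean broadcast news recognition system. Although the recognition experiments used speech that was actually broadcast, the acoustic model was the flexible vocabulary recognition model for isolated word recognition developed at our laboratory. The flexible vocabulary recognizer was trained not on continuous sentences from broadcast speech but on isolated words that are acoustically evenly distributed. Because Korean sentences, unlike those of English, are divided into eojeols rather than words, we experimented with various forms of dictionary entries. We also reduced recognition errors by using a long-distance language model in the early stage of the search process.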

Analysis of Preference Survey for the Media Facade Cases according to the Sensibility Keywords

  • 김주연
    • Korean Institute of Interior Design Journal (한국실내디자인학회논문집) / Vol. 22, No. 2 / pp. 58-67 / 2013
  • The methods used in this study included a review of media facades used as downtown landmarks in previous studies, visits to these areas, and recording of the media facades. Changes in color or building size were analyzed among the recorded cases, and 12 of these cases were selected for further study. Sensibility preference was evaluated by sorting the 12 media facades and presenting them as materials to a group of 60 participants (40 undergraduate and 20 graduate students majoring in architecture and design) with an equal proportion of males and females. This study addressed the following four-stage questions: 1) five recognition evaluation questions about media facades, together with cognitive evaluation items on emotional vocabulary and color preference for each case; 2) sensibility preference items regarding the media facade color design; 3) design evaluation items for the media facades; and 4) video clips and still images recorded from a middle distance between 7 p.m. and 11 p.m. in central New York, Singapore, Seoul, and Beijing. The participants watched the color changes in the video clips for each case and rated their preferences on 23 pairs of emotional vocabulary items using system dynamics. An emotional vocabulary was then constructed, based on previous studies of media facades and color design. To evaluate the sensibility preference and the perceived representative colors of the media facade, this study suggests new emotional responses that depend on the color emotional vocabulary of light in the LED lighting technical evaluation methodology. A media facade with moving color changes, unlike a fixed landscape color design, suggests a new communication method based on architectural factors. New architectural color coordination can be presented for urban landscapes at night. Designs that factor in pedestrians' emotional vocabulary or preference should take precedence over the use of high luminance and various colors.

Variable Vocabulary Word Recognizer using Phonetic Knowledge-based Allophone Model

  • 김회린;이항섭
    • The Journal of the Acoustical Society of Korea (한국음향학회지) / Vol. 16, No. 2 / pp. 31-35 / 1997
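  • This paper describes the development of a variable vocabulary word recognizer that can recognize arbitrary new vocabulary independent of the training speech data. Implementing such a recognizer requires an on-line pronunciation dictionary generator that immediately converts the new target vocabulary into a pronunciation dictionary, as well as reliable phoneme and allophone models that can model each word from the pronunciation dictionary output. To generate such reliable phoneme and allophone models, we defined allophones by clustering triphones according to the phonetic features of the phonemes preceding and following each phoneme, and applied this to the 3,848 POW (Phonetically Optimized Words) held by our laboratory to generate 1,548 allophone models. On this basis we implemented a variable vocabulary word recognizer and evaluated its performance on the POW 3,848 DB, the PBW 445 DB, and a 244-word hotel reservation DB. The evaluation showed a performance of 79.6% on the POW DB; 79.4% with the 445-word dictionary and 88.9% with a 100-word dictionary on the PBW DB; and 71.4% on the hotel reservation DB.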
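
The clustering of triphones by the phonetic features of their neighbors can be sketched as below. This is only an illustration under stated assumptions: the `FEATURE_CLASS` table, the phone symbols, and the choice of a single coarse class per phoneme are hypothetical, and the paper's actual feature set and clustering procedure are not given in the abstract.

```python
from collections import defaultdict

# Hypothetical coarse phonetic classes; "#" marks a word boundary.
FEATURE_CLASS = {
    "p": "stop", "t": "stop", "k": "stop",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "h": "fricative",
    "a": "vowel", "i": "vowel", "u": "vowel",
    "#": "boundary",
}


def triphones(phones):
    """List (left, center, right) contexts for every phone in a word."""
    padded = ["#"] + list(phones) + ["#"]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def cluster_allophones(words):
    """Group triphones whose neighbors share the same feature classes."""
    clusters = defaultdict(set)
    for phones in words:
        for left, center, right in triphones(phones):
            key = (FEATURE_CLASS[left], center, FEATURE_CLASS[right])
            clusters[key].add((left, center, right))
    return dict(clusters)
```

With the words `k a m`, `t a n`, and `s a p`, the triphones `k-a-m` and `t-a-n` fall into one allophone class (stop, a, nasal), so nine triphones reduce to eight models; this sharing is what lets unseen words be modeled from a fixed model set.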

A Study on the Phoneme Recognition in the Restricted Continuously Spoken Korean

  • 심성룡;김선일;이행세
    • Journal of the Korean Institute of Telematics and Electronics B (전자공학회논문지B) / Vol. 32B, No. 12 / pp. 1635-1643 / 1995
  • This paper proposes an algorithm for machine recognition of phonemes in continuously spoken Korean. The proposed algorithm is a static strategy neural network. In the neuron training stage, the algorithm uses features such as the zero-crossing rate, short-term energy, and either PARCOR or auditory-like perceptual linear prediction (PLP), but not both, covering a 171 ms span. Numerical results show that the algorithm with PLP achieves a frame-based phoneme recognition rate of approximately 99% in small vocabulary recognition experiments. Based on this, it is concluded that the proposed algorithm with PLP analysis is effective in phoneme recognition.
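
Two of the input features named in this abstract, the zero-crossing rate and the short-term energy, can be computed per frame as in the sketch below. The framing parameters, the function names, and the normalization choices (crossings per sample pair, mean squared amplitude) are assumptions of this example; PARCOR and PLP analysis are not shown.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("frame needs at least two samples")
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)
```

Concatenating these values over consecutive frames yields a fixed-length input vector of the kind a static neural network classifier expects.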

Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao;Song, Cheng
    • Journal of Information Processing Systems / Vol. 12, No. 3 / pp. 410-421 / 2016
  • In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which is used to direct the decoding process. To validate the method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. The results of our experiment show that about 30% of decoding time can be saved without an obvious decrease in recognition accuracy, demonstrating the potential of the method.
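
One way to picture how boundary landmarks could shrink the search in a segment-based decoder is sketched below. It is a simplified assumption, not the paper's algorithm: here landmarks act as a hard restriction on where segments may start and end, the phonetic class information is ignored, and the function names are hypothetical.

```python
def all_segments(num_frames, max_len):
    """Every (start, end) frame span of length 1..max_len, with no landmark guidance."""
    return [(start, end)
            for start in range(num_frames)
            for end in range(start + 1, min(start + max_len, num_frames) + 1)]


def landmark_segments(num_frames, max_len, landmarks):
    """Only spans whose start and end both fall on a landmark or an utterance edge."""
    boundaries = sorted(set(landmarks) | {0, num_frames})
    return [(start, end)
            for i, start in enumerate(boundaries)
            for end in boundaries[i + 1:]
            if end - start <= max_len]
```

For a 10-frame utterance with a maximum segment length of 4 and landmarks at frames 3, 5, and 8, the unguided search scores 34 candidate segments while the landmark-restricted search scores 4, which shows where a saving in decoding time can come from.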

A Novel Integration Scheme for Audio Visual Speech Recognition

  • Pham, Than Trung;Kim, Jin-Young;Na, Seung-You
    • The Journal of the Acoustical Society of Korea (한국음향학회지) / Vol. 28, No. 8 / pp. 832-842 / 2009
  • Automatic speech recognition (ASR) has been successfully applied to many real human computer interaction (HCI) applications; however, its performance tends to decrease significantly in noisy environments. Audio visual speech recognition (AVSR), which uses an acoustic signal and lip motion, has recently attracted more attention because of its robustness to noise. In this paper, we describe our novel integration scheme for AVSR based on a late integration approach. First, we introduce a robust reliability measurement for the audio and visual modalities using model-based information and signal-based information. The model-based sources measure the confusability of the vocabulary, while the signal is used to estimate the noise level. Second, the output probabilities of the audio and visual speech recognizers are each normalized before the final integration step, which uses the normalized output space and the estimated weights. We evaluate the performance of our proposed method on a Korean isolated word recognition system. The experimental results demonstrate the effectiveness and feasibility of our proposed system compared with conventional systems.
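
The normalize-then-combine step of a late integration scheme can be sketched as follows. This is a generic weighted log-linear fusion under stated assumptions: the softmax-style normalization, the function names, and the single `audio_weight` parameter are choices made for this example, and the reliability-based weight estimation described in the abstract is not reproduced.

```python
import math


def log_normalize(log_scores):
    """Convert per-word log scores into log posteriors that sum to one."""
    peak = max(log_scores.values())
    log_total = peak + math.log(sum(math.exp(s - peak) for s in log_scores.values()))
    return {word: s - log_total for word, s in log_scores.items()}


def late_integration(audio_scores, visual_scores, audio_weight):
    """Fuse two recognizers' outputs with weights audio_weight and 1 - audio_weight."""
    if audio_scores.keys() != visual_scores.keys():
        raise ValueError("both recognizers must score the same vocabulary")
    audio = log_normalize(audio_scores)
    visual = log_normalize(visual_scores)
    combined = {word: audio_weight * audio[word] + (1.0 - audio_weight) * visual[word]
                for word in audio}
    best = max(combined, key=combined.get)
    return best, combined
```

A high `audio_weight` suits clean audio, and lowering it as the estimated noise level rises shifts the decision toward the visual recognizer.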

A Study on Image Recommendation System based on Speech Emotion Information

  • Kim, Tae Yeun;Bae, Sang Hyun
    • Journal of Integrative Natural Science (통합자연과학논문집) / Vol. 11, No. 3 / pp. 131-138 / 2018
  • In this paper, we implement an image matching and recommendation system that uses the emotion information in the user's speech. To classify the emotional information of the user's speech, this information is extracted and classified using the PLP algorithm. After classification, an emotional DB of speech is constructed. In addition, emotional colors and an emotional vocabulary obtained through factor analysis are matched to one space in order to classify the emotional information of images. A standardized image recommendation system is then built on matching each keyword, using the BM-GA algorithm, between the emotional information of speech and the emotional information of images, so that images fit the emotional information of the user's speech more closely. In the performance evaluation, the recognition rate of the standardized vocabulary in four stages according to speech was 80.48% on average, and system user satisfaction was 82.4%. It is therefore expected that classifying images according to the user's speech information will be helpful for the study of emotional exchange between the user and the computer.