• Title/Abstract/Keyword: Word error rate

Search results: 125 (processing time: 0.018 seconds)

A New Endpoint Detection Method Based on Chaotic System Features for Digital Isolated Word Recognition System

  • 장한;정길도
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2009년도 정보 및 제어 심포지움 논문집
    • /
    • pp.37-39
    • /
    • 2009
  • In speech recognition research, locating the beginning and end of an utterance against background noise is of great importance. Background noise present in the recording introduces disturbances when we only want the stationary parameters that represent the corresponding speech segment; in particular, a major source of error in automatic isolated-word recognition is inaccurate detection of the beginning and ending boundaries of test and reference templates. We therefore need an effective method to remove the unnecessary regions of a speech signal. Conventional methods for speech endpoint detection are based on two simple time-domain measurements, short-time energy and short-time zero-crossing rate, which cannot guarantee precise results in low signal-to-noise-ratio environments. This paper proposes a novel approach that computes the Lyapunov exponent of the time-domain waveform. The proposed method does not require frequency-domain parameters for endpoint detection, e.g. Mel-scale features, which have been introduced in other work. Compared with the conventional methods based on short-time energy and short-time zero-crossing rate, the approach based on time-domain Lyapunov exponents (LEs) has low complexity and is suitable for a digital isolated-word recognition system.

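For reference, the conventional baseline that the paper argues against can be sketched in a few lines: a short-time energy and zero-crossing-rate endpoint detector whose thresholds are estimated from leading frames assumed to be noise-only. This is a generic illustration with made-up thresholds, not the authors' code, and it shows exactly the component that breaks down at low SNR.

```python
import numpy as np

def endpoint_detect(signal, sr, frame_ms=25, hop_ms=10,
                    energy_factor=4.0, zcr_factor=1.5):
    """Classic short-time energy / zero-crossing-rate endpoint detector.

    Thresholds are derived from the leading frames, which are assumed to
    contain only background noise. Returns (start_sample, end_sample) of
    the detected speech region, or None if nothing exceeds the thresholds.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, len(signal) - frame) // hop

    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        x = np.asarray(signal[i * hop: i * hop + frame], dtype=float)
        energy[i] = np.sum(x ** 2)                          # short-time energy
        zcr[i] = np.mean(np.abs(np.diff(np.sign(x))) > 0)   # zero-crossing rate

    noise_frames = max(3, n_frames // 20)                   # leading "noise-only" frames
    e_thresh = energy[:noise_frames].mean() * energy_factor
    z_thresh = zcr[:noise_frames].mean() * zcr_factor

    speech = (energy > e_thresh) | ((zcr > z_thresh) & (energy > e_thresh / 2))
    idx = np.flatnonzero(speech)
    if idx.size == 0:
        return None
    return idx[0] * hop, idx[-1] * hop + frame
```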

CATV 하향 스트림 적용 시스템에서 동기 검출 방안 및 FPGA 설계 (FPGA Design and Sync-Word Detection of CATV Down-Link Stream Transmission System)

  • 정지원
    • 한국정보전자통신기술학회논문지
    • /
    • Vol. 4, No. 4
    • /
    • pp.286-294
    • /
    • 2011
  • This paper analyzes, through analysis and simulation, the performance of the transmission scheme specified in ITU-T Recommendation J.38 Annex B, identifies the problems that arise in FPGA implementation, and presents solutions. The implementation issues fall into two broad categories. First, the various coding and modulation schemes and the mixture of symbol-level and bit-level processing require a large number of clock cycles; in this paper the required clock count is reduced by using read/write memory. Second, if the punctured TCM decoder is not exactly synchronized to the puncturing pattern, the UW (unique sync-word) frame-synchronization symbol cannot be acquired. This paper therefore presents an algorithm that aligns the puncturing pattern with the UW symbol. Based on this analysis and the resolution of the implementation issues, the down-link stream channel-coding system of ITU-T J.38 Annex B was implemented on an FPGA chip using VHDL.
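
A minimal sketch of the unique-word (UW) correlation search that underlies such frame synchronization is shown below. The 16-bit pattern and error tolerance are placeholders, not the values from the standard or the paper; in a punctured-TCM receiver the search would additionally be run for each candidate puncturing-pattern phase, keeping only the phase that yields UW hits at the frame period.

```python
import numpy as np

# Hypothetical 16-bit unique word (UW); the real sync pattern is defined
# by the standard, not by this sketch.
UW = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0], dtype=np.int8)

def find_uw(bits, uw=UW, max_errors=1):
    """Slide the UW over a received bit stream and return candidate sync
    positions where the Hamming distance is within `max_errors`."""
    bits = np.asarray(bits, dtype=np.int8)
    n, m = len(bits), len(uw)
    hits = []
    for i in range(n - m + 1):
        if np.count_nonzero(bits[i:i + m] != uw) <= max_errors:
            hits.append(i)
    return hits

# Toy usage: embed the UW in a random bit stream and locate it.
rng = np.random.default_rng(1)
stream = rng.integers(0, 2, 200, dtype=np.int8)
stream[57:57 + len(UW)] = UW
print(find_uw(stream))   # should include position 57
```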

수학 문장제 해결과 관련한 ChatGPT의 교수학적 활용 방안 모색 (A study on the didactical application of ChatGPT for mathematical word problem solving)

  • 강윤지
    • 한국수학교육학회지시리즈E:수학교육논문집
    • /
    • Vol. 38, No. 1
    • /
    • pp.49-67
    • /
    • 2024
  • With growing interest in the diverse uses of artificial intelligence language models, the need to explore their didactical applications in mathematics education has been emphasized. Because AI language models can process natural language, they are expected to be useful for mathematical word problem solving. To examine the performance of ChatGPT, one such language model, it was instructed to solve the word problems presented in elementary school textbooks, and its solution processes and errors were analyzed. The analysis showed a correct-answer rate of 81.08%, with errors of problem comprehension, equation formulation, and computation. Based on this analysis of solution processes and error types, implications for the didactical use of AI language models are suggested.
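
The study's basic procedure, prompting a chat model with a textbook word problem and recording its worked solution, might be sketched as follows, assuming the openai Python client (v1 API). The model name, prompt wording, and the crude substring grading are illustrative assumptions, not the authors' protocol.

```python
from openai import OpenAI  # assumes the openai Python package (v1.x API)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def solve_word_problem(problem: str, model: str = "gpt-4o-mini") -> str:
    """Ask the chat model to solve one word problem, showing its work."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Solve the word problem step by step, then give the final answer."},
            {"role": "user", "content": problem},
        ],
    )
    return response.choices[0].message.content

# Hypothetical grading loop: each item pairs a problem with its answer key.
problems = [("A class has 3 boxes of 12 pencils. How many pencils in total?", "36")]
correct = sum(key in solve_word_problem(text) for text, key in problems)
print(f"accuracy: {correct / len(problems):.2%}")
```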

길쌈부호의 부등오류제어 특성 (Unequal Error Control Properties of Convolutional Codes)

  • 이수인;이상곤;문상재
    • 대한전자공학회논문지
    • /
    • Vol. 27, No. 2
    • /
    • pp.1-8
    • /
    • 1990
  • The unequal bit-error-control properties of binary convolutional codes with code rate r = b/n are analyzed. Within the b-bit information vector produced by Viterbi decoding, the error probability of each bit position can differ, so the error-control performance of a given bit can differ from that of the other bits. This unequal bit-error property can be exploited to improve the signal-to-noise ratio when transmitting sample data in a DPCM system.

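The unequal per-bit error behaviour described above can be observed by grouping the decoded information stream into b-bit vectors and tallying errors per bit position. The sketch below assumes the transmitted and Viterbi-decoded bit streams are already available; the encoder and decoder themselves are not shown.

```python
import numpy as np

def per_position_ber(tx_bits, rx_bits, b):
    """Bit error rate per position within each b-bit information vector.

    tx_bits / rx_bits: equal-length 0/1 arrays of transmitted and decoded
    information bits (e.g., the output of a Viterbi decoder for a rate-b/n
    convolutional code). Returns an array of length b.
    """
    tx = np.asarray(tx_bits).reshape(-1, b)
    rx = np.asarray(rx_bits).reshape(-1, b)
    return (tx != rx).mean(axis=0)

# Example with b = 2: position 0 happens to see more errors than position 1.
tx = np.array([0, 1, 1, 0, 1, 1, 0, 0])
rx = np.array([1, 1, 1, 0, 0, 1, 0, 0])
print(per_position_ber(tx, rx, b=2))   # -> [0.5, 0.0]
```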

Speech Recognition in Car Noise Environments Using Multiple Models Based on a Hybrid Method of Spectral Subtraction and Residual Noise Masking

  • Song, Myung-Gyu;Jung, Hoi-In;Shim, Kab-Jong;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 18, No. 3E
    • /
    • pp.3-8
    • /
    • 1999
  • In speech recognition for real-world applications, the performance degradation caused by the mismatch between training and testing environments must be overcome. In this paper, to reduce this mismatch, we propose a hybrid method of spectral subtraction and residual noise masking. We also employ a multiple-model approach to obtain improved robustness across various noise environments. In this approach, multiple model sets are built for several noise masking levels, and the model set appropriate for the estimated noise level is selected automatically in the recognition phase. In speaker-independent isolated word recognition experiments in car noise environments, the proposed method using model sets with only two masking levels reduced the average word error rate by 60% compared with the spectral subtraction method.

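A minimal sketch of magnitude spectral subtraction with a residual-noise masking floor, in the spirit of the hybrid method summarized above; the oversubtraction factor, masking level, and toy spectra are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spectral_subtract(noisy, noise_est, alpha=2.0, mask_level=0.1):
    """Frame-wise magnitude spectral subtraction with residual-noise masking.

    noisy:      2-D array of noisy-speech magnitude spectra (frames x bins)
    noise_est:  1-D array with the estimated noise magnitude spectrum
    alpha:      oversubtraction factor
    mask_level: residual noise is floored (masked) at this fraction of the
                noise estimate instead of being driven to zero
    """
    clean = noisy - alpha * noise_est          # subtract the noise estimate
    floor = mask_level * noise_est             # masking floor per frequency bin
    return np.maximum(clean, floor)            # mask the residual noise

# Toy usage: noise estimated from the first frames (assumed speech-free).
rng = np.random.default_rng(0)
spectra = np.abs(rng.normal(size=(100, 257)))  # stand-in for |STFT| frames
noise = spectra[:10].mean(axis=0)
enhanced = spectral_subtract(spectra, noise)
```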

N-gram 기반의 유사도를 이용한 대화체 연속 음성 언어 모델링 (Spontaneous Speech Language Modeling using N-gram based Similarity)

  • 박영희;정민화
    • 대한음성학회지:말소리
    • /
    • No. 46
    • /
    • pp.117-126
    • /
    • 2003
  • This paper presents our language model adaptation for Korean spontaneous speech recognition. Korean spontaneous speech exhibits various characteristics of content and style, such as filled pauses, word omission, and contraction, compared with written text corpora. Our approach focuses on improving the estimation of domain-dependent n-gram models by relevance-weighting out-of-domain text data, where style is represented by n-gram-based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictors of the neighboring words. The best result reduces the word error rate by 9.7% relative and shows that n-gram-based relevance weighting reflects the style difference well and that disfluencies are also good predictors.

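The n-gram tf*idf relevance weighting can be illustrated with a small sketch that scores out-of-domain sentences by their cosine similarity to an in-domain (spontaneous-style) corpus; scikit-learn's TfidfVectorizer is used for brevity, and the toy corpora are made up, so this is only a schematic of the weighting idea, not the paper's implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def relevance_weights(in_domain_docs, out_domain_docs, ngram=(1, 2)):
    """Score each out-of-domain document by its n-gram tf-idf cosine
    similarity to the in-domain corpus; the scores can then weight the
    document's n-gram counts when estimating the adapted language model."""
    vec = TfidfVectorizer(ngram_range=ngram)
    matrix = vec.fit_transform(list(in_domain_docs) + list(out_domain_docs))
    centroid = np.asarray(matrix[:len(in_domain_docs)].mean(axis=0))
    return cosine_similarity(matrix[len(in_domain_docs):], centroid).ravel()

weights = relevance_weights(
    ["uh I mean we could maybe meet tomorrow", "yeah so that was kind of it"],
    ["the committee approved the annual budget report",
     "um so like what do you think about it"],
)
print(weights)  # the conversational sentence should receive the larger weight
```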

Three-Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content

  • Zgank, Andrej
    • ETRI Journal
    • /
    • Vol. 32, No. 5
    • /
    • pp.810-818
    • /
    • 2010
  • This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays a very important role for under-resourced languages because producing manually transcribed speech databases is still a very expensive and time-consuming task. We propose two new methods as part of the training framework. The first method focuses on combining initial acoustic models using a data-driven metric. The second method proposes an improved acoustic training procedure based on unsupervised transcriptions, in which word endings are modified using broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. Three types of acoustic models are included in the evaluation: baseline, reference content, and framework content models. The best overall result, an 18.02% word error rate, was achieved with the third type and represents a statistically significant improvement over the baseline and reference acoustic models.
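
The generic self-training loop behind such a framework (decode untranscribed audio with seed models, keep confident hypotheses, retrain) can be sketched as below. The `decode` and `retrain` callables are placeholders for a real ASR toolkit, and the confidence threshold is an assumption; the paper's three-stage framework adds data-driven model combination and broad-phonetic-class word endings on top of this basic idea.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Utterance:
    audio_id: str
    hypothesis: str = ""
    confidence: float = 0.0

def self_training_pass(
    untranscribed: List[Utterance],
    decode: Callable[[Utterance], Tuple[str, float]],
    retrain: Callable[[List[Utterance]], object],
    conf_threshold: float = 0.8,
):
    """One pass of unsupervised acoustic-model training:
    1. decode untranscribed audio with the current (seed) models,
    2. keep only hypotheses whose confidence clears a threshold,
    3. retrain on the automatically transcribed subset."""
    for utt in untranscribed:
        utt.hypothesis, utt.confidence = decode(utt)
    selected = [u for u in untranscribed if u.confidence >= conf_threshold]
    return retrain(selected), selected

# Toy stand-ins so the sketch runs end to end.
fake_decode = lambda u: (f"hyp for {u.audio_id}", 0.9 if u.audio_id != "u2" else 0.4)
fake_retrain = lambda utts: f"model trained on {len(utts)} utterances"
model, kept = self_training_pass([Utterance("u1"), Utterance("u2")],
                                 fake_decode, fake_retrain)
print(model, [u.audio_id for u in kept])
```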

정보검색 기법과 동적 보간 계수를 이용한 N-gram 언어모델의 적응 (N-gram Adaptation Using Information Retrieval and Dynamic Interpolation Coefficient)

  • 최준기;오영환
    • 대한음성학회지:말소리
    • /
    • No. 56
    • /
    • pp.207-223
    • /
    • 2005
  • The goal of language model adaptation is to improve the background language model with a relatively small adaptation corpus. This study presents a language model adaptation technique for the case where no additional text data for adaptation are available. We propose an information retrieval (IR) technique with n-gram language modeling to collect the adaptation corpus from the baseline text data. We also propose a dynamic language model interpolation coefficient to combine the background language model and the adapted language model. The interpolation coefficient is estimated from the word hypotheses obtained by segmenting the input speech data reserved as held-out validation data, which allows the final adapted model to improve on the background model consistently. The proposed approach reduces the word error rate by 13.6% relative to the baseline 4-gram model on two hours of broadcast news speech recognition.

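A minimal sketch of choosing a dynamic interpolation coefficient on held-out word hypotheses is shown below; toy unigram tables and a grid search stand in for the paper's full n-gram models and estimation procedure.

```python
import math

def interpolate(p_bg, p_ad, lam):
    """Linear interpolation of two word-probability tables."""
    vocab = set(p_bg) | set(p_ad)
    return {w: lam * p_ad.get(w, 1e-9) + (1 - lam) * p_bg.get(w, 1e-9) for w in vocab}

def perplexity(model, words):
    logp = sum(math.log(model.get(w, 1e-9)) for w in words)
    return math.exp(-logp / len(words))

def best_lambda(p_bg, p_ad, heldout, grid=None):
    """Pick the interpolation coefficient that minimizes perplexity on
    held-out word hypotheses (a simple grid search stands in for EM)."""
    grid = grid or [i / 10 for i in range(11)]
    return min(grid, key=lambda lam: perplexity(interpolate(p_bg, p_ad, lam), heldout))

# Toy unigram tables and held-out hypotheses (illustrative values only).
background = {"news": 0.2, "weather": 0.1, "sports": 0.1, "the": 0.6}
adapted    = {"election": 0.3, "candidate": 0.3, "the": 0.4}
heldout_hyps = ["the", "election", "candidate", "news", "the"]
print("dynamic interpolation coefficient:", best_lambda(background, adapted, heldout_hyps))
```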

벅아이 코퍼스 오류 수정과 코퍼스 활용을 위한 프랏 스크립트 툴 (Error Correction and Praat Script Tools for the Buckeye Corpus of Conversational Speech)

  • 윤규철
    • 말소리와 음성과학
    • /
    • Vol. 4, No. 1
    • /
    • pp.29-47
    • /
    • 2012
  • The purpose of this paper is to show how to convert the label files of the Buckeye Corpus of Spontaneous Speech [1] into Praat format and to introduce some of the Praat scripts that enable linguists to study various aspects of spoken American English present in the corpus. During the conversion process, several types of errors were identified and corrected either manually or automatically with scripts. The Praat script tools that have been developed can help extract large amounts of phonetic measurements from the corpus, such as the VOT of plosives, vowel formants, word-frequency information, and speech rates spanning several consecutive words. The script tools can also extract information about the phonetic environment of the target words or allophones.
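
The conversion step, turning time-aligned labels into Praat's TextGrid format, can be sketched as below for a single interval tier. The example intervals are made up and the writer handles only the simplest case; the Buckeye label parsing and the analysis scripts from the paper are not reproduced here.

```python
def write_textgrid(path, intervals, tier_name="words"):
    """Write (start, end, label) tuples as a single-interval-tier Praat
    TextGrid in the long ("ooTextFile") format readable by Praat."""
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        f'xmin = {xmin}',
        f'xmax = {xmax}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f'        xmin = {xmin}',
        f'        xmax = {xmax}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{label}"',
        ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

# Made-up word intervals (seconds); real input would come from the corpus label files.
write_textgrid("example.TextGrid",
               [(0.00, 0.42, "the"), (0.42, 0.77, "corpus"), (0.77, 1.10, "speaks")])
```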

대화체 연속음성 인식을 위한 언어모델 적응 (Language Model Adaptation for Conversational Speech Recognition)

  • 박영희;정민화
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 5월 학술대회지
    • /
    • pp.83-86
    • /
    • 2003
  • This paper presents our style-based language model adaptation for Korean conversational speech recognition. Korean conversational speech exhibits various characteristics of content and style, such as filled pauses, word omission, and contraction, compared with written text corpora. For style-based language model adaptation, we report two approaches. Both focus on improving the estimation of domain-dependent n-gram models by relevance-weighting out-of-domain text data, where style is represented by n-gram-based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictors of the neighboring words. The best result reduces the word error rate by 6.5% absolute and shows that n-gram-based relevance weighting reflects the style difference well and that disfluencies are good predictors.

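One concrete reading of "disfluencies as predictors of the neighboring words" is to keep filled pauses as ordinary tokens when counting n-grams instead of deleting them. The tiny bigram counter below illustrates that modelling choice; the tokenization and the filled-pause inventory are assumptions, not the paper's setup.

```python
from collections import Counter

FILLED_PAUSES = {"uh", "um", "eo", "eum"}  # assumed inventory, not the paper's

def bigram_counts(sentences, keep_disfluencies=True):
    """Count bigrams, optionally keeping filled pauses as context tokens
    so that they can help predict the following word."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        if not keep_disfluencies:
            tokens = [t for t in tokens if t not in FILLED_PAUSES]
        counts.update(zip(["<s>"] + tokens, tokens + ["</s>"]))
    return counts

data = ["um I think so", "uh I think maybe", "I think so"]
with_fp = bigram_counts(data, keep_disfluencies=True)
without_fp = bigram_counts(data, keep_disfluencies=False)
print(with_fp[("um", "i")], with_fp[("uh", "i")])   # pause-conditioned contexts kept
print(without_fp[("<s>", "i")])                     # pauses removed: contexts collapse
```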