• Title/Summary/Keyword: Subword unit


Stochastic Pronunciation Lexicon Modeling for Large Vocabulary Continuous Speech Recognition (확률 발음사전을 이용한 대어휘 연속음성인식)

  • Yun, Seong-Jin;Choi, Hwan-Jin;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.49-57
    • /
    • 1997
  • In this paper, we propose a stochastic pronunciation lexicon model for large-vocabulary continuous speech recognition systems. The stochastic lexicon can be regarded as an HMM: a stochastic finite-state automaton consisting of a Markov chain of subword states, in which each subword state in the baseform has a probability distribution over subword units. With this method, an acoustic representation of a word can be derived automatically from sample sentence utterances and subword unit models. In addition, the stochastic lexicon is further optimized for the subword models and the recognizer. In experiments on 3000-word continuous speech recognition, the proposed method reduces the word error rate by 23.6% and the sentence error rate by 10% compared to methods based on standard phonetic representations of words.
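The idea of a baseform whose states each carry a distribution over subword units can be illustrated with a minimal sketch. It assumes a strict left-to-right chain with exactly one unit per state (no skips or insertions); the lexicon entry, unit names, and probabilities are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical stochastic lexicon entry: one dict per subword state,
# mapping subword units to emission probabilities (illustrative values).
LEXICON = {
    "data": [
        {"d": 1.0},
        {"ey": 0.6, "ae": 0.4},
        {"t": 0.7, "dx": 0.3},
        {"ah": 1.0},
    ],
}


def pronunciation_log_prob(word, units):
    """Log-probability of a unit sequence under a left-to-right chain.

    Assumes exactly one unit per subword state (no skips or insertions).
    Returns -inf when the sequence cannot be produced by the entry.
    """
    states = LEXICON[word]
    if len(units) != len(states):
        return float("-inf")
    total = 0.0
    for state, unit in zip(states, units):
        p = state.get(unit, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total
```

A recognizer could add such a score to the acoustic score of each pronunciation variant; a full model would also need transition probabilities between states.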


An Utterance Verification using Vowel String (모음 열을 이용한 발화 검증)

  • 유일수;노용완;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2003.06a
    • /
    • pp.46-49
    • /
    • 2003
  • The use of confidence measures for word/utterance verification has become an essential component of any speech input application. Confidence measures apply to a number of problems, such as rejection of incorrect hypotheses, speaker adaptation, and adaptive modification of the hypothesis score during search in continuous speech recognition. In this paper, we present a new utterance verification method using vowel strings. Using subword HMMs of VCCV units, we create anti-models that include the vowel strings of the hypothesis words. The experimental results show that the utterance verification rate of the proposed method is about 79.5%.
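A common way to turn a model/anti-model pair into an accept-or-reject decision is a per-frame log-likelihood ratio test. The sketch below shows that generic test together with a vowel-string extraction step; the vowel inventory, function names, and threshold are assumptions for illustration, and the abstract does not confirm that the paper uses this exact scoring.

```python
VOWELS = {"a", "e", "i", "o", "u"}  # hypothetical vowel inventory


def vowel_string(phones):
    """Keep only the vowels of a hypothesized phone sequence, in order."""
    return [p for p in phones if p in VOWELS]


def verify(target_loglik, anti_loglik, num_frames, threshold=0.0):
    """Accept the hypothesis when the per-frame log-likelihood ratio
    between the target model and the anti-model exceeds the threshold."""
    llr = (target_loglik - anti_loglik) / num_frames
    return llr > threshold
```

In practice the threshold would be tuned on held-out data to balance false acceptances against false rejections.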


A Phonetics Based Design of PLU Sets for Korean Speech Recognition (한국어 음성인식을 위한 음성학 기반의 유사음소단위 집합 설계)

  • Hong, Hye-Jin;Kim, Sun-Hee;Chung, Min-Hwa
    • MALSORI
    • /
    • no.65
    • /
    • pp.105-124
    • /
    • 2008
  • This paper examines the effects of different phone-like-unit (PLU) sets in order to propose an optimal PLU set for improving the performance of Korean automatic speech recognition (ASR) systems. An examination of 9 currently used PLU sets indicates that most of them include a selection of allophones without a sufficient phonetic basis. In this paper, a total of 34 PLU sets are designed based on Korean phonetic characteristics, and the effects of each PLU set are evaluated through experiments. The results show that the accuracy rate of each phone is influenced by the different phonetic constraint(s) that determine the PLU sets, and that an optimal PLU set can be anticipated through phonetic analysis of the given speech data.


Korean Head-Tail Tokenization and Part-of-Speech Tagging by using Deep Learning (딥러닝을 이용한 한국어 Head-Tail 토큰화 기법과 품사 태깅)

  • Kim, Jungmin;Kang, Seungshik;Kim, Hyeokman
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.4
    • /
    • pp.199-208
    • /
    • 2022
  • Korean is an agglutinative language in which one or more morphemes combine to form a single word. Part-of-speech tagging separates each morpheme from a word and attaches a part-of-speech tag to it. In this study, we propose a new Korean part-of-speech tagging method based on Head-Tail tokenization, which divides a word into a lexical morpheme part and a grammatical morpheme part without decomposing compound words. The Head-Tail split is made at a syllable boundary, without restoring irregular conjugations or contracted syllables. We implemented a Korean part-of-speech tagger using Head-Tail tokenization and deep learning. Because fine-grained tags produce a large number of complex tags and lower the tagging accuracy, we reduced the tag set to complex tags composed of coarse-grained tags, which improved tagging accuracy. We evaluated the Head-Tail part-of-speech tagger with BERT, syllable bigram, and subword bigram embeddings; both syllable bigram and subword bigram embeddings improved performance over general BERT. Integrating the Head-Tail tokenization model with the simplified part-of-speech tagging model achieved 98.99% word-level accuracy and 99.08% token-level accuracy. The experiments also showed that tagging performance improved when the maximum token length was limited to twice the number of words.
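To make the Head-Tail split concrete, here is a toy sketch that cuts a word at a syllable boundary by matching the longest known grammatical suffix. This rule-based matcher is only a stand-in for the learned tokenizer the abstract describes, and the suffix list `TAILS` is a small hypothetical sample.

```python
# Hypothetical sample of grammatical (tail) endings; a real system would
# learn the split point rather than look it up in a list.
TAILS = ["에서", "에게", "은", "는", "이", "가", "을", "를", "에"]


def head_tail(word):
    """Split a word into (head, tail) at a syllable boundary.

    Each Hangul character is one syllable, so slicing the string by
    character index always cuts at a syllable boundary. The longest
    matching tail wins; a word with no matching tail is all head.
    """
    for tail in sorted(TAILS, key=len, reverse=True):
        if word.endswith(tail) and len(word) > len(tail):
            return word[: -len(tail)], tail
    return word, ""
```

For example, `head_tail("학교에서")` yields the head `학교` and the tail `에서`. A suffix list like this will over-split nouns that happen to end in a listed syllable, which is one reason a learned model is preferable.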

A study on the Development of General-Purpose Multimedia Processor Architecture (범용 멀티미디어 프로세서 구조 개발에 관한 연구)

  • 오명훈;박성모
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1149-1152
    • /
    • 1998
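  • Processing multimedia data digitally rather than in analog form offers benefits in many respects. Among digital approaches, processing multimedia data with multimedia instructions on a general-purpose processor increases flexibility and allows efficient programming. In this paper, we propose an instruction set architecture that can process multimedia data efficiently within a general-purpose processor, together with a processor architecture that can execute it, and we describe the design at the behavioral level in an HDL (Hardware Description Language) and simulate it. The proposed multimedia instructions comprise 55 instructions in 8 groups according to their characteristics, and they process in parallel the subword data within a 64-bit word: eight 8-bit bytes, four 16-bit halfwords, or two 32-bit words. The modeled processor is based on the Integer Unit of SPARC V.9, an open architecture, and is a five-stage pipelined RISC with a Harvard architecture.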


A Study on the Rejection Algorithm Using Generic Word Model Based on Diphone Subword Unit (다이폰 기반의 Generic Word Model을 이용한 거절 알고리즘)

  • Chung, Ik-Joo;Chung, Hoon
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.15-25
    • /
    • 2003
  • In this paper, we propose a two-stage algorithm for OOV (out-of-vocabulary) rejection. In the first stage, the algorithm rejects OOVs using a generic word model; in the second stage, to further reduce false acceptance, it rejects words that have low similarity to the candidate by measuring the distance between HMMs. For the experiment, we chose 20 in-vocabulary words from the PBW445 DB distributed by ETRI. With the first stage alone, the false acceptance rate is 3% with 100% correct acceptance; with both stages, the false acceptance rate is reduced to 1% with 100% correct acceptance.
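The two-stage decision can be sketched as follows. The score and distance inputs, the parameter names, and the default thresholds are hypothetical; the abstract does not specify how the generic word model score or the HMM distance is computed.

```python
def two_stage_reject(best_word_score, generic_score, model_distance,
                     margin=0.0, max_distance=1.0):
    """Return True if the input should be rejected as out-of-vocabulary.

    best_word_score: log-likelihood of the best in-vocabulary word.
    generic_score:   log-likelihood under the generic word model.
    model_distance:  hypothetical distance between the candidate's HMM
                     and the model decoded for the input.
    """
    # Stage 1: reject when the generic word model explains the input
    # at least as well as the best in-vocabulary word (within a margin).
    if best_word_score - generic_score <= margin:
        return True
    # Stage 2: among the survivors, reject inputs whose model is too
    # far from the candidate word's model.
    return model_distance > max_distance
```

Running the cheap first-stage test before the distance check means the second stage only has to handle inputs that already look like in-vocabulary words.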


Subword-based Lip Reading Using State-tied HMM (상태공유 HMM을 이용한 서브워드 단위 기반 립리딩)

  • Kim, Jin-Young;Shin, Do-Sung
    • Speech Sciences
    • /
    • v.8 no.3
    • /
    • pp.123-132
    • /
    • 2001
  • In recent years, research on HCI technology has been very active, and speech recognition is one of its typical methods. Recognition performance, however, deteriorates as surrounding noise increases. To address this problem, multimodal HCI is being actively studied. This paper describes automated lipreading for bimodal speech recognition based on image and speech information. It employs an audio-visual DB containing 1,074 words from 70 voices, the tri-viseme as the recognition unit, and a state-tied HMM as the recognition model. Recognizers for 22 to 1,000 words are evaluated, and the 22-word recognizer achieves a word recognition rate of 60.5%.


A Study on the Instruction Set Architecture of Multimedia Extension Processor (멀티미디어 확장 프로세서의 명령어 집합 구조에 관한 연구)

  • O, Myeong-Hun;Lee, Dong-Ik;Park, Seong-Mo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.6
    • /
    • pp.420-435
    • /
    • 2001
  • As multimedia technology has grown rapidly, much research has been conducted on processing multimedia data efficiently using general-purpose processors. In this paper, we propose multimedia instructions that can process multimedia data effectively, along with a processor architecture for those instructions. The processor was described in Verilog HDL at the behavioral level and simulated with a CADENCE™ tool. The proposed multimedia instructions comprise 48 instructions classified into 7 groups. Multimedia data have a 64-bit format and are processed as parallel subwords: eight 8-bit bytes, four 16-bit halfwords, or two 32-bit words. The modeled processor is based on the Integer Unit of SPARC V.9 and has a five-stage pipelined RISC architecture following the Harvard principle.
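Subword parallelism means one 64-bit operation acts on several independent lanes at once. The sketch below emulates a packed add in software for 8-, 16-, or 32-bit lanes. It assumes wraparound (modular) arithmetic in each lane; the abstract does not say whether the proposed instructions wrap or saturate, and the function name is hypothetical.

```python
def packed_add(a, b, lane_bits):
    """Add two 64-bit values lane by lane, wrapping within each lane.

    lane_bits is 8, 16, or 32, giving 8, 4, or 2 subword lanes.
    """
    lanes = 64 // lane_bits
    # Mask with only the top bit of every lane set.
    high = sum(1 << (i * lane_bits + lane_bits - 1) for i in range(lanes))
    low = ((1 << 64) - 1) ^ high
    # Add the low bits of every lane; any carry stops at the lane's top
    # bit. Then fold in the top bits with XOR so that nothing crosses a
    # lane boundary.
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)
```

With 8-bit lanes, `packed_add(0xFF, 0x01, 8)` wraps the lowest byte to zero without disturbing its neighbor, whereas the same inputs with 16-bit lanes give `0x100`. Hardware gets the same effect by breaking the adder's carry chain at lane boundaries.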
