• Title/Summary/Keyword: continuous speech

저작권 보호를 위한 HMM기반의 음악 식별 시스템 (HMM-based Music Identification System for Copyright Protection)

  • 김희동;김도현;김지환
    • 말소리와 음성과학 / Vol. 1, No. 1 / pp. 63-67 / 2009
  • In this paper, to protect music copyrights, we propose a music identification system that is scalable to the number of registered pieces of music and robust to signal-level variations of the registered music. To implement it, we define the new concepts of 'music word' and 'music phoneme' as recognition units for constructing 'music acoustic models'. With these concepts, we apply the HMM-based framework used in continuous speech recognition to identify the music. Each music file is transformed into a sequence of 39-dimensional vectors. This sequence of vectors is represented as ordered states with Gaussian mixtures, and these ordered states are trained using the Baum-Welch re-estimation method. Music files of suspicious copyright status are also transformed into sequences of vectors. The most probable music file is then identified using the Viterbi algorithm through the music identification network. We implemented a music identification system for 1,000 MP3 music files and tested it under variations in MP3 bit rate and music speed rate. The proposed system demonstrates robust performance under these signal variations. In addition, the scalability of the system is independent of the number of registered music files, since the system is based on the HMM method.

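The Viterbi decoding step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a small discrete-observation HMM with made-up probabilities; the paper itself uses Gaussian mixtures over 39-dimensional vectors, which this sketch does not model. The function name `viterbi` and all parameter names are hypothetical.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for an observation sequence.

    obs     : list of observation symbol indices
    start_p : start_p[s] = P(state s at t=0)
    trans_p : trans_p[r][s] = P(next state s | current state r)
    emit_p  : emit_p[s][o] = P(observation o | state s)
    """
    n_states = len(start_p)

    def log(p):
        # Work in the log domain; zero probability maps to -inf.
        return math.log(p) if p > 0 else float("-inf")

    # delta[s] = best log-probability of any path ending in state s
    delta = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    backptr = []

    for o in obs[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for s at this time step
            best_prev = max(range(n_states),
                            key=lambda r: delta[r] + log(trans_p[r][s]))
            ptr.append(best_prev)
            new_delta.append(delta[best_prev]
                             + log(trans_p[best_prev][s])
                             + log(emit_p[s][o]))
        delta = new_delta
        backptr.append(ptr)

    # Trace back from the best final state
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Toy left-to-right model with two ordered states (illustrative values only).
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]
log_prob, path = viterbi([0, 0, 1, 1], start, trans, emit)
```

In an identification setting, one such score could be computed per registered model and the highest-scoring model selected; that selection step is an assumption here, not a description of the paper's network.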

CONTINUOUS DIGIT RECOGNITION FOR A REAL-TIME VOICE DIALING SYSTEM USING DISCRETE HIDDEN MARKOV MODELS

  • Choi, S.H.;Hong, H.J.;Lee, S.W.;Kim, H.K.;Oh, K.C.;Kim, K.C.;Lee, H.S.
    • 한국음향학회:학술대회논문집 / 한국음향학회 1994년도 FIFTH WESTERN PACIFIC REGIONAL ACOUSTICS CONFERENCE SEOUL KOREA / pp. 1027-1032 / 1994
  • This paper introduces an interword modeling method and a Viterbi search method for continuous speech recognition. We also describe the development of a real-time voice dialing system that can recognize around one hundred words and continuous digits in speaker-independent mode. For continuous digit recognition, between-word units are proposed to provide a more precise representation of word junctures. The best path in the HMM is found by the Viterbi search algorithm, from which digit sequences are recognized. The simulation results show that interword modeling using the context-dependent between-word units provides better recognition rates than pause modeling using the context-independent pause unit. The voice dialing system is implemented on a DSP board with a telephone interface plugged into an IBM PC AT/486.

효율적 한국어 음성 인식을 위한 PTM 음절 모델 (Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR)

  • 김봉완;이용주
    • 대한음성학회지:말소리 / No. 50 / pp. 139-150 / 2004
  • The Phonetic Tied-Mixture (PTM) model has been proposed as a means of efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. The PTM model has been reported to decode better than triphones by sharing a set of mixture components among states of the same topological location[5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique up to syllables. The proposed PTMS model shows a 13% improvement in decoding speed over PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model shows less than 1% degradation in word accuracy relative to PTM at the same beam width. With a different beam width, it shows better word accuracy than PTM at the same or higher speed.

잡음 환경에서의 음성인식을 위한 PMC 적응에 관한 연구 (A Study on the PMC Adaptation for Speech Recognition under Noisy Conditions)

  • 김현기
    • 한국산업정보학회논문지 / Vol. 7, No. 3 / pp. 9-14 / 2002
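  • This paper proposes a method for improving the performance of a speech recognizer in noisy environments. To compensate for the change in the probability density distribution that arises when the conventional PMC method is used to build models with many mixture components per state, the proposed method re-estimates the parameters combined at the state level and thereby adapts to the change in the distribution of the mixture components in each state. CDHMMs with multiple mixture components per state are combined with the proposed PMC method. In addition, the EM algorithm is used to adapt the model mean parameters in order to reduce the variance of the mixture means. Simulations show that the proposed PMC method achieves better performance than the conventional PMC method.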

한국어 음소 인식을 위한 신경회로망에 관한 연구 (A Study on Neural Networks for Korean Phoneme Recognition)

  • 최영배
    • 한국음향학회:학술대회논문집 / 한국음향학회 1992년도 학술논문발표회 논문집 제11권 1호 / pp. 61-65 / 1992
  • This paper presents a study on neural networks for phoneme recognition and performs phoneme recognition using a TDNN (Time Delay Neural Network). It also proposes a new training algorithm for speech recognition using neural nets that is suited to large-scale TDNNs. Because phoneme recognition is indispensable for continuous speech recognition, this paper uses a TDNN to obtain accurate phoneme recognition results. The proposed training algorithm can make a TDNN converge to an optimal state regardless of the number of phonemes to be recognized. Recognition experiments on three phoneme classes show a recognition rate of 9.1%. The paper also demonstrates that the proposed algorithm is an efficient method for achieving high performance and reducing convergence time.

연속음성 인식 및 합성을 위한 운율 경계강도 예측 모델 (Prosody Boundary Index Prediction Model for Continuous Speech Recognition and Speech Synthesis)

  • 강평수
    • 한국음향학회:학술대회논문집 / 한국음향학회 1998년도 학술발표대회 논문집 제17권 1호 / pp. 99-102 / 1998
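  • This study proposes a boundary-strength prediction model for continuous speech recognition and synthesis. In speech synthesis, prosodic boundary strength contributes to the naturalness of synthesized speech by controlling the length of pauses between prosodic phrases; in continuous speech recognition, it serves as a feature variable for selecting among the candidate sentences produced during recognition and thus plays a major role in improving the recognition rate. Phonetically, a spoken sentence can be viewed, at the level of large boundary units, as composed of prosodic phrases, and phrase boundaries can be related to the grammatical features of the sentence. In this paper, we use four levels of prosodic boundary strength and predict it by combining a bigram model with grammatical features: the modification depth of the right branch (rd), determined by a tree-structure method, and the number of syllables (syl) and link distance (torig), determined by the link grammar method. As prediction models, we propose a multiple regression model and a Markov model. In experiments on 200 read-speech sentences, these models predicted boundary strength at a rate of 76%.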

A Use of Songs for Teaching Pronunciations in Elementary School

  • Hong, Kyung-Suk
    • 대한음성학회지:말소리 / No. 41 / pp. 61-71 / 2001
  • How to teach intelligible, communicative pronunciation is a continuing question in English education. Without good input, we cannot expect good output. In an EFL situation, however, it is very difficult to provide good English pronunciation input, so we have to find efficient and effective materials for teaching pronunciation. One such material is song, because songs contain the linguistic and cultural traits of the language. The purpose of this paper is to clarify why songs are good for teaching pronunciation. Koreans, who are speakers of a syllable-timed language, have difficulties with English stress, rhythm, consonant clusters, and linking or blending in connected speech. The 134 songs from Wee Sing are analyzed for how these traits appear in songs. The results show that these traits can be acquired easily and naturally through songs. A lesson plan is offered as an example for teaching with songs.

DMS 모델을 이용한 한국어 음성 인식 (Korean Speech Recognition using Dynamic Multisection Model)

  • 안태옥;변용규;김순협
    • 대한전자공학회논문지 / Vol. 27, No. 12 / pp. 1933-1939 / 1990
  • In this paper, we propose an algorithm that uses a backtracking method to obtain time information, together with a DMS (Dynamic Multisection) model built from feature vectors and time information that represent similar features in word patterns spoken over continuous time, for speaker-independent Korean speech recognition. Each state of the model represents a time sequence and holds time information and a feature vector. The typical feature vector of each state is determined so as to minimize the distance between word patterns. DDD area names are selected as the recognition vocabulary, and 12th-order LPC cepstrum coefficients are used as the feature parameters. The number of model states (sections) is set to 8, and a weight of 0.2 is used for time information. In the experiments, the recognition rate of the DMS model is 94.8%, which is better than the recognition rate (89.3%) of the MSVQ (Multisection Vector Quantization) method.

계층구조 시간지연 신경망을 이용한 한국어 변이음 인식에 관한 연구 (A Study on Korean Allophone Recognition Using Hierarchical Time-Delay Neural Network)

  • 김수일;임해창
    • 전자공학회논문지B / Vol. 32B, No. 1 / pp. 171-179 / 1995
  • In many continuous speech recognition systems, the phoneme is used as the basic recognition unit. However, coarticulation among neighboring phonemes makes it difficult to recognize phonemes consistently. This paper proposes the allophone as an alternative recognition unit. We classify each phoneme into three allophone groups according to the location of the phoneme within a syllable. As the recognition algorithm, a time-delay neural network (TDNN) is designed. To recognize all Korean allophones, TDNNs are constructed in a modular fashion according to acoustic-phonetic features (e.g. voiced/unvoiced, the location of the phoneme within a word). Each TDNN is trained independently, and the TDNNs are then integrated hierarchically into a complete speech recognition system. In this study, we experimented on Korean plosives with a phoneme-based recognition system and an allophone-based recognition system. Experimental results show that allophone-based recognition is much less affected by coarticulation.

A Study on Word Vector Models for Representing Korean Semantic Information

  • Yang, Hejung;Lee, Young-In;Lee, Hyun-jung;Cho, Sook Whan;Koo, Myoung-Wan
    • 말소리와 음성과학 / Vol. 7, No. 4 / pp. 41-47 / 2015
  • This paper examines whether the Global Vector model is applicable to Korean data as a universal learning algorithm. The main purpose of this study is to compare the global vector model (GloVe) with word2vec models, namely the continuous bag-of-words (CBOW) model and the skip-gram (SG) model. For this purpose, we conducted an experiment using an evaluation corpus consisting of 70 target words and 819 pairs of Korean words for word similarities and analogies, respectively. In the word similarity task, the Pearson correlation coefficient with human judgement was 0.3133 for GloVe, 0.2637 for CBOW, and 0.2177 for SG. In the word analogy task, the overall accuracy in semantic and syntactic relations was 67% for GloVe, 66% for CBOW, and 57% for SG.
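
The word similarity evaluation described in the abstract above is commonly set up as follows: score each word pair by the cosine similarity of its vectors, then correlate those scores with human ratings. The sketch below assumes that setup; the abstract does not state how model similarity was computed, and the vectors and ratings here are hypothetical placeholders, not data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy embeddings and human ratings (illustrative only).
vectors = {
    "w1": [1.0, 0.0, 0.5],
    "w2": [0.9, 0.1, 0.4],
    "w3": [0.0, 1.0, 0.2],
    "w4": [0.5, 0.5, 0.5],
}
pairs = [("w1", "w2"), ("w1", "w3"), ("w2", "w4"), ("w3", "w4")]
human_scores = [9.0, 2.0, 6.5, 5.0]

# Model similarity for each pair, then agreement with the human ratings.
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
correlation = pearson(model_scores, human_scores)
```

A higher `correlation` would indicate closer agreement between the model's similarity ranking and the human judgements.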