• Title/Summary/Keyword: 음향음성학 (Acoustic Phonetics)

Korean Word Segmentation and Compound-noun Decomposition Using Markov Chain and Syllable N-gram (마코프 체인 및 음절 N-그램을 이용한 한국어 띄어쓰기 및 복합명사 분리)

  • 권오욱
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.274-284 / 2002
  • Word segmentation errors that occur during text preprocessing often insert incorrect words into the recognition vocabulary and degrade the language models used for Korean large-vocabulary continuous speech recognition. We propose an automatic word segmentation algorithm that uses Markov chains and syllable-based n-gram language models to correct word segmentation errors in text corpora. We assume that a sentence is generated by a Markov chain in which spaces are produced on self-transitions and non-space characters on the other transitions; the word segmentation of the sentence is then obtained by finding the maximum-likelihood path using syllable n-gram scores. In experiments, the algorithm achieved 91.58% word accuracy and 96.69% syllable accuracy when segmenting 254 newspaper-column sentences from which all spaces had been removed. It also improved word accuracy from 91.00% to 96.27% when correcting segmentation at line breaks, and yielded 96.22% decomposition accuracy for compound-noun decomposition.
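
The correction step above can be sketched as follows. With a syllable-bigram model, the likelihood contribution of each boundary decision is independent of the others, so the maximum-likelihood path reduces to a per-boundary comparison; the syllables and probabilities below are hypothetical illustrations, not the paper's data.

```python
def segment(syllables, bigram, floor=1e-3):
    """Insert a space between two syllables wherever the bigram likelihood
    of a space boundary beats that of joining them.  `bigram` maps
    (prev, next) pairs to probabilities; " " marks a space token."""
    out = [syllables[0]]
    for prev, cur in zip(syllables, syllables[1:]):
        p_join = bigram.get((prev, cur), floor)
        p_space = bigram.get((prev, " "), floor) * bigram.get((" ", cur), floor)
        out.append(" " + cur if p_space > p_join else cur)
    return "".join(out)
```

With a toy model in which "ab" is a frequent word and "c" usually starts a new one, `segment(["a", "b", "c"], ...)` yields "ab c".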

A Study on Keyword Spotting System Using Pseudo N-gram Language Model (의사 N-gram 언어모델을 이용한 핵심어 검출 시스템에 관한 연구)

  • 이여송;김주곤;정현열
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.242-247 / 2004
  • Conventional keyword spotting systems use a connected-word recognition network composed of keyword models and filler models. Such systems cannot effectively model word occurrence when detecting keywords in a large-vocabulary continuous speech recognition system trained on large text data. To solve this problem, we propose a keyword spotting system that uses a pseudo N-gram language model for keyword detection, and we investigate its performance as the occurrence frequencies assigned to the keyword and filler models are varied. When the unigram probabilities of the keyword and filler models were set to 0.2 and 0.8 respectively, the system achieved 91.1% CA (Correct Acceptance of in-vocabulary words) and 91.7% CR (Correct Rejection of out-of-vocabulary words), a 14% improvement in average CA-CR performance over conventional methods in terms of ERR (Error Reduction Rate).
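
The pseudo-unigram weighting described above can be illustrated with a minimal sketch: the fixed class probabilities (0.2 for keywords, 0.8 for fillers, the paper's best setting) are combined in the log domain with hypothetical acoustic scores, and the keyword hypothesis is accepted when its combined score wins.

```python
import math

P_KEYWORD, P_FILLER = 0.2, 0.8  # the paper's best unigram setting

def spot(acoustic_keyword, acoustic_filler):
    """Combine hypothetical acoustic log-likelihoods with the fixed
    pseudo-unigram class probabilities; accept the keyword if it wins."""
    return acoustic_keyword + math.log(P_KEYWORD) > acoustic_filler + math.log(P_FILLER)
```

Because the filler class carries the larger prior, the keyword must win by a clear acoustic margin before it is accepted.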

Development of a Lipsync Algorithm Based on Audio-visual Corpus (시청각 코퍼스 기반의 립싱크 알고리듬 개발)

  • 김진영;하영민;이화숙
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.63-69 / 2001
  • A corpus-based lip-sync algorithm for synthesizing natural face animation is proposed in this paper. To obtain lip parameters, markers were attached to the speaker's face, and their positions were extracted with image processing methods. The spoken utterances were labeled with HTK, and prosodic information (duration, pitch, and intensity) was analyzed. An audio-visual corpus was constructed by combining the speech and image information, with the syllable as the basic unit. Lip information, represented by the marker positions, is synthesized from this corpus: the best syllable units are selected and their visual information is concatenated. Obtaining the best units involves two steps: selecting the N-best candidates for each syllable, and then selecting the smoothest unit sequence with a Viterbi decoding algorithm. For these steps, two distance measures between syllable units are proposed: a phonetic-environment distance and a prosodic distance. Computer simulations showed that the proposed algorithm performs well, and in particular that pitch and intensity information is as important as duration information for lip sync.
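
The two-step unit selection above can be sketched as a generic dynamic program; the target and concatenation cost functions stand in for the paper's phonetic-environment and prosody distances and are supplied by the caller.

```python
def select_units(candidates, target_cost, concat_cost):
    """Choose one unit per syllable minimizing the sum of target costs and
    pairwise concatenation costs, by dynamic programming (Viterbi).
    `candidates[t]` is the N-best unit list for syllable t."""
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        nxt = {}
        for u in candidates[t]:
            # best predecessor for unit u, accounting for the join cost
            costs = {p: c + concat_cost(p, u) for p, (c, _) in best.items()}
            p = min(costs, key=costs.get)
            nxt[u] = (costs[p] + target_cost(t, u), best[p][1] + [u])
        best = nxt
    return min(best.values(), key=lambda v: v[0])[1]
```

A concatenation cost that penalizes mismatched joins can override a locally better target cost, which is exactly the smoothing role the Viterbi pass plays in the paper.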

An Implementation of Inverse Filter Using SVD for Multi-channel Sound Reproduction (SVD를 이용한 다중 채널상에서의 음재생을 위한 역변환 필터의 구현)

  • 이상권;노경래
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.3-11 / 2001
  • This paper describes the implementation of an inverse filter using SVD to recover the input in a multi-channel system. The matrix formulation for the SISO case is extended to the MIMO case, and the inversion of minimum-phase and non-minimum-phase systems is investigated in both the time and frequency domains. To invert a non-minimum-phase system effectively, SVD is introduced: we first compute the singular values of the system matrix and then examine its phase properties. When the overall system is non-minimum phase, the system matrix has one or more very small singular values, and these small singular values carry the information about the phase properties of the system. Using this property, an approximate inverse filter of the overall system is found. Numerical simulations show the potential of this inverse filter.
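
The truncation idea above, zeroing the very small singular values instead of inverting them, can be sketched with NumPy for a square system matrix; the relative tolerance is an assumed parameter, not a value from the paper.

```python
import numpy as np

def svd_inverse(H, tol=1e-6):
    """Approximate inverse via SVD: invert only singular values above
    tol * s_max; the very small singular values (associated with the
    non-minimum-phase part) are zeroed rather than amplified."""
    U, s, Vt = np.linalg.svd(H)
    s_inv = np.array([1.0 / x if x > tol * s[0] else 0.0 for x in s])
    return Vt.T @ np.diag(s_inv) @ U.T
```

For a well-conditioned matrix this reproduces the exact inverse; for a rank-deficient one it coincides with the Moore-Penrose pseudo-inverse.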

A Study on the Recognition of Korean Numerals Using Recurrent Neural Predictive HMM (회귀신경망 예측 HMM을 이용한 숫자음 인식에 관한 연구)

  • 김수훈;고시영;허강인
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.12-18 / 2001
  • In this paper, we propose the Recurrent Neural Predictive HMM (RNPHMM), a hybrid of a recurrent neural network and an HMM. A predictive recurrent neural network, trained to predict the next feature vector from the several most recent ones, defines each state of the HMM. Instead of fixed average vectors, the method uses prediction values from the network, which change dynamically with the preceding feature vectors. Two RNPHMM variants are built: an Elman-network prediction HMM and a Jordan-network prediction HMM. In experiments on isolated digits, we compared recognition performance while increasing the number of states, the prediction order, and the number of hidden nodes. Both variants achieved a recognition rate of 98.5% on the test data.
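
A scalar toy version of the state-level predictor can illustrate the idea: an Elman step carries a hidden state across frames, and the accumulated squared prediction error would serve as the (negated) emission score of an HMM state. All weights here are toy values, not trained parameters.

```python
import math

def elman_prediction_error(xs, wx, wh, bh, wo, bo):
    """Scalar Elman predictor: hidden state h carries context across
    frames; each frame's prediction targets the next feature value, and
    the accumulated squared error plays the role of an emission score."""
    h, err = 0.0, 0.0
    for t in range(len(xs) - 1):
        h = math.tanh(wx * xs[t] + wh * h + bh)   # recurrent hidden update
        pred = wo * h + bo                         # predict the next frame
        err += (xs[t + 1] - pred) ** 2
    return err
```

A Jordan variant would feed back the previous prediction instead of the hidden state; the scoring logic is otherwise identical.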

An efficient space dividing method for the two-dimensional sound source localization (2차원 상의 음원위치 추정을 위한 효율적인 영역분할방법)

  • Kim, Hwan-Yong;Choi, Hong-Sub
    • The Journal of the Acoustical Society of Korea / v.35 no.5 / pp.358-367 / 2016
  • SSL (Sound Source Localization) has been applied to applications such as man-machine interfaces, video conference systems, and smart cars. In the localization process, however, angle estimation errors arise mainly from the non-linear characteristics of the inverse sine function. An earlier approach reduces this effect by dividing the space covered by the microphones into narrow regions. In this paper, we propose an optimal way of dividing the space according to the microphone-array pattern, and we estimate the sound source's two-dimensional position to evaluate the proposed division. In the experiments, GCC-PHAT (Generalized Cross Correlation PHAse Transform), known to be robust in noisy environments, is adopted, and a triangular pattern of 3 microphones and a rectangular pattern of 4 microphones are each tested with 100 speech recordings. The results show that the triangular pattern cannot estimate the correct position because of its lower spatial resolution, whereas the rectangular pattern improves dramatically, with a correct estimation rate of 67%.
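
The GCC-PHAT delay estimate that feeds the angle computation can be sketched as follows: the cross-spectrum is whitened to unit magnitude so that only phase (i.e., delay) information remains, and the peak of its inverse transform gives the delay in samples.

```python
import numpy as np

def gcc_phat_delay(x, y):
    """Estimate the delay (in samples) of y relative to x with GCC-PHAT:
    whiten the cross-spectrum to unit magnitude, then pick the peak of
    its inverse transform over the range of admissible lags."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = np.conj(X) * Y
    R /= np.abs(R) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center lag 0
    return int(np.argmax(cc)) - max_shift
```

Delays from each microphone pair would then be converted to angles, which is where the inverse-sine non-linearity the paper discusses enters.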

Speech Synthesis for the Korean Large Vocabulary Through Waveform Analysis in the Time Domain and Evaluation of Synthesized Speech Quality (시간영역에서의 파형분석에 의한 무제한 어휘 합성 및 음절 유형별 규칙합성음 음질평가)

  • Kang, Chan-Hee;Chin, Yong-Ohk
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.71-83 / 1994
  • This paper deals with improving the quality and naturalness of synthesized speech in a Korean TTS (Text-to-Speech) system. We extracted parameters such as amplitude, duration, and pitch period for each syllable (Table 2) by analyzing speech waveforms in the time domain (Table 1), and synthesized syllables from them. Based on the frequencies in a large Korean pronunciation dictionary, we synthesized 229 selected syllables: 19 V types, 80 CV types, 30 VC types, and 100 CVC types. For each of the four Korean syllable types from the data-format dictionary (Table 3), we tested 15 syllables with the MOS (Mean Opinion Score) evaluation method on four items, i.e., intelligibility, clearness, loudness, and naturalness, using randomly selected listeners with no prior knowledge of the stimuli. The results show that the synthesized speech is very clear and that prosodic elements such as duration, accent, and pitch period can be controlled (Figs. 9-12).

The Analysis and Recognition of Korean Speech Signal using the Phoneme (음소에 의한 한국어 음성의 분석과 인식)

  • Kim, Yeong-Il;Lee, Geon-Gi;Lee, Mun-Su
    • The Journal of the Acoustical Society of Korea / v.6 no.2 / pp.38-47 / 1987
  • Since Korean can be classified phonemically according to the characteristics and structure of its pronunciation, Korean syllables can be divided into phonemes such as consonants and vowels. The divided phonemes are analyzed by the method of partial autocorrelation, with a partial autocorrelation coefficient order of 15. The analysis shows that the characteristics of identical consonants, vowels, and final consonants are similar across syllables. Experiments were carried out by dividing 675 syllables into consonants, vowels, and final consonants. The recognition rates for consonants, vowels, final consonants, and whole syllables are 85.0%, 90.7%, 85.5%, and 72.1%, respectively. In conclusion, Korean syllables divided into phonemes can be analyzed and recognized with minimal data and short processing time, and Korean syllables, words, and sentences can be recognized in the same way.
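
Partial autocorrelation (reflection) coefficients of the kind used above can be computed with the Levinson-Durbin recursion. This is a generic sketch, not the paper's exact procedure, and the analysis order here is reduced from 15 to 3 for brevity.

```python
def parcor(frame, order):
    """Reflection (partial autocorrelation) coefficients via the
    Levinson-Durbin recursion on the biased autocorrelation."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)   # predictor coefficients a[1..m]
    e = r[0]                  # prediction-error power
    ks = []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / e           # m-th reflection coefficient
        ks.append(k)
        prev = a[:]
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        e *= 1.0 - k * k
    return ks
```

For a geometrically decaying frame (an ideal first-order signal), the first reflection coefficient matches the decay ratio and the higher ones vanish, which is the behavior a first-order model predicts.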

A Study on A Multi-Pulse Linear Predictive Filtering And Likelihood Ratio Test with Adaptive Threshold (멀티 펄스에 의한 선형 예측 필터링과 적응 임계값을 갖는 LRT의 연구)

  • Lee, Ki-Yong;Lee, Joo-Hun;Song, Iick-Ho;Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea / v.10 no.1 / pp.20-29 / 1991
  • A fundamental assumption in the conventional linear predictive coding (LPC) analysis procedure is that the input to the all-pole vocal tract filter is a white process. For periodic inputs, however, a pitch bias error is introduced into the conventional LP coefficients. Multi-pulse (MP) LP analysis can reduce this bias, provided that an estimate of the excitation is available. Since the prediction error of conventional LP analysis can be modeled as the sum of an MP excitation sequence and a random noise sequence, extracting MP sequences from the prediction error can be viewed as a classical detection and estimation problem. In this paper, we propose an algorithm in which the locations and amplitudes of the MP sequences are first obtained by applying a likelihood ratio test (LRT) to the prediction error, and LP coefficients free of pitch bias are then obtained from the MP sequences. To enhance performance further, the procedure is iterated with an adaptive threshold at each step.
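
The LRT-based pulse picking can be caricatured with a simple adaptive threshold: samples of the prediction error whose magnitude exceeds a multiple of the residual RMS are kept as pulse locations and amplitudes. The factor c here is an assumed parameter, not the paper's actual test statistic.

```python
import math

def extract_pulses(residual, c=2.0):
    """Toy detection step: treat samples whose magnitude exceeds an
    adaptive threshold (c times the residual RMS) as multi-pulse
    locations/amplitudes; everything below it is treated as noise."""
    rms = math.sqrt(sum(e * e for e in residual) / len(residual))
    thr = c * rms
    return [(i, e) for i, e in enumerate(residual) if abs(e) > thr]
```

Because the threshold tracks the residual energy, re-running the detector on the re-estimated residual at each iteration adapts it automatically, mirroring the iterative scheme described above.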

Vocal separation method using weighted β-order minimum mean square error estimation based on kernel back-fitting (커널 백피팅 알고리즘 기반의 가중 β-지수승 최소평균제곱오차 추정방식을 적용한 보컬음 분리 기법)

  • Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.49-54 / 2016
  • In this paper, we propose a vocal separation method that applies weighted β-order minimum mean square error estimation (WbE) within a kernel back-fitting algorithm. In speech enhancement, WbE is known to outperform existing Bayesian estimators such as the minimum mean square error (MMSE) estimator of the short-time spectral amplitude (STSA) and the MMSE estimator of the log-STSA (LSA), in terms of both objective and subjective measures. In the proposed method, WbE is applied to the basic iterative kernel back-fitting algorithm to improve vocal separation from monaural music signals. Experimental results show that the proposed method achieves better separation performance than existing methods.
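
The back-fitting skeleton that WbE plugs into can be sketched as follows. The simple ratio mask in the refit step below is a stand-in for the paper's WbE gain, and the per-source kernels (e.g., smoothing filters over the magnitude spectrogram) are supplied by the caller.

```python
import numpy as np

def backfit(mix_mag, kernels, n_iter=3):
    """Iterative back-fitting over magnitude spectra: each source is
    re-estimated by its kernel, then all sources are refit to the
    mixture with a ratio mask (the step the paper replaces with WbE)."""
    est = [np.full_like(mix_mag, mix_mag.mean()) for _ in kernels]
    for _ in range(n_iter):
        est = [k(e) for k, e in zip(kernels, est)]     # kernel smoothing
        total = sum(est) + 1e-12
        est = [mix_mag * e / total for e in est]       # mask refit
    return est
```

By construction, the masked estimates always sum back to the mixture magnitude, which is the invariant the refit step maintains regardless of which gain function is used.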