• Title/Summary/Keyword: acoustic coupling

A Study on the Diphone Recognition of Korean Connected Words and Eojeol Reconstruction (한국어 연결단어의 이음소 인식과 어절 형성에 관한 연구)

  • ;Jeong, Hong
    • The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.46-63 / 1995
  • This study describes an unlimited-vocabulary connected speech recognition system using Time Delay Neural Networks (TDNNs). The recognition unit is the diphone, which includes the transition section between two phonemes, and 329 diphone units are used. Recognition of Korean connected speech consists of three parts: feature extraction from the input speech signal, diphone recognition, and post-processing. In feature extraction, the diphone intervals are extracted from the input speech signal, and feature vectors of 16th-order filter-bank coefficients are then calculated for each frame in a diphone interval. Diphone recognition has a three-stage hierarchical structure and is carried out using 30 TDNNs. In particular, the TDNN structure is modified to increase the recognition rate. In post-processing, misrecognized diphone strings are corrected using the phoneme transition probabilities and the phoneme confusion probabilities, and the eojeols (Korean words or phrases) are then formed by combining the recognized diphones.

Speech emotion recognition using attention mechanism-based deep neural networks (주목 메커니즘 기반의 심층신경망을 이용한 음성 감정인식)

  • Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.6 / pp.407-412 / 2017
  • In this paper, we propose a speech emotion recognition method using a deep neural network based on the attention mechanism. The proposed method combines a CNN (Convolutional Neural Network), a GRU (Gated Recurrent Unit), a DNN (Deep Neural Network), and an attention mechanism. The spectrogram of a speech signal contains characteristic patterns that depend on the emotion. We therefore model these patterns by applying tuned Gabor filters as the convolutional filters of a typical CNN. In addition, we apply the attention mechanism with a CNN and an FC (Fully-Connected) layer to obtain attention weights that take the context information of the extracted features into account, and we use these weights for emotion recognition. To verify the proposed method, we conducted emotion recognition experiments on six emotions. The experimental results show that the proposed method achieves higher performance in speech emotion recognition than the conventional methods.
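
The attention weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' network: it assumes a single linear scoring vector in place of the CNN and FC layers, and the names `attention_pool`, `frames`, and `w` are hypothetical. Each frame gets a score, the scores are normalized with a softmax, and the frames are averaged with those weights.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Return (attention weights, weighted sum of the frame vectors).

    frames: list of equal-length feature vectors, one per time frame.
    w: scoring vector (an assumption standing in for learned layers).
    """
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return weights, context
```

With an all-zero scoring vector every frame receives the same weight, so the pooled vector reduces to the plain average of the frames.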

Recognition for Noisy Speech by a Nonstationary AR HMM with Gain Adaptation Under Unknown Noise (잡음하에서 이득 적응을 가지는 비정상상태 자기회귀 은닉 마코프 모델에 의한 오염된 음성을 위한 인식)

  • 이기용;서창우;이주헌
    • The Journal of the Acoustical Society of Korea / v.21 no.1 / pp.11-18 / 2002
  • In this paper, a gain-adapted method for speech recognition in noise is developed in the time domain. The noise is assumed to be colored. To cope with the notably nonstationary nature of speech signals such as fricatives, glides, liquids, and the transition regions between phones, the nonstationary autoregressive (NAR) hidden Markov model (HMM) is used. The nonstationary AR process is represented by polynomial functions formed as a linear combination of M known basis functions. When only noisy signals are available, the problem of estimating the noise inevitably arises. The noise model and the gain contour of the speech are estimated using multiple Kalman filters. The noise estimation of the proposed method can remove noise from noisy speech to obtain an enhanced speech signal. Compared with the conventional ARHMM with noise estimation, the proposed NAR-HMM with noise estimation improves the recognition performance by about 2-3%.

Theoretical Study on the Effects of the Withdrawal Weighting on the Performance of Resonator Type SAW Filters (공진기형 SAW 필터에 위드로월 가중법이 미치는 효과에 대한 이론적 연구)

  • 이영진;이승희;노용래
    • The Journal of the Acoustical Society of Korea / v.21 no.1 / pp.47-55 / 2002
  • This paper proposes a new, improved lumped-element equivalent circuit analysis method for withdrawal-weighted SAW resonators with irregular electrode configurations, which makes it possible to calculate the frequency response of such resonators. The method derives the y-parameters of the Smith equivalent circuit for a single ground electrode and formulates the resonator's admittance by calculating the total current into an IDT assembly. To illustrate the effectiveness of the technique, the method was applied to the design of a simple ladder filter, and the change in filter performance was investigated in relation to the weighting of the series and parallel resonators, respectively. The results show that withdrawal-weighted resonator ladder filters provide better bandwidth and transition characteristics than normal ones. This new equivalent circuit analysis method can also serve as a better tool for designing and analyzing general SAW resonator filters.

A Study on the Korean Broadcasting Speech Recognition (한국어 방송 음성 인식에 관한 연구)

  • 김석동;송도선;이행세
    • The Journal of the Acoustical Society of Korea / v.18 no.1 / pp.53-60 / 1999
  • This paper studies Korean broadcast speech recognition and presents methods for large-vocabulary continuous speech recognition. Our main concerns are language modeling and the search algorithm. The acoustic model is a uni-phone semi-continuous hidden Markov model, and the language model is an N-gram model. The search algorithm consists of three phases in order to utilize all available acoustic and linguistic information. First, a forward Viterbi beam search finds word end frames and estimates the related scores. Second, a backward Viterbi beam search finds word begin frames and estimates the related scores. Finally, an A* search combines these two results with the N-gram language model to obtain the recognition results. With these methods, a maximum word recognition rate of 96.0% and a syllable recognition rate of 99.2% are achieved on a speaker-independent continuous speech recognition task with a vocabulary of about 12,000 words.
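
A forward Viterbi pass with beam pruning, as named in this abstract, can be sketched on a toy hidden Markov model. This is a generic illustration rather than the authors' recognizer: the function name `viterbi_beam`, the two-state model, and the beam width are all hypothetical, and the word-boundary bookkeeping and A* stage described in the abstract are not modeled.

```python
import math


def viterbi_beam(obs, states, log_init, log_trans, log_emit, beam=10.0):
    """Return (log score, state path) of the best path found.

    Before each time step, hypotheses scoring more than `beam`
    below the current best (in the log domain) are pruned.
    """
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = max(score for score, _ in active.values())
        active = {s: v for s, v in active.items() if v[0] >= best - beam}
        nxt = {}
        for s in states:
            # best surviving predecessor for state s
            score, path = max(
                ((sc + log_trans[p][s], pa) for p, (sc, pa) in active.items()),
                key=lambda c: c[0],
            )
            nxt[s] = (score + log_emit[s][o], path + [s])
        active = nxt
    return max(active.values(), key=lambda v: v[0])


def to_log(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model, used only to exercise the function.
STATES = ["Healthy", "Fever"]
LOG_INIT = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
LOG_TRANS = to_log({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
                    "Fever": {"Healthy": 0.4, "Fever": 0.6}})
LOG_EMIT = to_log({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
                   "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

score, path = viterbi_beam(["normal", "cold", "dizzy"],
                           STATES, LOG_INIT, LOG_TRANS, LOG_EMIT)
```

A narrower beam saves computation by discarding weak hypotheses early, at the risk of pruning the path that would eventually have scored best.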

A Study on Recognition Units for Korean Speech Recognition (한국어 분절음 인식을 위한 인식 단위에 대한 연구)

  • ;;Michael W. Macon
    • The Journal of the Acoustical Society of Korea / v.19 no.6 / pp.47-52 / 2000
  • When building a large-vocabulary speech recognition system, it is better to use the segment, rather than the syllable or the word, as the recognition unit. In this paper, we study the proper recognition units for Korean speech recognition. For the experiments, we use the OGI speech toolkit from the U.S.A. The results show that the recognition rate is higher when a diphthong is treated as a single unit than when it is treated as two units, i.e., a glide plus a vowel. The recognition rate is also higher when the biphone is used as the recognition unit than when the mono-phoneme is used.

Fatigue Crack Growth Behavior of and Recognition of AE Signals from Composite Patch-Repaired Aluminum Panel (복합재 패치로 보수된 알루미늄 패널의 피로균열 성장거동과 AE신호의 유형인식)

  • Kim, Sung-Jin;Kwon, Oh-Yang;Jang, Yong-Joon
    • Journal of the Korean Society for Nondestructive Testing / v.27 no.1 / pp.48-57 / 2007
  • The fatigue crack growth behavior of a cracked and patch-repaired Al2024-T3 panel has been monitored by acoustic emission (AE). The overall crack growth rate was reduced. The crack propagation into the adjacent hole was also retarded by introducing the patch repair. AE signals due to crack growth after the patch repair and those due to debonding of the plate-patch interface were discriminated by using principal component analysis. The former showed high center frequency and low amplitude, whereas the latter showed long rise time, low frequency, and high amplitude. This type of AE signal recognition method could be effective for predicting fatigue crack growth behavior in patch-repaired structures with the aid of AE source location.
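
The discrimination step in this abstract can be sketched with a two-feature principal component analysis. The sketch rests on assumptions: only center frequency and amplitude are used (the abstract also mentions rise time), every number below is made up for illustration, and `first_pc_scores` is a hypothetical helper, not the authors' procedure. The features are standardized, the first principal axis of the 2x2 covariance matrix is found in closed form, and each signal is projected onto it.

```python
import math


def first_pc_scores(points):
    """Project 2-D feature points onto their first principal component."""
    n = len(points)
    cols = list(zip(*points))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    # standardize so that kHz and dB scales are comparable
    z = [[(v - m) / s for v, m, s in zip(p, means, stds)] for p in points]
    cxx = sum(p[0] * p[0] for p in z) / n
    cyy = sum(p[1] * p[1] for p in z) / n
    cxy = sum(p[0] * p[1] for p in z) / n
    # orientation of the major axis of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [p[0] * ux + p[1] * uy for p in z]


# Made-up (center frequency in kHz, amplitude in dB) pairs.
crack_growth = [(450.0, 45.0), (470.0, 48.0), (440.0, 50.0)]  # high frequency, low amplitude
debonding = [(150.0, 80.0), (170.0, 85.0), (160.0, 78.0)]     # low frequency, high amplitude

scores = first_pc_scores(crack_growth + debonding)
crack_scores, debond_scores = scores[:3], scores[3:]
```

On this made-up data the two signal types fall on opposite sides of zero along the first component, which is the kind of separation the abstract reports.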

A Study on the Neural Networks for Korean Phoneme Recognition (한국어 음소 인식을 위한 신경회로망에 관한 연구)

  • Choi, Young-Bae;Yang, Jin-Woo;Lee, Hyung-Jun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.5-13 / 1994
  • This paper presents a study on neural networks for phoneme recognition and performs phoneme recognition using a TDNN (Time Delay Neural Network). It also proposes a training algorithm for neural-network speech recognition that is suited to large-scale TDNNs. Because phoneme recognition is indispensable for continuous speech recognition, a TDNN is used to obtain accurate phoneme recognition results. The proposed training algorithm can make the TDNN converge to an optimal state regardless of the number of phonemes to be recognized. The recognition experiment was performed with the new TDNN training algorithm, which combines backpropagation with the Cauchy algorithm using a stochastic approach. For three phoneme classes and two speakers, the experiment shows a recognition rate of $98.1\%$. The results indicate that the proposed algorithm is an efficient method that gives higher recognition performance and a shorter convergence time than the existing TDNN.

Minimum Classification Error Training to Improve Discriminability of PCMM-Based Feature Compensation (PCMM 기반 특징 보상 기법에서 변별력 향상을 위한 Minimum Classification Error 훈련의 적용)

  • Kim Wooil;Ko Hanseok
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.58-68 / 2005
  • In this paper, we propose a scheme to improve the discriminative property of the feature compensation method for robust speech recognition in noisy environments. The estimation of the noisy speech model used in existing feature compensation methods does not guarantee that the computed posterior probabilities discriminate reliably among the Gaussian components. Estimation of the posterior probabilities is a crucial step in determining the discriminative factor of the Gaussian models, which in turn determines the intelligibility of the restored speech signals. The proposed scheme employs minimum classification error (MCE) training to estimate the parameters of the noisy speech model. To apply MCE training, we propose to identify and determine the 'competing components' that are expected to affect the discriminative ability. The proposed method is applied to feature compensation based on the parallel combined mixture model (PCMM). The performance is examined on the Aurora 2.0 database and on speech recorded inside a car under real driving conditions. The experimental results show improved recognition performance in both simulated environments and real-life conditions, which verifies the effectiveness of the proposed scheme for increasing the performance of robust speech recognition systems.

Development of an SH-SAW Sensor for Detection of DNA (DNA 측정용 SH-SAW 센서 개발)

  • Hur Youngjune;Pak Yukeun Eugene;Roh Yongrae
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.160-165 / 2005
  • We have developed SH (shear horizontal) surface acoustic wave (SAW) sensors for detection of the immobilization and hybridization of DNA (deoxyribonucleic acid) on the gold-coated delay line of transverse SAW devices. The DNA immobilization and hybridization experiments were performed with 15-mer oligonucleotides (probe and complementary target DNA). The sensor consists of twin SAW delay line oscillators operating at 100 MHz, fabricated on $36^{\circ}$-rotated Y-cut $\mathrm{LiTaO_3}$ piezoelectric single crystals. The relative change in the frequency of the two oscillators was monitored to detect the hybridization between target DNA and immobilized probe DNA in pH 7.4 PBS (phosphate buffered saline) solution. The measurement results showed a good response of the sensor to the mass loading effects of the DNA immobilization and hybridization, with a sensitivity of up to $1.55\ \mathrm{ng/ml/Hz}$.
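
The twin-oscillator readout in this abstract can be illustrated with simple arithmetic. The sketch assumes a linear response and applies the best-case sensitivity figure quoted above as a constant, which the abstract does not claim. The function names and the frequency readings in the usage lines are hypothetical. Subtracting the reference channel's drift from the sensing channel's shift leaves the part attributable to mass loading.

```python
# Best-case sensitivity quoted in the abstract; treating it as a fixed
# linear conversion factor is an assumption made for this sketch.
SENSITIVITY_NG_PER_ML_PER_HZ = 1.55


def differential_shift_hz(sensing_before, sensing_after, ref_before, ref_after):
    """Frequency shift of the sensing oscillator minus the reference drift."""
    return (sensing_after - sensing_before) - (ref_after - ref_before)


def estimated_concentration_ng_per_ml(shift_hz,
                                      sensitivity=SENSITIVITY_NG_PER_ML_PER_HZ):
    """Convert a differential shift (Hz) into ng/ml under a linear model."""
    return abs(shift_hz) * sensitivity


# Hypothetical readings around the 100 MHz operating frequency.
shift = differential_shift_hz(100_000_000.0, 99_999_980.0,
                              100_000_000.0, 99_999_990.0)
concentration = estimated_concentration_ng_per_ml(shift)
```

Here the sensing channel drops by 20 Hz and the reference by 10 Hz, so the differential shift is -10 Hz, which the linear model maps to about 15.5 ng/ml.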