• Title/Summary/Keyword: acoustic feature

Performance Improvement of Continuous Digits Speech Recognition Using the Transformed Successive State Splitting and Demi-syllable Pair (반음절쌍과 변형된 연쇄 상태 분할을 이용한 연속 숫자 음 인식의 성능 향상)

  • Seo Eun-Kyoung;Choi Gab-Keun;Kim Soon-Hyob;Lee Soo-Jeong
    • Journal of Korea Multimedia Society / v.9 no.1 / pp.23-32 / 2006
  • This paper describes the optimization of a language model and an acoustic model to improve Korean connected-digit speech recognition. Because the language model is a finite state network (FSN) built on disyllable units, its recognition errors were reduced by analyzing the grammatical features of Korean unit digits. The acoustic model uses demi-syllable pairs to reduce recognition errors caused by inaccurate segmentation of phones or monosyllables, which results from short pronunciation times and coarticulation. For efficient modeling of the features of each recognition unit, the K-means clustering algorithm is combined with transformed successive state splitting at the feature level. In the experiments, the proposed language model raised the recognition rate by 10.5%, the demi-syllable-pair acoustic model raised it by 12.5%, and the transformed successive state splitting improved it by a further 1.5%.
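
The abstract does not detail the feature-level splitting step, but the general idea can be sketched as follows: the feature frames pooled under one recognition unit are divided into two clusters with 2-means, and the cluster statistics would seed the two new states. This is a minimal illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def split_state(frames, n_iter=20, seed=0):
    """Split one state's pooled feature frames into two clusters (2-means).

    `frames` is an (N, D) array of feature vectors assigned to a single
    recognition unit; the two cluster means would seed the new states.
    """
    rng = np.random.default_rng(seed)
    mean = frames.mean(axis=0)
    # Start from two slightly perturbed copies of the global mean.
    centroids = mean + rng.normal(scale=1e-3, size=(2, frames.shape[1]))
    for _ in range(n_iter):
        # Assign each frame to the nearer centroid.
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Re-estimate each centroid from its assigned frames.
        for k in (0, 1):
            if np.any(labels == k):
                centroids[k] = frames[labels == k].mean(axis=0)
    return centroids, labels
```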

A Study on Automatic Phoneme Segmentation of Continuous Speech Using Acoustic and Phonetic Information (음향 및 음소 정보를 이용한 연속제의 자동 음소 분할에 대한 연구)

  • 박은영;김상훈;정재호
    • The Journal of the Acoustical Society of Korea / v.19 no.1 / pp.4-10 / 2000
  • This paper presents a postprocessor that improves an automatic speech segmentation system by correcting phoneme boundary errors, reducing the error range of auto-labeled results so that they can be used directly as synthesis units. Starting from a baseline automatic segmentation system, the proposed postprocessor is trained on features from hand-labeled results using a multi-layer perceptron (MLP); the auto-labeled result combined with the MLP postprocessor then determines the new phoneme boundary. Specifically, speech feature sets are first selected on the basis of acoustic-phonetic knowledge. The MLP is adopted as the pattern classifier because of its strong nonlinear discrimination capability and because it readily captures the various types of acoustic features that appear around phoneme boundaries within a short time window. Finally, a feature set tailored to each phonetic event is fed to the postprocessor to compensate for the phoneme boundary error. On phonetically rich sentence data, the method achieves a 19.9% improvement in frame accuracy over the plain automatic labeling system and reduces the absolute error rate by about 28.6%.
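
As a rough illustration of the postprocessing idea (not the authors' exact feature sets or network configuration), the sketch below trains an MLP to score candidate frames near an auto-labeled boundary and moves the boundary to the highest-scoring frame; the window size, feature dimension, and search range are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical data: each example is the flattened acoustic feature window
# centred on a frame, labelled 1 if the frame is a hand-labelled phoneme
# boundary and 0 otherwise.
X_train = np.random.randn(1000, 11 * 13)   # 11-frame window, 13-dim features
y_train = np.random.randint(0, 2, size=1000)

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

def refine_boundary(windows, auto_frame, search=3):
    """Re-position an auto-labelled boundary within +/- `search` frames.

    `windows[t]` is the flattened feature window centred on frame t; the
    frame with the highest boundary probability wins.
    """
    candidates = list(range(auto_frame - search, auto_frame + search + 1))
    scores = [mlp.predict_proba(windows[t].reshape(1, -1))[0, 1] for t in candidates]
    return candidates[int(np.argmax(scores))]
```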

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

  • Kim, Min Sik;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
  • Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions through the normalization of statistical characteristics of acoustic feature parameters. It demonstrates excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes the speech recognition performance by properly reducing the impact of environmental mismatch, while preserving useful information for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with the MEVN over the MN and the MVN, depending on the degree of variance normalization.
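
The compromise described above can be written directly: mean normalization (MN) subtracts the per-utterance mean, full MVN also divides by the standard deviation, and MEVN divides by the standard deviation raised to a power between 0 and 1. A minimal per-utterance sketch follows; the exponent name `alpha` is an assumption standing in for the paper's degree of variance normalization.

```python
import numpy as np

def mevn(features, alpha=0.5, eps=1e-8):
    """Mean and exponentiated variance normalization over one utterance.

    alpha = 0 reduces to mean normalization (MN); alpha = 1 reduces to
    full mean and variance normalization (MVN); values in between give
    the compromise described in the abstract.
    """
    mu = features.mean(axis=0)            # per-dimension mean
    sigma = features.std(axis=0) + eps    # per-dimension standard deviation
    return (features - mu) / (sigma ** alpha)
```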

Development of Feature Selection Method for Neural Network AE Signal Pattern Recognition and Its Application to Classification of Defects of Weld and Rotating Components (신경망 AE 신호 형상인식을 위한 특징값 선택법의 개발과 용접부 및 회전체 결함 분류에의 적용 연구)

  • Lee, Kang-Yong;Hwang, In-Bom
    • Journal of the Korean Society for Nondestructive Testing / v.21 no.1 / pp.46-53 / 2001
  • The purpose of this paper is to develop a new feature selection method for acoustic emission (AE) signal classification with a back-propagation neural network. The proposed method selects features based on the distances between class coordinates in feature space. It is compared with existing methods such as Fisher's criterion, the class mean scatter criterion, and eigenvector analysis in terms of recognition rate and convergence speed, using signals from defects in the weld zone of austenitic stainless steel and in the metal contact of a rotary compressor. The proposed 2-D and 3-D selection criteria achieved higher recognition rates than the existing methods.
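
The abstract does not spell out the criterion, so the sketch below only illustrates the flavor of ranking features by how far apart the class means lie in feature space, alongside a per-feature Fisher score for comparison; the function names and the top-k selection are assumptions.

```python
import numpy as np

def class_mean_distance_score(X, y):
    """Score each feature by the spread of its class means.

    A rough stand-in for a selection criterion based on distances between
    class coordinates in feature space: features whose class means are
    far apart are ranked higher.
    """
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])   # (C, D)
    return means.max(axis=0) - means.min(axis=0)                  # per-feature spread

def fisher_score(X, y):
    """Two-class Fisher criterion per feature, for comparison."""
    c0, c1 = np.unique(y)
    a, b = X[y == c0], X[y == c1]
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)

def select_top_k(scores, k):
    """Keep the k best-ranked features under either criterion."""
    return np.argsort(scores)[::-1][:k]
```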

Data Detection Algorithm Based on GMM in the Acoustic Data Transmission System (음향 데이터 전송 시스템의 강인한 데이터 검출 성능을 위한 Gaussian Mixture Model 기반 연구)

  • Song, Ji-Hyun;Chang, Joon-Hyuk;Kim, Moon-Kee;Kim, Dong-Keon
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.4 / pp.136-141 / 2011
  • In this paper, we propose an approach to improve data detection in an acoustic data transmission system based on the modulated complex lapped transform (MCLT). We first analyze the features and the data detection method of the existing system, and then select feature vectors from its relevant parameters to feed a Gaussian mixture model (GMM) for efficient data detection. To evaluate the proposed algorithm, the bit error rate (BER) of the received data was measured in various environments (music genres: rock, pop, classical, and jazz; loudspeaker-to-microphone distances of 1 m to 5 m in an office room), and the method yields better results than the conventional MCLT-based acoustic data transmission scheme.
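
A minimal sketch of GMM-based detection in this spirit: one mixture is trained on feature vectors from frames known to carry embedded data and another on data-free frames, and a log-likelihood-ratio test decides per frame. The feature dimension, component counts, and threshold are placeholders, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training features extracted from MCLT-domain frames that are
# known to carry embedded data (X_data) or only the host audio (X_nodata).
X_data = np.random.randn(2000, 8) + 1.0
X_nodata = np.random.randn(2000, 8)

gmm_data = GaussianMixture(n_components=4, random_state=0).fit(X_data)
gmm_nodata = GaussianMixture(n_components=4, random_state=0).fit(X_nodata)

def detect(frame_features, threshold=0.0):
    """Log-likelihood-ratio test: does each frame carry embedded data?"""
    llr = gmm_data.score_samples(frame_features) - gmm_nodata.score_samples(frame_features)
    return llr > threshold
```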

Analyzing the Acoustic Elements and Emotion Recognition from Speech Signal Based on DRNN (음향적 요소분석과 DRNN을 이용한 음성신호의 감성 인식)

  • Sim, Kwee-Bo;Park, Chang-Hyun;Joo, Young-Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.45-50 / 2003
  • Robot technology has recently advanced remarkably, and emotion recognition is necessary for building robots that interact closely with people. This paper presents a simulator, and simulation results, for recognizing and classifying emotions by learning pitch patterns. Because pitch alone is not sufficient for recognizing emotion, additional acoustic elements are included, and the relation between emotion and these acoustic elements is analyzed. The simulator consists of a feature extraction stage and a dynamic recurrent neural network (DRNN), which serves as the learning algorithm for the pitch patterns.

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

  • Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
    • The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
  • A talking head (TH) is an utterance face animation generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. Generating a TH from speech can be regarded as a regression problem from the acoustic feature sequence to the facial code sequence, a low-dimensional vector representation that can efficiently encode and decode a face image. This regression is modeled with a bidirectional RNN and trained on the SAVEE database, a frontal utterance face animation database. The proposed method generates an expressive TH from acoustic features such as MFCCs, their dynamic components, energy, and F0. In the experiments, a configuration with BLSTM layers as the first and second layers of the bidirectional RNN predicted the face codes best. For evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed and the previous method; 77% of the respondents answered that the proposed method generated a TH that matches the speech well.
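
A minimal sketch of the regression model described above, assuming PyTorch and illustrative feature and face-code dimensions (the paper's actual dimensions and training setup are not given in the abstract):

```python
import torch
import torch.nn as nn

class Speech2FaceCode(nn.Module):
    """Two-layer bidirectional LSTM mapping an acoustic feature sequence
    (e.g. MFCCs + deltas + energy + F0 per frame) to a face-code sequence.
    Feature and code dimensions here are illustrative, not the paper's."""

    def __init__(self, feat_dim=30, code_dim=32, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, code_dim)   # 2x for both directions

    def forward(self, feats):        # feats: (batch, frames, feat_dim)
        out, _ = self.blstm(feats)
        return self.proj(out)        # (batch, frames, code_dim)

model = Speech2FaceCode()
dummy = torch.randn(1, 100, 30)      # one 100-frame utterance
face_codes = model(dummy)            # regressed face-code sequence
```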

Sparse decision feedback equalization for underwater acoustic channel based on minimum symbol error rate

  • Wang, Zhenzhong;Chen, Fangjiong;Yu, Hua;Shan, Zhilong
    • International Journal of Naval Architecture and Ocean Engineering / v.13 no.1 / pp.617-627 / 2021
  • Underwater acoustic channels (UAC) are inherently sparse, but traditional adaptive equalization techniques do not exploit this property to improve performance. In this paper, we use the Variable Adaptive Subgradient Projection (V-ASPM) method to derive a new sparse equalization algorithm based on the minimum symbol error rate (MSER) criterion. Compared with the original MSER algorithm, the proposed scheme adds a sparse matrix to the iterative update, which assigns an independent step size to each equalizer tap. How to obtain a suitable sparse matrix is also analyzed: the selection scheme combines the variable step sizes with an equalizer sparsity measure. We call the new algorithm the Sparse-Control Proportional-MSER (SC-PMSER) equalizer. Finally, the SC-PMSER equalizer is embedded in a turbo receiver that performs turbo decoding, digital phase-locked loop (DPLL) tracking, time-reversal reception, and multi-reception diversity. Simulation and real-field experimental results show that the proposed algorithm achieves better convergence speed and bit error rate (BER).
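
The abstract does not give the MSER gradient, so the sketch below only illustrates the role of the per-tap step-size ("sparse") matrix using a proportionate-step update in the style of proportionate LMS: taps with larger magnitude receive larger steps. The gradient itself is left as an input, and the function names are hypothetical.

```python
import numpy as np

def proportionate_step_matrix(w, mu=0.01, delta=1e-4):
    """Diagonal step-size matrix that gives larger steps to larger taps.

    This mimics the role of the 'sparse matrix' in the abstract: each
    equalizer tap receives its own step size, so the few significant taps
    of a sparse underwater channel adapt faster than the near-zero ones.
    """
    g = (np.abs(w) + delta) / (np.abs(w).sum() + delta * len(w))
    return mu * np.diag(g)

def update_taps(w, grad):
    """One adaptation step; the actual gradient would come from the MSER
    criterion, but here it is simply an input to keep the sketch generic."""
    G = proportionate_step_matrix(w)
    return w - G @ grad
```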

Vocabulary Recognition Retrieval Optimized System using MLHF Model (MLHF 모델을 적용한 어휘 인식 탐색 최적화 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.217-223 / 2009
  • Vocabulary recognition systems on mobile terminals use statistical recognition with an N-gram statistical grammar. As the vocabulary grows beyond the terminal's limited memory and arithmetic processing capacity, the recognition algorithm becomes complicated, requires a large search space and long processing times, and can become infeasible. This study proposes a vocabulary recognition optimization system using the MLHF model, which separates the acoustic search and the lexical search following the FLaVoR architecture: the acoustic search extracts feature vectors from the speech signal and models them with HMMs, while the lexical search performs recognition using the Levenshtein distance algorithm. The resulting system shows a vocabulary-dependent recognition rate of 98.63%, a vocabulary-independent recognition rate of 97.91%, and a recognition speed of 1.61 seconds.
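
The lexical search stage can be illustrated with a plain Levenshtein distance between the acoustically decoded symbol string and each lexicon entry; the phone notation and the tiny lexicon below are hypothetical, not from the paper.

```python
def levenshtein(a, b):
    """Edit distance between two symbol sequences (e.g. phone strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]

def lexical_search(decoded_phones, lexicon):
    """Pick the lexicon entry closest to the acoustically decoded string."""
    return min(lexicon, key=lambda word: levenshtein(decoded_phones, lexicon[word]))

# Hypothetical example: lexicon maps digit words to phone sequences.
lexicon = {"영": ["y", "eo", "ng"], "일": ["i", "l"], "이": ["i"]}
print(lexical_search(["y", "eo", "n"], lexicon))   # -> "영"
```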

Classification of Seabed Physiognomy Based on Side Scan Sonar Images

  • Sun, Ning;Shim, Tae-Bo
    • The Journal of the Acoustical Society of Korea / v.26 no.3E / pp.104-110 / 2007
  • As exploration of the seabed extends ever further, automated recognition and classification of sonar images become increasingly important. However, most existing methods ignore directional information and its effect on the image textures produced. To deal with this problem, we apply 2D Gabor filters to extract features from sonar images. The filters are designed with constrained parameters to reduce complexity and improve computational efficiency, and at each orientation the optimal Gabor filter parameters are selected with the help of bandwidth parameters based on the Fisher criterion. This method overcomes some disadvantages of traditional approaches to extracting texture features and effectively improves the recognition rate.
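
A minimal sketch of orientation-selective Gabor texture features, assuming a real-valued kernel built directly in NumPy and mean response energy per orientation as the feature; the parameter values and the Fisher-criterion bandwidth selection step are not reproduced here.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, lam=8.0, sigma=4.0, gamma=0.5, size=21):
    """Real part of a 2-D Gabor kernel oriented at angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

def gabor_texture_features(image, n_orientations=4):
    """Mean filter-response energy at each orientation, as a texture vector."""
    feats = []
    for k in range(n_orientations):
        kern = gabor_kernel(theta=k * np.pi / n_orientations)
        response = convolve2d(image, kern, mode="same", boundary="symm")
        feats.append(np.mean(response ** 2))
    return np.array(feats)
```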