• Title/Summary/Keyword: Simulation speech

CONTINUOUS DIGIT RECOGNITION FOR A REAL-TIME VOICE DIALING SYSTEM USING DISCRETE HIDDEN MARKOV MODELS

  • Choi, S.H.;Hong, H.J.;Lee, S.W.;Kim, H.K.;Oh, K.C.;Kim, K.C.;Lee, H.S.
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.1027-1032 / 1994
  • This paper introduces an interword modeling approach and a Viterbi search method for continuous speech recognition. We also describe the development of a real-time voice dialing system that can recognize around one hundred words and continuous digits in speaker-independent mode. For continuous digit recognition, between-word units are proposed to provide a more precise representation of word junctures. The best path through the HMM is found by the Viterbi search algorithm, from which digit sequences are recognized. The simulation results show that interword modeling using the context-dependent between-word units provides better recognition rates than pause modeling using the context-independent pause unit. The voice dialing system is implemented on a DSP board with a telephone interface plugged into an IBM PC AT/486.
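
The Viterbi search over a discrete HMM mentioned in this abstract can be sketched as follows. This is a minimal textbook version, not the paper's decoder: the function name `viterbi`, the argument layout, and the use of log probabilities are assumptions made for illustration, and the between-word units and digit grammar are not modeled.

```python
def viterbi(obs, log_init, log_trans, log_emit):
    """Return (best_log_score, best_state_path) for a discrete HMM.

    obs       : list of observation symbol indices
    log_init  : log_init[s]     = log P(state s at the first frame)
    log_trans : log_trans[r][s] = log P(state s at t+1 | state r at t)
    log_emit  : log_emit[s][o]  = log P(symbol o | state s)
    All names and the data layout are hypothetical.
    """
    n_states = len(log_init)
    # delta[s] = best log score of any path that ends in state s at the current frame
    delta = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    backptr = []  # backptr[t][s] = best predecessor of state s at frame t+1
    for o in obs[1:]:
        prev = delta
        delta, ptr = [], []
        for s in range(n_states):
            best_r = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            delta.append(prev[best_r] + log_trans[best_r][s] + log_emit[s][o])
            ptr.append(best_r)
        backptr.append(ptr)
    # Trace back from the best final state to recover the state sequence
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path
```

In a digit recognizer, the recovered state path would then be mapped back to the word models it passes through to read off the digit sequence.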

Creation of a Voice Recognition-Based English Aided Learning Platform

  • Hui Xu
    • Journal of Information Processing Systems / v.20 no.4 / pp.491-500 / 2024
  • To address the poor quality of information input in online spoken-English teaching, the study creates an English teaching assistance model based on the dynamic time warping (DTW) recognition algorithm and automated voice recognition technology. To improve the algorithm's efficiency, the study modifies the speech signal's time-domain properties during the pre-processing stage and improves the algorithm's performance in terms of computational effort and storage space. Finally, a simulation experiment is used to evaluate the model's efficacy. The findings show that the revised DTW model achieves recognition rates above 95% for all phonetic symbols and performs best on cloudy consonant recognition, with rates of 98.5%, 98.8%, and 98.7% in the three tests, respectively. The enhanced DTW voice recognition model is also more efficient and requires less time for training and testing. In the KS value analysis, the DTW model's KS value of 0.63 is the highest among the models analyzed. Among the comparative models, the model also has the lowest curve position for both test functions. This shows that the upgraded DTW model has superior voice recognition capabilities, which could significantly improve online English education and lead to better teaching outcomes.
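
For readers unfamiliar with DTW, the baseline algorithm this abstract builds on can be sketched as below. This is the classic dynamic-programming form, not the revised model the study proposes; the function name `dtw_distance`, the Euclidean frame distance, and the representation of utterances as lists of feature tuples are assumptions for illustration.

```python
import math

def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature vectors.

    a, b : lists of equal-dimension tuples (e.g. one feature vector per frame).
    Runs in O(len(a) * len(b)) time and memory.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimum accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame-to-frame distance
            # Extend the cheapest of: repeat b's frame, repeat a's frame, advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A template-matching recognizer would compute this distance between an input utterance and each stored reference template and pick the template with the smallest value.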

A CELP Coder using the Band-Divided Long Term Prediction (대역 분할 장구간 예측을 이용한 CELP 부호화기)

  • Choi, Young-Soo;Kang, Hong-Goo;Lim, Myoung-Seob;Ahn, Dong-Soon;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.38-45 / 1995
  • In this paper, a way to improve the performance of long-term prediction is proposed, which adopts the Multi-band Excitation (MBE) method in addition to the Code-Excited Linear Prediction (CELP) method at low bit rates below 4.8 kbps. In the proposed method, multiband long-term prediction is performed on the periodic components that still remain after the long-term prediction of the conventional CELP method. At this point, the whole frequency region is divided into subbands whose size is equal to the spacing between the harmonics of the fundamental frequency, and the periodic multiband excitation signals are represented as the sum of sine waves approximately as large as the spectrum of the excitation signals, so that the actual characteristics of the excitation signals can be better taken into account. To evaluate the performance of the proposed method, computer simulation is performed at 4.8 kbps. The 4.8 kbps DoD CELP and the 4.4 kbps IMBE were chosen as the reference vocoders for the speech quality measure. The perceptual speech quality measure showed that the performance of the proposed method is better than that of the 4.8 kbps DoD CELP vocoder and similar to that of the 4.4 kbps IMBE vocoder.

An Implementation of Acoustic Echo Canceller Using Adaptive Filtering in Modulated Lapped Transform Domain (Modulated Lapped Transform 영역에서 적응 필터링을 이용한 음향 반향 제거기의 구현)

  • 백수진;박규식
    • The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.425-433 / 2003
  • An Acoustic Echo Canceller (AEC) is a signal processing system for removing unwanted echo signals in teleconferencing and hands-free communication. The least mean square (LMS) algorithm is one of the adaptive echo cancellation algorithms, and it has been the most attractive because of its simplicity and robustness. However, the convergence properties of the LMS algorithm degrade with highly correlated input signals such as speech. For this reason, transform-domain adaptive filtering was introduced: the colored input samples are first decorrelated with an orthogonal transform matrix such as the DCT or DFT, and then the LMS adaptive filtering process is applied. In this paper, we propose an MLT-domain adaptive echo canceller based on the MLT (Modulated Lapped Transform) orthogonal transform matrix. The proposed algorithm achieves high decorrelation efficiency and fast convergence via a modulated lapped transform of size 2N×N instead of an N×N unitary transform such as the DCT, DFT, or Hadamard transform, and it is applied to an acoustic echo cancellation system. In computer simulations with both synthetic and real speech, the proposed MLT-domain adaptive echo canceller shows approximately twice the convergence speed and a 20 to 30 dB ERLE improvement over the DCT frequency-domain acoustic echo cancellation system.
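
As background for this abstract, the plain time-domain LMS echo canceller that transform-domain methods improve on can be sketched as follows. This is not the proposed MLT-domain algorithm: the function names, the tap count `n_taps`, and the step size `mu` are hypothetical, and the ERLE helper uses the usual definition as the ratio of microphone power to residual power in dB.

```python
import math

def lms_echo_canceller(far_end, mic, n_taps, mu):
    """Time-domain LMS. Returns (error_signal, final_weights)."""
    w = [0.0] * n_taps        # adaptive estimate of the echo path
    x_buf = [0.0] * n_taps    # most recent far-end samples, newest first
    err = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))  # estimated echo
        e = d - y                                     # echo-cancelled output
        # LMS update: step along the instantaneous gradient estimate
        w = [wi + mu * e * xi for wi, xi in zip(w, x_buf)]
        err.append(e)
    return err, w

def erle_db(mic, err):
    """Echo return loss enhancement: 10*log10(mic power / residual power)."""
    p_mic = sum(d * d for d in mic)
    p_err = sum(e * e for e in err)
    return 10.0 * math.log10(p_mic / p_err)
```

With a white far-end signal this filter converges quickly; with speech, the correlated input slows convergence, which is the motivation the abstract gives for decorrelating the input with a transform before the LMS update.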

Underwater Target Information Estimation using Proximity Sensor (근접센서를 이용한 수중 표적 정보 추정기법)

  • Kim, JungHoon;Yoon, KyungSik;Seo, IkSu;Lee, KyunKyung
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.5 / pp.174-180 / 2015
  • In this paper, we propose a passive sonar signal processing technique for estimating target information using a proximity sensor. The algorithm is performed by a single sensor that is part of an underwater sensor network and has a hierarchical structure. The estimated parameters are the velocity, depth, distance, and bearing at the closest point of approach (CPA), and the hierarchical structure improves the accuracy of the signal processing. We verify the performance of the proposed method by computer simulation and find that an error of 20% can occur at the maximum detectable range. We also confirm through a sea experiment that the proposed method is reliable in an actual sea environment.

Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization (확률적 비음수 행렬 인수분해를 사용한 통계적 음성검출기법)

  • Kim, Dong Kook;Shin, Jong Won;Kwon, Kisoo;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.8 / pp.851-858 / 2016
  • This paper presents a new statistical voice activity detection (VAD) based on the probabilistic interpretation of nonnegative matrix factorization (NMF). The objective function of the NMF using Kullback-Leibler divergence coincides with the negative log likelihood function of the data if the distribution of the data given the basis and encoding matrices is modeled as Poisson distributions. Based on this probabilistic NMF, the VAD is constructed using the likelihood ratio test assuming that speech and noise follow Poisson distributions. Experimental results show that the proposed approach outperformed the conventional Gaussian model-based and NMF-based methods at 0-15 dB signal-to-noise ratio simulation conditions.
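
The equivalence stated in this abstract can be written out in a few lines. The notation below is an assumption for illustration (data entries $v_{ij}$, basis matrix $W$, encoding matrix $H$, and model $\lambda_{ij} = (WH)_{ij}$); the paper's own symbols may differ.

```latex
% Generalized Kullback-Leibler divergence used as the NMF objective
D_{\mathrm{KL}}(V \,\|\, WH)
  = \sum_{i,j} \left( v_{ij} \log \frac{v_{ij}}{\lambda_{ij}} - v_{ij} + \lambda_{ij} \right),
  \qquad \lambda_{ij} = (WH)_{ij}

% Negative log-likelihood when each v_{ij} is Poisson with mean \lambda_{ij}
-\log p(V \mid W, H)
  = \sum_{i,j} \left( \lambda_{ij} - v_{ij} \log \lambda_{ij} + \log v_{ij}! \right)

% Their difference does not depend on W or H
-\log p(V \mid W, H) - D_{\mathrm{KL}}(V \,\|\, WH)
  = \sum_{i,j} \left( \log v_{ij}! - v_{ij} \log v_{ij} + v_{ij} \right)
```

Because the last line is constant in $W$ and $H$, minimizing the KL objective and maximizing the Poisson likelihood yield the same factorization, which is what allows a likelihood ratio test to be built on top of the NMF model.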

Drone Location Tracking with Circular Microphone Array by HMM (HMM에 의한 원형 마이크로폰 어레이 적용 드론 위치 추적)

  • Jeong, HyoungChan;Lim, WonHo;Guo, Junfeng;Ahmad, Isitiaq;Chang, KyungHi
    • Journal of Advanced Navigation Technology / v.24 no.5 / pp.393-407 / 2020
  • To reduce the threat posed by illegal unmanned aerial vehicles, a sound-based tracking system was implemented. The drone acoustic tracking method has three main steps. First, it scans the space through variable beamforming to find a sound source and records the sound using a microphone array. Second, it classifies the sound with a hidden Markov model (HMM) to determine whether the sound source exists. Finally, if the sound source is a drone, a sound source recorded and stored as a tracking reference signal based on an adaptive beam pattern is used. The simulation was performed under both an ideal condition without background noise and interference sound and a non-ideal condition with background noise and interference sound, and the tracking performance for illegal drones was evaluated. For the drone tracking system, the criteria for determining the presence or absence of a drone were designed according to the improvement in search distance performance given by the microphone array performance and the degree of sound pattern matching, and were reflected in the design of the speech reading circuit.

Source Localization Based on Independent Doublet Array (독립적인 센서쌍 배열에 기반한 음원 위치추정 기법)

  • Choi, Young Doo;Lee, Ho Jin;Yoon, Kyung Sik;Lee, Kyun Kyung
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.10 / pp.164-170 / 2014
  • A bearing and ranging method for a single near-field sound source based on an independent doublet array is proposed. Bearing estimation methods commonly use a uniform linear array or a uniform circular array. Because of the array structure, the aperture that can be retained is constrained, which limits estimation of the distance to the sound source. Recently, a method that estimates a sound source's bearing and distance using an independent doublet array with a wide aperture was proposed. It is limited to the case where the doublets are located on a straight line. In this paper, we generalize that case and estimate the location of a sound source for various array structures. The performance of the proposed algorithm was verified through simulation.

Developing an Embedded Method to Recognize Human Pilot Intentions In an Intelligent Cockpit Aids for the Pilot Decision Support System

  • Cha, U-Chang
    • Journal of the Ergonomics Society of Korea / v.17 no.3 / pp.23-39 / 1998
  • Several recent aircraft accidents occurred due to goal conflicts between human and machine actors. To facilitate the management of cockpit activities in light of these observations, a computational aid, the Agenda Manager (AM), has been developed for use in simulated cockpit environments. To improve AM performance, it is important to know accurately the pilot's intentions when performing cockpit operations. Without accurate knowledge of pilot goals or intentions, the information from the AM may misdirect the pilot who is using it. To provide a reliable flight simulation environment regarding goal conflicts, a pilot goal communication method (GCM) was developed to facilitate accurate recognition of pilot goals. Embedded within the AM, the GCM was used to recognize pilot goals and to declare them to the AM. Two approaches to the recognition of pilot goals were considered: (1) the use of an Automatic Speech Recognition (ASR) system to recognize overtly or explicitly declared pilot goals, and (2) inference of covertly or implicitly declared pilot goals via an intent inferencing mechanism. The integrated mode of these two methods could overcome misunderstanding of covert goals by use of the overt GCM. It could also overcome the workload concern of the overt mode by use of the covert GCM. Through simulated flight environment experimentation with real pilot subjects, the proposed GCM demonstrated its capability to recognize pilot intentions with a certain degree of accuracy and to handle incorrectly declared goals, and it was validated in terms of subjective workload and pilot flight control performance. The GCM for communicating pilot goals was implemented within the AM to provide a rich environment for the study of human-machine interactions in the supervisory control of complex dynamic systems.

Development of a Lipsync Algorithm Based on Audio-visual Corpus (시청각 코퍼스 기반의 립싱크 알고리듬 개발)

  • 김진영;하영민;이화숙
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.63-69 / 2001
  • A corpus-based lip sync algorithm for synthesizing natural face animation is proposed in this paper. To obtain the lip parameters, marks were attached to the speaker's face, and the marks' positions were extracted with image processing methods. The spoken utterances were also labeled with HTK, and prosodic information (duration, pitch, and intensity) was analyzed. An audio-visual corpus was constructed by combining the speech and image information. The basic unit used in our approach is the syllable. Based on this audio-visual corpus, lip information represented by the marks' positions was synthesized. That is, the best syllable units are selected from the audio-visual corpus, and the visual information of the selected syllable units is concatenated. There are two processes to obtain the best units. One is to select the N-best candidates for each syllable. The other is to select the smoothest unit sequence, which is done by the Viterbi decoding algorithm. For these processes, two distance measures between syllable units are proposed: a phonetic environment distance measure and a prosody distance measure. Computer simulation results showed that the proposed algorithm performed well. In particular, pitch and intensity information was shown to be as important as duration information in lip sync.
