• Title/Summary/Keyword: Speech recognition model

Context Recognition Using Environmental Sound for Client Monitoring System (피보호자 모니터링 시스템을 위한 환경음 기반 상황 인식)

  • Ji, Seung-Eun;Jo, Jun-Yeong;Lee, Chung-Keun;Oh, Siwon;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.2 / pp.343-350 / 2015
  • This paper presents a context recognition method using environmental sound signals, which is applied to a mobile-based client monitoring system. Seven acoustic contexts are defined and the corresponding environmental sound signals are obtained for the experiments. To evaluate context recognition performance, MFCC and LPCC are employed for feature extraction, and statistical pattern recognition is performed with GMM and HMM as acoustic models. The experimental results show that LPCC and HMM are more effective at improving context recognition accuracy than MFCC and GMM, respectively. The recognition system using LPCC and HMM achieves a recognition accuracy of 96.03%. These results demonstrate that LPCC is effective at representing environmental sounds, which contain a wider variety of frequency components than human speech. They also show that HMM is more effective than GMM at modeling time-varying environmental sounds.

Improvement of Confidence Measure Performance using Background Model Set Algorithm (BMS 알고리즘을 이용한 거절기능 성능 향상)

  • Kim ByoungDon;Lee KyongRok;Kim JinYoung;Choi SeungHo
    • Proceedings of the KSPS conference / 2003.05a / pp.79-82 / 2003
  • In this paper, we propose a Background Model Set (BMS) algorithm for speaker verification that addresses shortcomings in how the conventional confidence measure (CM) is calculated. The CM expresses the relative likelihood between recognized models and unrecognized models; the unrecognized models are known as antiphone models. Conventionally, the antiphone model is composed by calculating the probability and standard deviation over all phonemes. The antiphone CM obtained in this way gave poor results, and recognition time also increased. To solve this problem, we studied a method that uses the BMS algorithm to recompute the mean and standard deviation from only the antiphonemes that are close to the phoneme for which the CM is calculated.

Token-Based Classification and Dataset Construction for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 기반의 분류 및 데이터셋)

  • Sungmin Ko;Youhyun Shin
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.181-188 / 2024
  • Traditional profanity detection methods have limitations in identifying intentionally altered profanities. This paper introduces a new method based on Named Entity Recognition, a subfield of Natural Language Processing. We developed a profanity detection technique using sequence labeling, for which we constructed a dataset by labeling some profanities in Korean malicious comments and conducted experiments. Additionally, to enhance the model's performance, we augmented the dataset by labeling parts of a Korean hate speech dataset with ChatGPT, a large language model, and trained on the result. During this process, we confirmed that human filtering of the dataset created by the large language model was by itself enough to improve performance. This suggests that human oversight is still necessary in the dataset augmentation process.

A Study on the Algorithm Development for Speech Recognition of Korean and Japanese (한국어와 일본어의 음성 인식을 위한 알고리즘 개발에 관한 연구)

  • Lee, Sung-Hwa;Kim, Hyung-Lae
    • Journal of IKEEE / v.2 no.1 s.2 / pp.61-67 / 1998
  • In this study, speaker recognition experiments are performed with a multilayer feedforward neural network (MFNN) model using Korean and Japanese digits. Five adult males and five adult females pronounced the digits 0 to 9 in Korean and Japanese seven times each. Feature coefficients were then extracted through a pitch deletion algorithm, LPC analysis, and LPC cepstral analysis to generate the input patterns of the MFNN. Five of the seven repetitions were used to train the neural network, and the remaining two were used to measure its performance. For both Korean and Japanese, the pitch coefficients performed about 4% better than the LPC or LPC cepstral coefficients.

Ship's Maneuvering and Winch Control System with Voice Instruction Based Learning (음성지시에 의한 선박 조종 및 윈치 제어 시스템)

  • Seo, Ki-Yeol;Park, Gyei-Kark
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.6 / pp.517-523 / 2002
  • In this paper, we propose a system that applies the VIBL (Voice Instruction Based Learning) method to a ship's steering system, MERCS, and winch appliances. VIBL adds speech recognition to the LIBL method, which is modeled on the way humans learn through natural language. The VIBL method replaces the process in which a linguistic instruction, such as an officer's steering order, is carried out by an able seaman, and controls the steering gear, MERCS, and winch appliances directly. Specifically, we build a steering operation model of the able seaman, implement an intelligent steering gear control system, and apply the language instruction based learning method to the ship's steering system by presenting suitable meaning elements and evaluation rules, so that the system responds more efficiently to the commander's voice instructions using fuzzy inference rules. We also implement a system that recognizes the commander's voice instructions and controls the MERCS and winch appliances. The steering operation model is based on the able seaman's experience. For the intelligent steering system, we present rudder angle, compass bearing arrival time, and steady state as meaning elements together with their evaluation rules, and the rules of the steering operation model are corrected by recognizing the commander's voice instruction, converting it to text, and applying fuzzy inference. Finally, we apply the VIBL method to a speech-recognition ship control simulator and confirm its effectiveness.

A Study On Intelligent Robot Control Based On Voice Recognition For Smart FA (스마트 FA를 위한 음성인식 지능로봇제어에 관한 연구)

  • Sim, H.S.;Kim, M.S.;Choi, M.H.;Bae, H.Y.;Kim, H.J.;Kim, D.B.;Han, S.H.
    • Journal of the Korean Society of Industry Convergence / v.21 no.2 / pp.87-93 / 2018
  • This study proposes a new approach to implementing intelligent robot control based on voice recognition for smart factory automation. Since humans usually communicate with each other by voice, it is very convenient to use voice to command humanoid robots or other types of robot systems. Much research has been performed on voice recognition systems for this purpose. The hidden Markov model is a robust statistical methodology for efficient voice recognition in noisy environments, and it has been tested in a wide range of applications. Prediction by Partial Matching, a prediction approach traditionally applied to text compression and coding, is a finite-context statistical modeling technique that predicts the next characters based on the context; it has shown great potential for developing novel solutions to several language modeling problems in speech recognition. The reliability of voice recognition was demonstrated by experiments on a humanoid robot with 26 joints, with a view to application in the manufacturing process.

Voice Activity Detection Method Using Psycho-Acoustic Model Based on Speech Energy Maximization in Noisy Environments (잡음 환경에서 심리음향모델 기반 음성 에너지 최대화를 이용한 음성 검출 방법)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.447-453 / 2009
  • This paper introduces a method for detecting voice and its exact end point at low SNR by maximizing voice energy. Conventional VAD (Voice Activity Detection) algorithms rely on estimating the noise level, so they tend to detect the end point inaccurately. Moreover, because they use a relatively long analysis range to reflect temporal changes in the noise, their computing load is too high for practical application. In this paper, the SEM-VAD (Speech Energy Maximization-Voice Activity Detection) method, which uses psycho-acoustical Bark scale filter banks to maximize voice energy within frames, is introduced. Stable threshold values are obtained in various noise environments (SNR 15 dB, 10 dB, 5 dB, 0 dB). In a voice detection test in a noisy car environment, PHR (Pause Hit Rate) was 100% in every noise environment, and FAR (False Alarm Rate) was 0% at SNR 15 dB and 10 dB, 5.6% at SNR 5 dB, and 9.5% at SNR 0 dB.

Continuous Speech Recognition based on Parametric Trajectory Segmental HMM (모수적 궤적 기반의 분절 HMM을 이용한 연속 음성 인식)

  • 윤영선;오영환
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.35-44 / 2000
  • In this paper, we propose a new trajectory model for characterizing segmental features and their interaction based upon a general framework of hidden Markov models. Each segment, a sequence of vectors, is represented by a trajectory of observed sequences. This trajectory is obtained by applying a new design matrix which includes transitional information on contiguous frames, and is characterized as a polynomial regression function. To apply the trajectory to the segmental HMM, the frame features are replaced with the trajectory of a given segment. We also propose the likelihood of a given segment and the estimation of trajectory parameters. The observation probability of a given segment is represented as the relation between the segment likelihood and the estimation error of the trajectories. The estimation error of a trajectory is considered as the weight of the likelihood of a given segment in a state. This weight represents the probability of how well the corresponding trajectory characterizes the segment. The proposed model can be regarded as a generalization of a conventional HMM and a parametric trajectory model. The experimental results are reported on the TIMIT corpus, and performance is shown to improve significantly over that of the conventional HMM.
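
To make the idea of a polynomial regression trajectory concrete, the sketch below fits an ordinary least-squares polynomial to each feature dimension of a segment and reports the mean squared estimation error. It is only an illustration under stated assumptions: it uses a plain design matrix with rows [1, t, t^2, ...] over normalized time, not the paper's new design matrix with transitional information, and the names `design_matrix`, `solve`, and `fit_trajectory` are hypothetical.

```python
def design_matrix(n_frames, order):
    """Rows [1, t, t^2, ...] with time t normalized to [0, 1] over the segment."""
    if n_frames == 1:
        times = [0.0]
    else:
        times = [i / (n_frames - 1) for i in range(n_frames)]
    return [[t ** k for k in range(order + 1)] for t in times]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_trajectory(segment, order=2):
    """Fit one polynomial per feature dimension of a segment.

    segment: list of frames, each a list of feature values; it needs at
    least order + 1 frames so that the normal equations are solvable.
    Returns (coeffs, mse), where coeffs[d] are the polynomial coefficients
    for dimension d and mse is the mean squared estimation error.
    """
    z = design_matrix(len(segment), order)
    p = order + 1
    # Normal equations: (Z^T Z) b = Z^T y, solved once per dimension.
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(p)] for i in range(p)]
    coeffs, sq_err = [], 0.0
    for d in range(len(segment[0])):
        zty = [sum(row[i] * frame[d] for row, frame in zip(z, segment)) for i in range(p)]
        b = solve(ztz, zty)
        coeffs.append(b)
        for row, frame in zip(z, segment):
            pred = sum(bk * zk for bk, zk in zip(b, row))
            sq_err += (frame[d] - pred) ** 2
    return coeffs, sq_err / (len(segment) * len(segment[0]))
```

In a segmental model of the kind the abstract describes, the fitted curve would stand in for the raw frames of the segment, and the estimation error would feed into how strongly the segment likelihood is weighted; how exactly that weighting is defined is specific to the paper and is not reproduced here.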

A Parallel Speech Recognition System based on Hidden Markov Model (은닉 마코프 모델 기반 병렬음성인식 시스템)

  • Jeong, Sang-Hwa;Park, Min-Uk
    • Journal of KIISE:Computer Systems and Theory / v.27 no.12 / pp.951-959 / 2000
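  • The parallel speech recognition model in this paper consists of a parallel phoneme recognition module based on continuous hidden Markov models (HMMs) and a parallel sentence recognition module based on a hierarchically structured knowledge base. The parallel phoneme recognition module distributes thousands of HMMs across parallel processors and then performs the output probability computation and the Viterbi algorithm for the HMMs assigned to each processor. The knowledge-base-driven parallel sentence recognition module takes the phoneme sequence supplied by the phoneme module and [...]. The proposed parallel speech recognition algorithm was implemented on multi-transputers with a distributed-memory MIMD architecture and on a Parsytec CC. The experiments showed an improvement in execution time from the parallel phoneme recognition module and an improvement in recognition rate from the parallel sentence recognition module, and confirmed the feasibility of a real-time implementation of the parallel speech recognition system.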
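
The Viterbi algorithm named in the abstract above finds the most likely state sequence of an HMM for a given observation sequence. The following is a minimal, sequential sketch in log space. It assumes a discrete-observation HMM for brevity, whereas the paper uses continuous HMMs and distributes the work across processors; the toy model and all names (`viterbi`, `log_start`, `log_trans`, `log_emit`) are hypothetical.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for an observation sequence."""
    # best[s]: log-probability of the best path so far that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []
    for o in obs[1:]:
        new_best, ptr = {}, {}
        for s in states:
            # Best predecessor of s at this time step
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            new_best[s] = best[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path


def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Toy two-state model with symbols "x" and "y" (illustrative values only)
states = ["A", "B"]
log_start = logs({"A": 0.6, "B": 0.4})
log_trans = logs({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = logs({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}})

best_log_prob, best_path = viterbi(["x", "x", "y", "y"], states,
                                   log_start, log_trans, log_emit)
print(best_path)  # ['A', 'A', 'B', 'B']
```

Because each HMM is decoded independently in this formulation, a set of such models could in principle be split across workers, which is the kind of distribution the abstract describes; the partitioning scheme itself is not shown here.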

Language Modeling Approaches to Information Retrieval

  • Banerjee, Protima;Han, Hyo-Il
    • Journal of Computing Science and Engineering / v.3 no.3 / pp.143-164 / 2009
  • This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model. In this article, we discuss current research in the application of language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, use of language modeling for XML retrieval and future trends.
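
One common instance of the generative view described in this abstract is query-likelihood retrieval: each document is treated as a unigram language model, and documents are ranked by the probability that their model generates the query. The sketch below is an illustration only, assuming Jelinek-Mercer smoothing with a collection model; the survey may emphasize other variants, and the function names, the `lam` parameter, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len, lam=0.5):
    """log P(query | doc) under a smoothed unigram document model."""
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_coll = coll_counts[term] / coll_len
        # Jelinek-Mercer: interpolate document and collection estimates
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document names ordered from most to least likely."""
    coll_counts = Counter(t for d in docs.values() for t in d)
    coll_len = sum(coll_counts.values())
    scored = [(query_log_likelihood(query, d, coll_counts, coll_len, lam), name)
              for name, d in docs.items()]
    return [name for _, name in sorted(scored, key=lambda x: -x[0])]


docs = {
    "d1": "speech recognition uses hidden markov models".split(),
    "d2": "language models for information retrieval".split(),
    "d3": "cooking pasta at home".split(),
}
print(rank(["language", "models"], docs))  # ['d2', 'd1', 'd3']
```

Smoothing with the collection model keeps a document from scoring zero just because it lacks one query term, which is why d1 still ranks above d3 in this toy example.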