• Title/Summary/Keyword: speaker dependent system

Search Result 76, Processing Time 0.026 seconds

Performance Improvement of Speaker Recognition System Using Genetic Algorithm (유전자 알고리즘을 이용한 화자인식 시스템 성능 향상)

  • 문인섭;김종교
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.8
    • /
    • pp.63-67
    • /
    • 2000
  • This paper deals with text-prompt speaker recognition based on dynamic time warping (DTW). The Genetic Algorithm was applied to the creation of reference patterns for suitable reflection of the speaker characteristics, one of the most important determinants in the fields of speaker recognition. In order to overcome the weakness of text-dependent and text-independent speaker recognition, the text-prompt type was suggested. Performed speaker identification and verification in close and open set respectively, hence the Genetic algorithm-based reference patterns had been proven to have better performance in both recognition rate and speed than that of conventional reference patterns.

  • PDF

A Study on Phoneme Recognition using Neural Networks and Fuzzy logic (신경망과 퍼지논리를 이용한 음소인식에 관한 연구)

  • Han, Jung-Hyun;Choi, Doo-Il
    • Proceedings of the KIEE Conference
    • /
    • 1998.07g
    • /
    • pp.2265-2267
    • /
    • 1998
  • This paper deals with study of Fast Speaker Adaptation Type Speech Recognition, and to analyze speech signal efficiently in time domain and time-frequency domain, utilizes SCONN[1] with Speech Signal Process suffices for Fast Speaker Adaptation Type Speech Recognition, and examined Speech Recognition to investigate adaptation of system, which has speech data input after speaker dependent recognition test.

  • PDF

A Study on Modified Clustering Algorithm for Text-Dependent Speaker Verification System (문장종속 화자확인 시스템을 위한 개선된 군집화 알고리즘에 관한 연구)

  • 강철호;정희석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.7
    • /
    • pp.548-553
    • /
    • 2004
  • In this paper we propose modified LBG algorithm to minimize quantization errors. When we apply conventional LBG algorithm for speaker verification system, problems that result from small amount of training data can be generated. That is, quantization error comes from fixed-sized codebook without any consideration for speaker characteristics and splitting vector in the wrong direction worsen performance of speaker verification system. So, we propose modified clustering method that has variable sized codebook according to speaker characteristics and makes right splitting direction by finding the farthest member away from mean and then find another member from the member. Simulation results show effectiveness of the proposed algorithm.

Text-dependent Speaker Recognition System Using DTW & VQ (VQ와 DTW를 이용한 문장 의존형 화자인식 시스템)

  • Jung JongSoon;Oh SeYoung;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.97-103
    • /
    • 2001
  • The speaker recognition method using DTW algorithm has the problem that is reducing the performance of the speaker recognition system as the time variation. So there are many proposed algorithms to solve these problems. This paper proposes the new method If make the reference pattern that is acceptable to intra-speaker variation by reference pattern normalization. And to avoid reducing performance of speaker recognition system, we use the modified reference pattern to recognize the system user. The used methods in this paper are VQ and DTW. As the result of simulation we can obtain the $97.5\%$ of recognition accuracy rate.

  • PDF

Speaker Adaptation for Voice Dialing (음성 다이얼링을 위한 화자적응)

  • ;Chin-Hui Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.5
    • /
    • pp.455-461
    • /
    • 2002
  • This paper presents a method that improves the performance of the personal voice dialling system in which speaker independent phoneme HMM's are used. Since the speaker independent phoneme HMM based voice dialing system uses only the phone transcription of the input sentence, the storage space could be reduced greatly. However, the performance of the system is worse than that of the system which uses the speaker dependent models due to the phone recognition errors generated when the speaker independent models are used. In order to solve this problem, a new method that jointly estimates transformation vectors for the speaker adaptation and transcriptions from training utterances is presented. The biases and transcriptions are estimated iteratively from the training data of each user with maximum likelihood approach to the stochastic matching using speaker-independent phone models. Experimental result shows that the proposed method is superior to the conventional method which used transcriptions only.

A Study on the Voice Conversion with HMM-based Korean Speech Synthesis (HMM 기반의 한국어 음성합성에서 음색변환에 관한 연구)

  • Kim, Il-Hwan;Bae, Keun-Sung
    • MALSORI
    • /
    • v.68
    • /
    • pp.65-74
    • /
    • 2008
  • A statistical parametric speech synthesis system based on the hidden Markov models (HMMs) has grown in popularity over the last few years, because it needs less memory and low computation complexity and is suitable for the embedded system in comparison with a corpus-based unit concatenation text-to-speech (TTS) system. It also has the advantage that voice characteristics of the synthetic speech can be modified easily by transforming HMM parameters appropriately. In this paper, we present experimental results of voice characteristics conversion using the HMM-based Korean speech synthesis system. The results have shown that conversion of voice characteristics could be achieved using a few sentences uttered by a target speaker. Synthetic speech generated from adapted models with only ten sentences was very close to that from the speaker dependent models trained using 646 sentences.

  • PDF

Review And Challenges In Speech Recognition (ICCAS 2005)

  • Ahmed, M.Masroor;Ahmed, Abdul Manan Bin
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.1705-1709
    • /
    • 2005
  • This paper covers review and challenges in the area of speech recognition by taking into account different classes of recognition mode. The recognition mode can be either speaker independent or speaker dependant. Size of the vocabulary and the input mode are two crucial factors for a speech recognizer. The input mode refers to continuous or isolated speech recognition system and the vocabulary size can be small less than hundred words or large less than few thousands words. This varies according to system design and objectives.[2]. The organization of the paper is: first it covers various fundamental methods of speech recognition, then it takes into account various deficiencies in the existing systems and finally it discloses the various probable application areas.

  • PDF

A New Type Speaker Utilizing a Magneto-rheological Fluid Diaphragm (자기유변유체 다이어프램을 이용한 새로운 타입의 스피커)

  • Park, Jhin Ha;Yoon, Ji Young;Kim, Seon Hye;Lee, Tae Hoon;Lee, Soo Hyuk;Choi, Seung Bok
    • Transactions of the Korean Society for Noise and Vibration Engineering
    • /
    • v.27 no.2
    • /
    • pp.182-188
    • /
    • 2017
  • In this work, a new type speaker which features various resonant frequencies is proposed utilizing a magneto-rheological (MR) fluid and its performance is evaluated in terms of the change of the field-dependent sound pressure level. In order to achieve this goal, a whole concept of the speaker system is firstly discussed and subsequently a controllable diaphragm is made using MR fluid whose rheological properties such as viscosity are controllable by the magnitude of magnetic field. Then, the proposed speaker system consisting of the inner structure and the squeeze mode type of MR diaphragm is established in an anechoic room The effectiveness of the proposed speaker system is experimentally evaluated at two different conditions; with and without the magnetic field. It is shown from experimental tests that the sound pressure level at different sound source can be controlled which is not able to achieve using one conventional speaker system.

A study on the Method of the Keyword Spotting Recognition in the Continuous speech using Neural Network (신경 회로망을 이용한 연속 음성에서의 keyword spotting 인식 방식에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.43-49
    • /
    • 1996
  • This research proposes a system for speaker independent Korean continuous speech recognition with 247 DDD area names using keyword spotting technique. The applied recognition algorithm is the Dynamic Programming Neural Network(DPNN) based on the integration of DP and multi-layer perceptron as model that solves time axis distortion and spectral pattern variation in the speech. To improve performance, we classify word model into keyword model and non-keyword model. We make an experiment on postprocessing procedure for the evaluation of system performance. Experiment results are as follows. The recognition rate of the isolated word is 93.45% in speaker dependent case. The recognition rate of the isolated word is 84.05% in speaker independent case. The recognition rate of simple dialogic sentence in keyword spotting experiment is 77.34% as speaker dependent, and 70.63% as speaker independent.

  • PDF

A Blind Segmentation Algorithm for Speaker Verification System (화자확인 시스템을 위한 분절 알고리즘)

  • 김지운;김유진;민홍기;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.3
    • /
    • pp.45-50
    • /
    • 2000
  • This paper proposes a delta energy method based on Parameter Filtering(PF), which is a speech segmentation algorithm for text dependent speaker verification system over telephone line. Our parametric filter bank adopts a variable bandwidth along with a fixed center frequency. Comparing with other methods, the proposed method turns out very robust to channel noise and background noise. Using this method, we segment an utterance into consecutive subword units, and make models using each subword nit. In terms of EER, the speaker verification system based on whole word model represents 6.1%, whereas the speaker verification system based on subword model represents 4.0%, improving about 2% in EER.

  • PDF