• Title/Summary/Keyword: speaker recognition

Isolated Word Recognition Using a Speaker-Adaptive Neural Network (화자적응 신경망을 이용한 고립단어 인식)

  • 이기희;임인칠
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.5 / pp.765-776 / 1995
  • This paper describes a speaker adaptation method to improve the recognition performance of an MLP (multilayer perceptron)-based HMM (hidden Markov model) speech recognizer. In this method, a first-order linear transformation network is used to fit a new speaker's data to the MLP. The transformation parameters are adjusted by back-propagating the classification error into the transformation network while the MLP classifier is kept fixed. The recognition system is based on semicontinuous HMMs that use the MLP as a fuzzy vector quantizer. The experimental results show that this method achieves rapid speaker adaptation with high recognition performance. For supervised adaptation, the error rate is significantly reduced from 9.2% for the baseline system to 5.6% after speaker adaptation. For unsupervised adaptation, the error rate is reduced to 5.1% without any labeled data from the new speakers.
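The adaptation scheme above (a first-order transform placed in front of a frozen MLP and trained by back-propagating the classification error) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny 2-3-2 MLP, its weights, the toy data, and the simulated speaker distortion are all made up, and the HMM back end is omitted. Only the affine transform `A`, `b` is updated; the MLP weights never change.

```python
import math
import random

# Hypothetical frozen MLP: 2 inputs -> 3 tanh hidden units -> 2 softmax outputs.
W1 = [[1.5, -0.5], [-1.0, 2.0], [0.5, 1.0]]
B1 = [0.0, 0.1, -0.1]
W2 = [[2.0, -1.5, 1.0], [-2.0, 1.5, -1.0]]
B2 = [0.0, 0.0]


def mlp_forward(x):
    """Hidden activations and class posteriors of the frozen MLP."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, B2)]
    top = max(z)
    e = [math.exp(zi - top) for zi in z]
    return h, [ei / sum(e) for ei in e]


def transform(A, b, x):
    """First-order (affine) speaker transform x' = A x + b."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def mean_loss(A, b, data):
    """Average cross-entropy of the frozen MLP on transformed inputs."""
    return -sum(math.log(mlp_forward(transform(A, b, x))[1][y]) for x, y in data) / len(data)


def adapt(data, steps=500, lr=0.004):
    """Full-batch gradient descent on A and b only; W1, B1, W2, B2 are never touched."""
    n = len(data[0][0])
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # start at identity
    b = [0.0] * n
    for _ in range(steps):
        gA = [[0.0] * n for _ in range(n)]
        gb = [0.0] * n
        for x, y in data:
            h, p = mlp_forward(transform(A, b, x))
            dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # dL/dlogits
            du = [(1.0 - h[j] ** 2) * sum(W2[k][j] * dz[k] for k in range(len(dz)))
                  for j in range(len(h))]  # back through tanh
            dx = [sum(W1[j][i] * du[j] for j in range(len(h))) for i in range(n)]  # dL/dx'
            for i in range(n):
                gb[i] += dx[i]
                for j in range(n):
                    gA[i][j] += dx[i] * x[j]
        for i in range(n):
            b[i] -= lr * gb[i] / len(data)
            for j in range(n):
                A[i][j] -= lr * gA[i][j] / len(data)
    return A, b


# Toy data: labels come from the frozen MLP on "reference speaker" inputs; the
# "new speaker" sees the same inputs through a made-up affine distortion.
random.seed(0)
clean = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(60)]
labels = [max(range(2), key=lambda k: mlp_forward(x)[1][k]) for x in clean]
new_speaker = [[0.7 * u + 0.2 * v + 0.5, -0.1 * u + 1.2 * v - 0.4] for u, v in clean]
data = list(zip(new_speaker, labels))

before = mean_loss([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], data)
A, b = adapt(data)
after = mean_loss(A, b, data)
print(f"cross-entropy before {before:.3f}, after {after:.3f}")
```

The step size 0.004 is deliberately small so that plain full-batch gradient descent is stable for these particular weights; the abstract does not give the paper's own training schedule.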

A Study on the Text-Independent Speaker Recognition from the Vowel Extraction (모음 검출을 통한 텍스트 독립 화자인식에 관한 연구)

  • 김에녹;복혁규;김형래
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.10 / pp.82-91 / 1994
  • In this paper, we perform speaker recognition experiments by identifying the vowels in each speaker's pronunciation. Specifically, we first extract the vowels from each speaker's pronunciation and then measure the frequency energy in 29 channels. After converting these energies into fuzzy values, we apply fuzzy inference to recognize the speaker with both text-dependent and text-independent methods. For this experiment, we developed a vowel extraction algorithm and introduce a new parameter: the frequency energy of the 29 channels computed from the extracted vowels. This parameter represents the characteristics of each speaker better than existing parameters. Its advantage is that it uses only the reference patterns, without the help of any codebook. As a result, the text-dependent method achieved a recognition rate of about 95.5%, and the text-independent method about 94.2%.
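The feature and matching steps described above could look roughly like the sketch below. The abstract does not specify the channel layout or the fuzzy inference rules, so this assumes 29 equal-width bands over a DFT power spectrum, min-max normalized log energies as fuzzy membership values, and the ratio of fuzzy intersection to fuzzy union as the similarity used to pick a speaker. Vowel extraction is skipped; `synthetic_vowel` and its two-tone frames are hypothetical stand-ins for extracted vowels.

```python
import cmath
import math

N_CHANNELS = 29  # channel count from the abstract; equal-width bands are an assumption


def band_energies(frame, n_channels=N_CHANNELS):
    """DFT power summed over equal-width bands (frame needs >= 2 * n_channels samples)."""
    n = len(frame)
    half = n // 2
    power = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
             for k in range(half)]
    width = half / n_channels
    return [sum(power[int(c * width):int((c + 1) * width)]) for c in range(n_channels)]


def fuzzify(energies):
    """Map log band energies to [0, 1] membership values (min-max normalization)."""
    logs = [math.log10(e + 1e-12) for e in energies]
    lo, hi = min(logs), max(logs)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in logs]


def fuzzy_similarity(a, b):
    """Fuzzy intersection over fuzzy union: sum of minima divided by sum of maxima."""
    return sum(min(x, y) for x, y in zip(a, b)) / sum(max(x, y) for x, y in zip(a, b))


def identify(pattern, references):
    """Return the speaker whose reference pattern is most similar to `pattern`."""
    return max(references, key=lambda name: fuzzy_similarity(pattern, references[name]))


def synthetic_vowel(freqs, n=256, fs=8000):
    """Hypothetical stand-in for an extracted vowel frame: a sum of sinusoids."""
    return [sum(math.sin(2 * math.pi * f * t / fs) for f in freqs) for t in range(n)]


references = {
    "speaker_a": fuzzify(band_energies(synthetic_vowel([700, 1200]))),
    "speaker_b": fuzzify(band_energies(synthetic_vowel([300, 2300]))),
}
probe = fuzzify(band_energies(synthetic_vowel([700, 1200])))
print(identify(probe, references))  # speaker_a
```

Matching against stored reference patterns directly, with no codebook, mirrors the advantage the abstract claims for this parameter.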

Speaker and Context Independent Emotion Recognition using Speech Signal (음성을 이용한 화자 및 문장독립 감정인식)

  • 강면구;김원구
    • Proceedings of the IEEK Conference / 2002.06d / pp.377-380 / 2002
  • In this paper, speaker- and context-independent emotion recognition using the speech signal is studied. For this purpose, a corpus of emotional speech, recorded and classified by emotion through subjective evaluation, was used to build statistical feature vectors (such as the average, standard deviation, and maximum value of pitch and energy) and to evaluate the performance of conventional pattern matching algorithms. A vector-quantization-based emotion recognition system is proposed for speaker- and context-independent emotion recognition. Experimental results showed that the vector-quantization-based emotion recognizer performed better with MFCC parameters than with pitch and energy parameters.
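A vector-quantization classifier of the kind proposed above is commonly built by training one codebook per emotion and choosing the emotion whose codebook quantizes the utterance with the lowest average distortion. The sketch below assumes that design, plain k-means codebooks, and toy 2-D vectors standing in for MFCC frames; the codebook size, the cluster centres, and the emotion labels are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iters=20):
    """Plain k-means (Lloyd) codebook, seeded with evenly spaced training vectors."""
    step = max(1, len(vectors) // size)
    codebook = [list(v) for v in vectors[::step][:size]]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
            cells[nearest].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)


def classify(vectors, codebooks):
    """Pick the emotion whose codebook quantizes the utterance with the least distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))


def frames(centres, per_centre):
    """Toy 2-D frames scattered around made-up centres."""
    return [[cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)]
            for cx, cy in centres for _ in range(per_centre)]


random.seed(1)
codebooks = {
    "neutral": train_codebook(frames([(0, 0), (1, 0)], 50), size=2),
    "angry": train_codebook(frames([(4, 4), (5, 3)], 50), size=2),
}
utterance = frames([(4, 4), (5, 3)], 10)
print(classify(utterance, codebooks))
```

Because the decision uses only frame-level distortion, it does not depend on who is speaking or what is said, which fits the speaker- and context-independent setting.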

Speaker Adaptation in HMM-based Korean Isolated Word Recognition (한국어 격리단어 인식 시스템에서 HMM 파라미터의 화자 적응)

  • 오광철;이황수;은종관
    • The Transactions of the Korean Institute of Electrical Engineers / v.40 no.4 / pp.351-359 / 1991
  • This paper describes the performance of speaker adaptation using a probabilistic spectral mapping matrix in hidden Markov model (HMM)-based Korean isolated word recognition. Speaker adaptation based on probabilistic spectral mapping uses well-trained prototype HMMs and is carried out with the Viterbi, dynamic time warping, and forward-backward algorithms. Among these, the best performance is obtained with the Viterbi approach combined with codebook adaptation, yielding an improvement of 42.6-68.8% in isolated word recognition accuracy. The choice of initial values for the matrix and the normalization used in computing it also affect the recognition accuracy.
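Probabilistic spectral mapping adapts a discrete HMM by re-expressing each state's codeword distribution through a mapping matrix. The sketch below assumes the matrix is estimated from already aligned (reference codeword, new-speaker codeword) pairs, for example produced by a Viterbi alignment, and uses a hypothetical smoothing floor; the codebook adaptation step and the paper's actual initialization and normalization choices are not shown.

```python
def estimate_mapping(aligned_pairs, codebook_size, floor=1e-3):
    """P[k][l] ~ Pr(new-speaker codeword l | reference codeword k).

    `aligned_pairs` are (reference codeword, new-speaker codeword) index pairs, assumed
    to come from an alignment step; `floor` is a hypothetical smoothing count that keeps
    every entry non-zero before row normalization."""
    counts = [[floor] * codebook_size for _ in range(codebook_size)]
    for k, l in aligned_pairs:
        counts[k][l] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def map_output_probs(b, P):
    """Adapted discrete output probabilities: b'_j(l) = sum_k b_j(k) * P[k][l]."""
    size = len(P)
    return [[sum(bj[k] * P[k][l] for k in range(size)) for l in range(size)] for bj in b]


# Toy example: 3 codewords, 2 HMM states (made-up numbers).
b = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
pairs = [(0, 1), (0, 1), (1, 1), (2, 2), (2, 0)]
P = estimate_mapping(pairs, codebook_size=3)
adapted = map_output_probs(b, P)
for row in adapted:
    print([round(p, 3) for p in row])
```

Since every row of `P` and of `b` sums to one, each adapted state distribution is again a proper probability distribution over the new speaker's codewords.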

A study on Effective Feature Parameters Comparison for Speaker Recognition (화자인식에 효과적인 특징벡터에 관한 비교연구)

  • Park TaeSun;Kim Sang-Jin;Kwang Moon;Hahn Minsoo
    • Proceedings of the KSPS conference / 2003.05a / pp.145-148 / 2003
  • In this paper, we carry out a comparative study of various feature parameters for effective speaker recognition, including LPC, LPCC, MFCC, log area ratio, reflection coefficients, inverse sine, and delta parameters. We also apply cepstral liftering and cepstral mean subtraction to check their usefulness. Our recognition system is HMM-based and uses a speech database of four connected Korean digits. The experimental results should help in selecting the most effective parameters for speaker recognition.
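Two of the techniques checked in the study, cepstral mean subtraction and cepstral liftering, are simple enough to sketch directly. The abstract does not say which lifter was used, so the sinusoidal lifter with L = 22, as commonly used in HTK-style MFCC front ends, is an assumption here; the frame values are made up.

```python
import math


def cepstral_mean_subtraction(frames):
    """Subtract the utterance-level mean of each cepstral coefficient from every frame."""
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def sinusoidal_lifter(frame, L=22):
    """Scale coefficient n by 1 + (L/2) * sin(pi * n / L); n counts from 0, so c0 is unchanged."""
    return [c * (1 + (L / 2) * math.sin(math.pi * n / L)) for n, c in enumerate(frame)]


frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
# A constant channel offset adds the same vector to every cepstral frame ...
shifted = [[c + 0.5 for c in f] for f in frames]
# ... and CMS removes it: both lines print [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]].
print(cepstral_mean_subtraction(frames))
print(cepstral_mean_subtraction(shifted))
print(sinusoidal_lifter([1.0, 1.0, 1.0]))
```

CMS targets stationary channel effects, which become additive offsets in the cepstral domain, while liftering only rescales coefficients to balance their ranges.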

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

  • 신옥근
    • The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.299-307 / 2003
  • There have been many studies on speaker normalization, which aims to minimize the effect of the speaker's vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple vector-quantizer-based linear-warping speaker normalization method, motivated by the observation that vector quantizers can be used successfully for speaker verification. We first generate an optimal codebook to serve as the basis of the speaker normalization; the warping factor of an unknown speaker is then extracted by comparing that speaker's feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the mel-scale filter bank used in MFCC computation. To test the performance of the proposed method, a series of recognition experiments is conducted with a discrete HMM on thirteen monosyllabic Korean number utterances. The results show that the word error rate can be reduced by about 29% and that the proposed warping factor extraction method is useful because it is simpler than other line-search warping methods.
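One plausible reading of the warping-factor extraction above is: try candidate factors and keep the one whose warped features are quantized best by the reference codebook. In the sketch below, toy (F1, F2) formant pairs in kHz and a direct linear scaling stand in for MFCCs computed with a linearly warped mel filter bank; the codebook values, the candidate grid from 0.88 to 1.12, and the simulated speaker are all hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)


def select_warp_factor(frames, codebook, alphas, warp):
    """Return the candidate alpha whose warped features best fit the codebook."""
    return min(alphas, key=lambda a: vq_distortion([warp(f, a) for f in frames], codebook))


def linear_warp(frequencies, alpha):
    """Linear frequency warping f -> alpha * f (toy stand-in for warping the mel filter bank)."""
    return [alpha * f for f in frequencies]


# Toy features: (F1, F2) in kHz. The codebook holds reference vowels; the unknown
# speaker's formants are the same vowels scaled by 1/1.1 (a made-up shift).
codebook = [[0.73, 1.09], [0.27, 2.29], [0.30, 0.87], [0.66, 1.72]]
unknown = [[f / 1.1 for f in v] for v in codebook]
alphas = [round(0.88 + 0.02 * i, 2) for i in range(13)]  # 0.88, 0.90, ..., 1.12
print(select_warp_factor(unknown, codebook, alphas, linear_warp))  # 1.1
```

The chosen factor would then be applied once when computing the speaker's recognition features, so the search cost is only a few codebook lookups per candidate.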

Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

  • Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
    • Phonetics and Speech Sciences / v.4 no.3 / pp.95-101 / 2012
  • In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors' Office) Verifier'. SPO Verifier is a GMM (Gaussian mixture model)-UBM (universal background model)-based automatic speaker recognition system developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics, and it lets users manage reference models built from utterances recorded in various environments to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.
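GMM-UBM systems typically score a test utterance by the average log-likelihood ratio between the claimed speaker's GMM and the UBM, then compare the score with a threshold. The sketch below shows only that scoring rule, with made-up diagonal-covariance models; it is not SPO Verifier's code, and MAP adaptation and channel compensation are left out.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x, gmm):
    """log p(x) for a GMM given as a list of (weight, mean, variance) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio log p(x | speaker) - log p(x | UBM)."""
    return sum(gmm_loglik(x, speaker_gmm) - gmm_loglik(x, ubm) for x in frames) / len(frames)


# Made-up 2-D models; in a real GMM-UBM system the speaker model is MAP-adapted from the UBM.
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
speaker = [(0.5, [0.5, 0.2], [0.5, 0.5]), (0.5, [3.2, 2.8], [0.5, 0.5])]
target_score = llr_score([[0.5, 0.2], [3.2, 2.8]], speaker, ubm)
impostor_score = llr_score([[-1.5, 1.5]], speaker, ubm)
print(f"target {target_score:.2f}, impostor {impostor_score:.2f}")  # accept if above a threshold
```

Sweeping the decision threshold over such scores for target and impostor trials is what produces the EER used in the comparison above.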

Implementation of Speaker Verification Security System Using DSP Processor(TMS320C32) (DSP Processor(TMS320C32)를 이용한 화자인증 보안시스템의 구현)

  • Haam, Young-Jun;Kwon, Hyuk-Jae;Choi, Soo-Young;Jeong, Ik-Joo
    • Journal of Industrial Technology / v.21 no.B / pp.107-116 / 2001
  • When a person communicates with others, speech carries various kinds of information: linguistic content, speaker information, emotion, health condition, the utterance environment, and so on. Technologies that process speech for practical use are collectively called speech technology. Among them, the technology that uses the speaker information contained in speech is known as speaker recognition. DTW (dynamic time warping) is a speaker recognition technique that uses dynamic programming to measure the similarity between a reference speech pattern and an input speech signal. In this study, we implement DTW on a TMS320C32 DSP processor and use it to build a security system.
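The DTW matching described above can be sketched as follows (in Python here, whereas the paper targets the TMS320C32 DSP). The Euclidean frame distance, the three-way step pattern, the unnormalized path cost, and the acceptance threshold are assumptions for illustration, and the one-dimensional frames are made up.

```python
import math


def dtw_distance(ref, test):
    """Classic DTW: cumulative Euclidean frame distance along the best monotonic alignment."""
    n, m = len(ref), len(test)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(ref[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def verify(ref, test, threshold):
    """Accept the claimed identity if the DTW distance is below a (hypothetical) threshold."""
    return dtw_distance(ref, test) <= threshold


ref = [[1.0], [2.0], [3.0]]
same_word_slower = [[1.0], [1.0], [2.0], [3.0]]
other = [[3.0], [2.0], [1.0]]
print(dtw_distance(ref, same_word_slower))  # 0.0: the repeated frame is absorbed by warping
print(verify(ref, other, threshold=0.5))    # False
```

The table `D` needs O(nm) time and memory; on a DSP one would typically keep only two rows of it, since each cell depends only on the current and previous row.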

Impostor Detection in Speaker Recognition Using Confusion-Based Confidence Measures

  • Kim, Kyu-Hong;Kim, Hoi-Rin;Hahn, Min-Soo
    • ETRI Journal / v.28 no.6 / pp.811-814 / 2006
  • In this letter, we introduce confusion-based confidence measures for detecting an impostor in speaker recognition that do not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of the alternative hypothesis. Compared with the conventional Gaussian mixture model-universal background model (GMM-UBM) scheme, our confusion-based measures perform better on noise-corrupted speech. The additional computation our methods require for detecting or rejecting impostors is negligible.

Histogram Equalization Using Background Speakers' Utterances for Speaker Identification (화자 식별에서의 배경화자데이터를 이용한 히스토그램 등화 기법)

  • Kim, Myung-Jae;Yang, Il-Ho;So, Byung-Min;Kim, Min-Seok;Yu, Ha-Jin
    • Phonetics and Speech Sciences / v.4 no.2 / pp.79-86 / 2012
  • In this paper, we propose a novel approach to improve histogram equalization for speaker identification. Our method pools all the speech features of the UBM training data to form a reference distribution. The rank of each test feature vector is computed in the sorted list of the pooled UBM training data and test data, and these ranks are used to perform order-based histogram equalization. The proposed method improves the accuracy of the speaker recognition system on short utterances. We evaluate the proposed system on four speech databases and compare it with cepstral mean normalization (CMN), mean and variance normalization (MVN), and histogram equalization (HEQ). Our system reduced the error rate by 33.3% relative to the baseline system.
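A per-dimension version of the rank-based equalization described above might look like this. Ranks are computed in the pooled, sorted UBM-plus-test values as the abstract states; mapping the normalized rank (r - 0.5)/N to a standard normal quantile is an assumption (a common HEQ target), and the toy UBM values are hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist


def rank_heq(test_values, ubm_values):
    """Order-based histogram equalization for one feature dimension.

    Each test value is ranked within the pooled, sorted UBM + test values; the rank r is
    turned into a quantile (r - 0.5) / N and mapped through a standard normal target
    (the Gaussian target is an assumption)."""
    pooled = sorted(list(ubm_values) + list(test_values))
    n = len(pooled)
    target = NormalDist(0.0, 1.0)
    # bisect_left gives the 0-based position of the first equal value, so ties share a rank
    return [target.inv_cdf((bisect_left(pooled, v) + 0.5) / n) for v in test_values]


ubm_feature = [float(i) for i in range(99)]      # toy UBM values for one cepstral dimension
print(rank_heq([1000.0], ubm_feature))           # one value, about 2.576 (the 0.995 quantile)
print(rank_heq([3.5, -2.0, 50.0], ubm_feature))  # the order of the inputs is preserved
```

Pooling the large UBM set with the test values gives each rank a fine resolution even when the test utterance has only a few frames, which is consistent with the gains the abstract reports for short utterances.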