• Title/Summary/Keyword: 화자군집 (speaker cohort)

Search Result 15, Processing Time 0.023 seconds

Text-Independent Speaker Verification Based on MLP Cohort Model (MLP 군집 모델에 기반한 어구독립 화자증명)

  • 이태승;최호진
    • Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.434-436 / 2000
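  • This paper presents a way to implement the conventional stochastic speaker cohort model with an MLP (multi-layer perceptron), together with a modified model that resolves a problem of the original speaker cohort model. The speaker cohort model matters in practical settings that are sensitive to speaker enrollment time. Because the recognition unit used in this study is the continuant, extracted as the sustained portion of several phoneme classes, the system adopts a text-independent scheme in which enrollment and verification are not restricted to particular phrases.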

A Study on the Fast Enrollment of Text-Independent Speaker Verification for Vehicle Security (차량 보안을 위한 어구독립 화자증명의 등록시간 단축에 관한 연구)

  • Lee, Tae-Seung;Choi, Ho-Jin
    • Journal of Advanced Navigation Technology / v.5 no.1 / pp.1-10 / 2001
  • Speech is a convenient interface for drivers, who are busy with many other operations, to handle and control devices. Exploiting this, this work proposes a speaker verification method for protecting cars from theft and for identifying a person trying to access critical online services. The method adopts continuant-phoneme recognition, which uses the linguistic information in speech, and the MLP (multi-layer perceptron), which has some advantages over earlier stochastic methods. This recognition method, however, requires a large amount of computation for learning, so it is difficult to apply to speaker verification, where speakers must enroll in real time. To relieve this problem, this work introduces the speaker cohort models of an established score normalization technique for speaker verification, dividing the background speakers into small cohorts in advance. The computational burden is then reduced by classifying the enrolling speaker into one of those cohorts and carrying out enrollment against only that cohort.

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
  • This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Since the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments, using the cepstral mean of non-speech segments as the feature vector. The adaptation data in each cluster are then used to build an environment-clustered speaker-adapted (SA) model. After multiple environment-clustered SA models similar to the test environment are selected, speaker adaptation is performed with an appropriate linear combination of those models. In our experiments, the proposed method reduces the error rate by $40{\sim}59\%$ relative to the baseline with a speaker-independent model.

A Method on the Improvement of Speaker Enrolling Speed for a Multilayer Perceptron Based Speaker Verification System through Reducing Learning Data (다층신경망 기반 화자증명 시스템에서 학습 데이터 감축을 통한 화자등록속도 향상방법)

  • 이백영;황병원;이태승
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.585-591 / 2002
  • While the multilayer perceptron (MLP) offers several advantages over existing pattern recognition methods, it takes a relatively long time to learn. This prolongs speaker enrollment in a speaker verification system that uses the MLP as its classifier. This paper proposes a method that shortens enrollment time by adopting the cohort speakers method used in existing parametric systems, thereby reducing the number of background speakers required to train the MLP. The effect of the method is confirmed by an experiment that applies it to a continuant- and MLP-based speaker verification system.

A Study on Modified Clustering Algorithm for Text-Dependent Speaker Verification System (문장종속 화자확인 시스템을 위한 개선된 군집화 알고리즘에 관한 연구)

  • 강철호;정희석
    • The Journal of the Acoustical Society of Korea / v.23 no.7 / pp.548-553 / 2004
  • In this paper we propose a modified LBG algorithm that minimizes quantization errors. When the conventional LBG algorithm is applied to a speaker verification system, problems can arise from the small amount of training data. Specifically, a fixed-size codebook that ignores speaker characteristics introduces quantization error, and splitting a vector in the wrong direction degrades the performance of the speaker verification system. We therefore propose a modified clustering method that uses a variable-size codebook matched to speaker characteristics and chooses the splitting direction by finding the member farthest from the mean and then finding a second member starting from that one. Simulation results show the effectiveness of the proposed algorithm.
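
The splitting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the second member is chosen, so the code assumes it is the member farthest from the first one. The helper names (`centroid`, `farthest_from`, `split_seeds`, `split_cluster`) are hypothetical, and the variable-size codebook logic is not covered.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def farthest_from(ref, points):
    """Return the point with the largest Euclidean distance from ref."""
    return max(points, key=lambda p: math.dist(ref, p))

def split_seeds(points):
    """Pick two seed vectors for splitting one cluster.

    Assumed reading of the abstract (not confirmed by the source):
    a = member farthest from the cluster mean,
    b = member farthest from a.
    """
    a = farthest_from(centroid(points), points)
    b = farthest_from(a, points)
    return a, b

def split_cluster(points):
    """Partition points by nearest seed and return the two new centroids.

    Assumes the cluster holds at least two distinct members; otherwise
    one side of the partition would be empty.
    """
    a, b = split_seeds(points)
    left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
    return centroid(left), centroid(right)

if __name__ == "__main__":
    cluster = [[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]]
    print(split_cluster(cluster))
```

Seeding the split from two actual members keeps both new codewords inside the data, whereas the conventional LBG split perturbs the centroid in a fixed direction regardless of how the members are spread.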

An Improvement of the Enrolling Speed for the MLP-Based Speaker Verification System through Reducing Learning Data (MLP 기반 화자증명 시스템에서 학습 데이터 감축을 통한 등록속도 향상방법)

  • 이태승;황병원
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.619-621 / 2002
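  • The MLP (multilayer perceptron) offers several advantages over conventional pattern recognition methods but requires a relatively long time to learn. As a result, enrollment takes longer when the MLP is used as the recognition method of a speaker verification system. This paper proposes a method that shortens speaker enrollment time by applying the speaker cohort method adopted in existing systems to reduce the number of background speakers needed for MLP learning.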

Improving Speaker Enrolling Speed for Speaker Verification Systems Based on Multilayer Perceptrons by Using a Qualitative Background Speaker Selection (정질적 기준을 이용한 다층신경망 기반 화자증명 시스템의 등록속도 단축방법)

  • 이태승;황병원
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.360-366 / 2003
  • Although multilayer perceptrons (MLPs) offer several advantages over other pattern recognition methods, MLP-based speaker verification systems suffer from slow enrollment because many background speakers are needed to achieve a low verification error. To address this problem, the quantitative discriminative cohort speakers (QnDCS) method introduced the cohort speakers method into such systems and reduced the number of background speakers required for enrollment. Although QnDCS achieved its goal to some extent, the improvement in enrollment speed was still unsatisfactory. To improve enrollment speed further, this paper proposes the qualitative DCS (QlDCS) method, which introduces a qualitative criterion to select fewer background speakers. Both methods are evaluated in an experiment using an MLP- and continuant-based speaker verification system and a speech database. The results show that the proposed QlDCS method enrolls speakers in half the time QnDCS needs, relative to the online error backpropagation (EBP) method.

I-vector similarity based speech segmentation for interested speaker to speaker diarization system (화자 구분 시스템의 관심 화자 추출을 위한 i-vector 유사도 기반의 음성 분할 기법)

  • Bae, Ara;Yoon, Ki-mu;Jung, Jaehee;Chung, Bokyung;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.461-467 / 2020
  • In noisy and multi-speaker environments, speech recognition performance is unavoidably lower than in a clean environment. To improve speech recognition, this paper extracts the signal of the speaker of interest from mixed speech containing multiple speakers. The VoiceFilter model is used to separate overlapped speech signals effectively. Clustering by Probabilistic Linear Discriminant Analysis (PLDA) similarity score is employed to detect the speech of the speaker of interest, which then serves as the reference speaker for VoiceFilter-based separation. By using the speaker feature extracted from the speech detected by the proposed clustering method, this paper proposes a speaker diarization system that uses only the mixed speech, without an explicit reference speaker signal. A phone dataset consisting of two speakers is used to evaluate the system. The Source to Distortion Ratio (SDR) of the operator (Rx) speech and the customer (Tx) speech is 5.22 dB and -5.22 dB, respectively, before separation, and 11.26 dB and 8.53 dB, respectively, after the proposed separation.
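
For readers unfamiliar with the metric, the sketch below computes a simplified energy-ratio form of SDR, 10·log10(‖s‖² / ‖s − ŝ‖²). This is an assumption for illustration: the abstract does not state which SDR definition the authors used, and the function name `sdr_db` and the toy signals are hypothetical. Under this simplified form, scoring an unprocessed two-speaker mixture against each speaker yields values of equal magnitude and opposite sign, which matches the pattern of the reported before-separation figures.

```python
import math

def sdr_db(reference, estimate):
    """Simplified source-to-distortion ratio in dB.

    Assumed definition: 10 * log10(||s||^2 / ||s - s_hat||^2).
    The estimate must differ from the reference, otherwise the
    distortion energy is zero and the ratio is undefined.
    """
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / distortion_energy)

if __name__ == "__main__":
    # Toy two-speaker mixture (made-up samples, not data from the paper).
    rx = [2.0, 0.0, -2.0, 0.0]
    tx = [0.0, 1.0, 0.0, -1.0]
    mixture = [a + b for a, b in zip(rx, tx)]

    # Scoring the raw mixture: each speaker is the other's distortion,
    # so the two values are negatives of each other.
    print(sdr_db(rx, mixture), sdr_db(tx, mixture))
```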

A Classified Space VQ Design for Text-Independent Speaker Recognition (문맥 독립 화자인식을 위한 공간 분할 벡터 양자기 설계)

  • Lim, Dong-Chul;Lee, Hanig-Sei
    • The KIPS Transactions:PartB / v.10B no.6 / pp.673-680 / 2003
  • In this paper, we study an enhancement of VQ (Vector Quantization) design for text-independent speaker recognition. Specifically, we present a method that builds a vector quantization codebook through non-iterative learning, so that the computational complexity is greatly reduced. The proposed Classified Space VQ (CSVQ) design method for text-independent speaker recognition generalizes the semi-noniterative VQ design method for text-dependent speaker recognition. CSVQ contrasts with the existing design method, which runs an iterative learning algorithm for every training speaker. The characteristics of the CSVQ design are as follows. First, the proposed method performs non-iterative learning by using a Classified Space Codebook. Second, the quantization region of each speaker is equivalent to the quantization region of the Classified Space Codebook, and the quantization point of each speaker is the optimal point for that speaker's statistical distribution within a quantization region of the Classified Space Codebook. Third, the Classified Space Codebook (CSC) is constructed through the Sample Vector Formation Method (CSVQ1, 2) and the Hyper-Lattice Formation Method (CSVQ3). In the numerical experiment, we use 12th-order mel-cepstrum feature vectors of 10 speakers and compare the proposed method with the existing method, varying the codebook size from 16 to 128 for each Classified Space Codebook. The recognition rate of the proposed method is 100% for CSVQ1, 2, equal to that of the existing method. The proposed CSVQ design method therefore reduces computational complexity while maintaining the recognition rate, offering a new alternative, and CSVQ with CSC can be applied to general-purpose recognition.

An Improvement of the MLP Based Speaker Verification System through Improving the learning Speed and Reducing the Learning Data (학습속도 개선과 학습데이터 축소를 통한 MLP 기반 화자증명 시스템의 등록속도 향상방법)

  • Lee, Baek-Yeong;Lee, Tae-Seung;Hwang, Byeong-Won
    • Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.3 / pp.88-98 / 2002
  • The multilayer perceptron (MLP) has several advantages over other pattern recognition methods and is expected to be used for learning and recognizing speakers in speaker verification systems. However, because the error backpropagation (EBP) algorithm used to train the MLP learns slowly, MLP learning takes considerable time. Since a speaker verification system must provide verification services immediately after a speaker enrolls, this problem must be solved. This paper therefore aims to shorten the time needed to enroll speakers in an MLP-based speaker verification system by combining a method that improves EBP learning speed with a method that reduces the number of background speakers, the latter adopting the cohort speakers method from existing speaker verification.