• Title/Summary/Keyword: 화자 검증

Search Result 63, Processing Time 0.021 seconds

ETRI신기술-화자검증기술

  • Electronics and Telecommunications Research Institute
    • Electronics and Telecommunications Trends
    • /
    • v.14 no.5 s.59
    • /
    • pp.151-152
    • /
    • 1999
  • 화자검증기술은 화자의 입력음성으로부터 화자의 특성을 계산하고 해당 화자를 유일하게 구분할 수 있는 통계적 모수를 추출하여 이를 화자의 개인 데이터베이스로 구축하며, 검증시에는 개인데이터베이스와 입력되는 미지 화자의 특성에 대한 유사도를 비교.검증하는 것이다. 또한 이때 주어진 임계치(Threshold)의 만족 정도에 따라 동일 화자여부를 결정하는 결정논리(decision logic)로 검증엔진을 구성하는 기술이며, 응용영역에 따라 환경잡음, 채널잡음 등 사용환경과 전체시스템과의 적절한 시나리오 구성 등이 실용화를 위한 중요한 척도가 된다.

  • PDF

Performance Improvement in GMM-based Text-Independent Speaker Verification System (GMM 기반의 문맥독립 화자 검증 시스템의 성능 향상)

  • Hahm Seong-Jun;Shen Guang-Hu;Kim Min-Jung;Kim Joo-Gon;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.131-134
    • /
    • 2004
  • 본 논문에서는 GMM(Gaussian Mixture Model)을 이용한 문맥독립 화자 검증 시스템을 구현한 후, arctan 함수를 이용한 정규화 방법을 사용하여 화자검증실험을 수행하였다. 특징파라미터로서는 선형예측방법을 이용한 켑스트럼 계수와 회귀계수를 사용하고 화자의 발성 변이를 고려하여 CMN(Cepstral Mean Normalization)을 적용하였다. 화자모델 생성을 위한 학습단에서는 화자발성의 음향학적 특징을 잘 표현할 수 있는 GMM(Gaussian Mixture Model)을 이용하였고 화자 검증단에서는 ML(Maximum Likelihood)을 이용하여 유사도를 계산하고 기존의 정규화 방법과 arctan 함수를 이용한 방법에 의해 정규화된 점수(score)와 미리 정해진 문턱값과 비교하여 검증하였다. 화자 검증 실험결과, arctan 함수를 부가한 방법이 기존의 방법보다 항상 향상된 EER을 나타냄을 확인할 수 있었다.

  • PDF

A PCA-based MFDWC Feature Parameter for Speaker Verification System (화자 검증 시스템을 위한 PCA 기반 MFDWC 특징 파라미터)

  • Hahm Seong-Jun;Jung Ho-Youl;Chung Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.1
    • /
    • pp.36-42
    • /
    • 2006
  • A Principal component analysis (PCA)-based Mel-Frequency Discrete Wavelet Coefficients (MFDWC) feature Parameters for speaker verification system is Presented in this Paper In this method, we used the 1st-eigenvector obtained from PCA to calculate the energy of each node of level that was approximated by. met-scale. This eigenvector satisfies the constraint of general weighting function that the squared sum of each component of weighting function is unity and is considered to represent speaker's characteristic closely because the 1st-eigenvector of each speaker is fairly different from the others. For verification. we used Universal Background Model (UBM) approach that compares claimed speaker s model with UBM on frame-level. We performed experiments to test the effectiveness of PCA-based parameter and found that our Proposed Parameters could obtain improved average Performance of $0.80\%$compared to MFCC. $5.14\%$ to LPCC and 6.69 to existing MFDWC.

Group-based speaker embeddings for text-independent speaker verification (문장 독립 화자 검증을 위한 그룹기반 화자 임베딩)

  • Jung, Youngmoon;Eom, Youngsik;Lee, Yeonghyeon;Kim, Hoirin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.5
    • /
    • pp.496-502
    • /
    • 2021
  • Recently, deep speaker embedding approach has been widely used in text-independent speaker verification, which shows better performance than the traditional i-vector approach. In this work, to improve the deep speaker embedding approach, we propose a novel method called group-based speaker embedding which incorporates group information. We cluster all speakers of the training data into a predefined number of groups in an unsupervised manner, so that a fixed-length group embedding represents the corresponding group. A Group Decision Network (GDN) produces a group weight, and an aggregated group embedding is generated from the weighted sum of the group embeddings and the group weights. Finally, we generate a group-based embedding by adding the aggregated group embedding to the deep speaker embedding. In this way, a speaker embedding can reduce the search space of the speaker identity by incorporating group information, and thereby can flexibly represent a significant number of speakers. We conducted experiments using the VoxCeleb1 database to show that our proposed approach can improve the previous approaches.

A New Method of Selecting Cohort for Speaker Verification (화자검증을 위한 새로운 코호트 선택 방법)

  • 김성준;계영철
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.5
    • /
    • pp.383-387
    • /
    • 2003
  • This paper deals with the method of speaker verification based on the conventional cohort of fixed size. In particular, a new cohort of variable size, which makes use of the distance between speaker models, is proposed: The density of neighboring speaker models within the fixed distance from each speaker is taken into account in the proposed method. The high density leads to the increase of cohort size, thus improving the speaker verification rate. On the other hand, the low density leads to its decrease, thus reducing the amount of computations. The simulation results show that the proposed method outperforms the conventional one, achieving a reduction in the EER.

Noise Rabust Speaker Verification Using Sub-Band Weighting (서브밴드 가중치를 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.279-284
    • /
    • 2009
  • Speaker verification determines whether the claimed speaker is accepted based on the score of the test utterance. In recent years, methods based on Gaussian mixture models and universal background model have been the dominant approaches for text-independent speaker verification. These speaker verification systems based on these methods provide very good performance under laboratory conditions. However, in real situations, the performance of speaker verification system is degraded dramatically. For overcoming this performance degradation, the feature recombination method was proposed, but this method had a drawback that whole sub-band feature vectors are used to compute the likelihood scores. To deal with this drawback, a modified feature recombination method which can use each sub-band likelihood score independently was proposed in our previous research. In this paper, we propose a sub-band weighting method based on sub-band signal-to-noise ratio which is combined with previously proposed modified feature recombination. This proposed method reduces errors by 28% compared with the conventional feature recombination method.

Design and Implementation of Speaker Verification System Using Voice (음성을 이용한 화자 검증기 설계 및 구현)

  • 지진구;윤성일
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.3
    • /
    • pp.91-98
    • /
    • 2000
  • In this paper we design implement the speaker verification system for verifying personal identification using voice. Filter bank magnitude was used as a feature parameter and code-book was made using LBG a1gorithm. The code book convert feature parameters into code sequence. The difference between reference pattern and input pattern measures using DTW(Dynamic Time Warping). The similarity measured using DTW and threshold value derived from deviation were used to discriminate impostor from client speaker.

  • PDF

Speaker verification with ECAPA-TDNN trained on new dataset combined with Voxceleb and Korean (Voxceleb과 한국어를 결합한 새로운 데이터셋으로 학습된 ECAPA-TDNN을 활용한 화자 검증)

  • Keumjae Yoon;Soyoung Park
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.209-224
    • /
    • 2024
  • Speaker verification is becoming popular as a method of non-face-to-face identity authentication. It involves determining whether two voice data belong to the same speaker. In cases where the criminal's voice remains at the crime scene, it is vital to establish a speaker verification system that can accurately compare the two voice evidence. In this study, to achieve this, a new speaker verification system was built using a deep learning model for Korean language. High-dimensional voice data with a high variability like background noise made it necessary to use deep learning-based methods for speaker matching. To construct the matching algorithm, the ECAPA-TDNN model, known as the most famous deep learning system for speaker verification, was selected. A large dataset of the voice data, Voxceleb, collected from people of various nationalities without Korean. To study the appropriate form of datasets necessary for learning the Korean language, experiments were carried out to find out how Korean voice data affects the matching performance. The results showed that when comparing models learned only with Voxceleb and models learned with datasets combining Voxceleb and Korean datasets to maximize language and speaker diversity, the performance of learning data, including Korean, is improved for all test sets.

The Study on the Verification of Speaker Change using GMM-UBM based KL distance (GMM-UBM 기반 KL 거리를 활용한 화자변화 검증에 대한 연구)

  • Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
    • Journal of Convergence Society for SMB
    • /
    • v.6 no.4
    • /
    • pp.71-77
    • /
    • 2016
  • In this paper, we proposed a verification of speaker change utilizing the KL distance based on GMM-UBM to improve the performance of conventional BIC based Speaker Change Detection(SCD). We have verified Conventional BIC-based SCD using KL-distance based SCD which is robust against difference of information volume than BIC-based SCD. And we have applied GMM-UBM to compensate asymmetric information volume. Conventional BIC-based SCD was composed of two steps. Step 1, to detect the Speaker Change Candidate Point(SCCP). SCCP is positive local maximum point of dissimilarity d. Step 2, to determine the Speaker Change Point(SCP). If ${\Delta}BIC$ of SCCP is positive, it decides to SCP. We examined verification of SCP using GMM-UBM based KL distance D. If the value of D on each SCP is higher than threshold, we accepted that point to the final SCP. In the experimental condition MDR(Missed Detection Rate) is 0, FAR(False Alarm Rate) when the threshold value of 0.028 has been improved to 60.7%.

Improving A Text Independent Speaker Identification System By Frame Level Likelihood Normalization (프레임단위유사도정규화를 이용한 문맥독립화자식별시스템의 성능 향상)

  • 김민정;석수영;정현열;정호열
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.487-490
    • /
    • 2001
  • 본 논문에서는 기존의 Caussian Mixture Model을 이용한 실시간문맥독립화자인식시스템의 성능을 향상시키기 위하여 화자검증시스템에서 좋은 결과를 나타내는 유사도정규화 ( Likelihood Normalization )방법을 화자식별시스템에 적용하여 시스템을 구현하였으며, 인식실험한 결과에 대해 보고한다. 시스템은 화자모델생성단과 화자식별단으로 구성하였으며, 화자모델생성단에서는, 화자발성의 음향학적 특징을 잘 표현할 수 있는 GMM(Gaussian Mixture Model)을 이용하여 화자모델을 작성하였으며. GMM의 파라미터를 최적화하기 위하여 MLE(Maximum Likelihood Estimation)방법을 사용하였다. 화자식별단에서는 학습된 데이터와 테스트용 데이터로부터 ML(Maximum Likelihood)을 이용하여 프레임단위로 유사도를 계산하였다. 계산된 유사도는 유사도 정규화 과정을 거쳐 스코어( SC)로 표현하였으며, 가장 높은 스코어를 가지는 화자를 인식화자로 결정한다. 화자인식에서 발성의 종류로는 문맥독립 문장을 사용하였다. 인식실험을 위해서는 ETRI445 DB와 KLE452 DB를 사용하였으며. 특징파라미터로서는 켑스트럼계수 및 회귀계수값만을 사용하였다. 인식실험에서는 등록화자의 수를 달리하여 일반적인 화자식별방법과 프레임단위유사도정규화방법으로 각각 인식실험을 하였다. 인식실험결과, 프레임단위유사도정규화방법이 인식화자수가 많아지는 경우에 일반적인 방법보다 향상된 인식률을 얻을수 있었다.

  • PDF