• Title/Summary/Keyword: speaker recognition

Emotional Speaker Recognition using Emotional Adaptation

  • 김원구
    • The Transactions of the Korean Institute of Electrical Engineers, Vol. 66, No. 7, pp. 1105-1110, 2017
  • Speech carrying various emotions degrades the performance of speaker recognition systems. In this paper, a speaker recognition method using emotional adaptation is proposed to improve recognition performance on affective speech. For emotional adaptation, an emotional speaker model is generated from the emotion-free speaker model by applying a speaker adaptation method to a small amount of affective training speech. Since it is difficult to collect enough affective speech from a speaker for training, relying on only a small number of affective utterances is very practical in real situations. The proposed method was evaluated on a Korean database containing four emotions. Experimental results show that it outperforms conventional methods in both speaker verification and speaker recognition.
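
The abstract does not name the speaker adaptation method. A common way to adapt a GMM speaker model from little data is relevance-MAP adaptation of the component means, so the Python sketch below assumes that: a diagonal-covariance GMM trained on emotion-free speech acts as the prior, and a handful of emotional frames pull its means toward the new data. The function names and the relevance factor of 16 are illustrative assumptions, not details from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
    frames: Sequence[Vector],
    relevance: float = 16.0,  # assumed value, not from the paper
) -> List[Vector]:
    """Relevance-MAP adaptation of GMM means; weights and variances stay fixed."""
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                        # soft frame counts per component
    firsts = [[0.0] * dim for _ in range(k)]  # posterior-weighted feature sums
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)                       # subtract the max before exp for stability
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for c in range(k):
            gamma = post[c] / total           # responsibility of component c for frame x
            counts[c] += gamma
            for d in range(dim):
                firsts[c][d] += gamma * x[d]
    adapted = []
    for c in range(k):
        n = counts[c]
        alpha = n / (n + relevance)           # more adaptation data -> trust it more
        data_mean = [f / n for f in firsts[c]] if n > 0 else list(means[c])
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[c])])
    return adapted
```

With 16 adaptation frames and a relevance factor of 16, a component that is fully responsible for the frames moves halfway toward their mean; with no frames, the model comes back unchanged. This interpolation is what lets a small amount of affective speech shift the model without discarding what was learned from emotion-free speech.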

A Study on Text-Independent Speaker Recognition Using Eigenspace

  • 함철배;이동규;이두수
    • Institute of Electronics Engineers of Korea (IEEK), Proceedings of the 1999 Summer Conference, pp. 671-674, 1999
  • We report a new method for speaker recognition. Until now, many researchers have used hidden Markov models (HMMs) with cepstral coefficients, or neural networks, for speaker recognition. Here, we introduce a speaker recognition method based on eigenspace. This method can reduce the training and recognition time of a speaker recognition system. The proposed method uses a low-rank model of the speech eigenspace. In experiments, it achieved good recognition results.
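
The abstract gives no algorithmic detail beyond a low-rank model of the speech eigenspace. One plausible reading, sketched below in Python as an assumption, is per-speaker principal component analysis: each speaker is represented by the mean and the top-k covariance eigenvectors of their feature vectors, and a test utterance is assigned to the speaker whose subspace reconstructs its frames with the smallest error. The eigenvectors are found by power iteration so the sketch needs only the standard library; `eigenspace`, `identify`, and the reconstruction-error score are hypothetical names and choices.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def eigenspace(data: Sequence[Vector], k: int, iters: int = 200,
               seed: int = 0) -> Tuple[Vector, List[Vector]]:
    """Mean and top-k covariance eigenvectors via power iteration with Gram-Schmidt."""
    n, dim = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    axes: List[Vector] = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(dim)]
        for _ in range(iters):
            w = [_dot(row, v) for row in cov]
            for a in axes:                    # stay orthogonal to earlier axes
                p = _dot(w, a)
                w = [wi - p * ai for wi, ai in zip(w, a)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:                  # no variance left: stop adding axes
                return mean, axes
            v = [wi / norm for wi in w]
        axes.append(v)
    return mean, axes


def reconstruction_error(x: Vector, mean: Vector, axes: List[Vector]) -> float:
    """Squared distance between x and its projection onto the low-rank eigenspace."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    coeffs = [_dot(c, a) for a in axes]
    recon = [sum(cf * a[d] for cf, a in zip(coeffs, axes)) for d in range(len(c))]
    return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))


def identify(models: Dict[str, Tuple[Vector, List[Vector]]],
             frames: Sequence[Vector]) -> str:
    """Pick the speaker whose eigenspace reconstructs the frames with least mean error."""
    def score(name: str) -> float:
        mean, axes = models[name]
        return sum(reconstruction_error(x, mean, axes) for x in frames) / len(frames)
    return min(models, key=score)
```

Keeping k small is what makes this cheap: training reduces to one covariance matrix and a few eigenvectors per speaker, and scoring a frame costs k dot products.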

Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex;Leena Mary
    • ETRI Journal, Vol. 45, No. 4, pp. 678-689, 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAEs) in learning speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. Initial comparative studies of VAEs and traditional autoencoders (AEs) suggest that the former can efficiently learn speaker representations. Investigations of the impact of gender information on speaker recognition also indicate that gender-dependent impostor banks lead to higher accuracy. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.

Double Compensation Framework Based on GMM for Speaker Recognition

  • 김유진;정재호
    • Malsori (Journal of the Korean Society of Speech Sciences), No. 45, pp. 93-105, 2003
  • In this paper, we present a single GMM-based framework for speaker recognition. The proposed framework simultaneously minimizes environmental variations under mismatched conditions and adapts the bias-free, speaker-dependent characteristics of claimant utterances to the background GMM to create a speaker model. We compare closed-set speaker identification using the conventional method and the proposed method on both TIMIT and NTIMIT. Across several sets of experiments, the proposed method improves recognition rates by 7.2% on a simulated channel and by 27.4% under a telephone channel condition.
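
The abstract does not say how the environmental bias is removed. One standard way to remove a stationary channel (convolutional) bias, which becomes an additive offset in the cepstral domain, is cepstral mean normalization (CMN). The Python sketch below assumes CMN as the compensation step; it is not necessarily the paper's technique, and it does not cover the adaptation to the background GMM.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean from each cepstral coefficient (CMN).

    A time-invariant channel adds the same offset to every cepstral frame,
    so removing the utterance mean cancels that offset.
    """
    if not frames:
        return []
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```

For example, adding a constant offset to every frame of an utterance leaves the normalized frames unchanged, which is the property that makes the features "bias-free" in this sense.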

Performance Improvement of Speaker Recognition System Using Genetic Algorithm

  • 문인섭;김종교
    • The Journal of the Acoustical Society of Korea, Vol. 19, No. 8, pp. 63-67, 2000
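  • In this paper, we study text-prompted speaker recognition based on dynamic time warping (DTW) to improve speaker recognition performance. A genetic algorithm is applied to generate reference patterns that better reflect speaker characteristics, a key factor in speaker recognition. In addition, text-prompted speaker recognition is performed to overcome the drawbacks of text-dependent and text-independent speaker recognition. Speaker identification experiments on a closed set and speaker verification experiments on an open set showed that reference patterns generated by the genetic algorithm outperform those of the conventional method in both recognition rate and recognition speed.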

Context-Independent Speaker Recognition in URC Environment

  • 지미경;김성탁;김회린
    • The Journal of Korea Robotics Society, Vol. 1, No. 2, pp. 158-162, 2006
  • This paper presents a speaker recognition system intended for human-robot interaction. The proposed system achieves high performance in the Ubiquitous Robot Companion (URC) environment. In the URC concept, a robot is connected to a server over a broadband connection so that functions can be performed on the server side, which greatly reduces the robot's stand-alone functionality and lowers the cost of the robot client. Instead of giving the robot (client) on-board cognitive capabilities, sensing and processing are outsourced to a central computer (server) connected to the high-speed Internet, and the robot provides only mobility. Our aim is to enhance human-robot interaction by improving speaker recognition performance with multiple microphones on the robot side in adverse distant-talking environments. Our speaker recognizer provides the URC project with a basic interface for human-robot interaction.

A Study on the Context-dependent Speaker Recognition Adopting the Method of Weighting the Frame-based Likelihood Using SNR

  • 최홍섭
    • Malsori (Journal of the Korean Society of Speech Sciences), No. 61, pp. 113-123, 2007
  • Environmental differences between training and testing are generally considered a critical cause of performance degradation in speaker recognition systems. In particular, speaker recognition systems usually train speaker models on speech that is as clean as possible, but test speech in real use is not clean because of environmental and channel noise. To cope with this mismatch, this paper proposes a new method that weights each frame's likelihood according to the frame SNR, exploiting the strong correlation between speech SNR and speaker discrimination rate. To verify its usefulness, the method is applied to a context-dependent speaker identification system. Experiments on the cellular phone speech database designed by ETRI for Korean speaker recognition show that the proposed method is effective, increasing identification accuracy by up to 11%.
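
The abstract does not give the weighting function. The Python sketch below assumes per-frame log-likelihoods are already available for each enrolled speaker and uses a hypothetical weight: each frame's SNR (in dB) above a floor, normalized over the utterance. The names and the weighting rule are illustrative assumptions; the point is only to show how a frame-level weight can change which speaker wins.

```python
from typing import Dict, List, Sequence


def snr_weights(frame_snr_db: Sequence[float], floor_db: float = 0.0) -> List[float]:
    """Hypothetical weighting: SNR above floor_db, normalized to sum to 1."""
    clipped = [max(s - floor_db, 0.0) for s in frame_snr_db]
    if not clipped:
        return []
    total = sum(clipped)
    if total == 0.0:  # every frame at or below the floor: fall back to uniform weights
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]


def weighted_loglik(frame_loglik: Sequence[float], weights: Sequence[float]) -> float:
    """Utterance score as the weighted sum of per-frame log-likelihoods."""
    return sum(w * l for w, l in zip(weights, frame_loglik))


def identify(frame_logliks: Dict[str, Sequence[float]],
             frame_snr_db: Sequence[float]) -> str:
    """Return the speaker with the highest SNR-weighted score."""
    w = snr_weights(frame_snr_db)
    return max(frame_logliks, key=lambda spk: weighted_loglik(frame_logliks[spk], w))
```

With two frames at 20 dB and 0 dB, the noisy frame gets zero weight, so a speaker who matches the clean frame well wins even if a noise-corrupted frame happens to favor another speaker; with uniform weights the decision can flip.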

The Evaluation of the Fuzzy-Chaos Dimension and the Fuzzy-Lyapunov Dimension

  • 유병욱;박현숙;김창석
    • Speech Sciences, Vol. 7, No. 3, pp. 167-183, 2000
  • In this paper, we propose two chaos dimensions for speaker recognition: the fuzzy correlation dimension and the fuzzy Lyapunov dimension. The proposal rests on the observation that chaos analysis can capture the nonlinear information contained in an individual's speech signal and thus provide superior discrimination capability. We confirm that the proposed fuzzy chaos dimensions play an important role in improving the speaker recognition rate by absorbing variations in the reference and test pattern attractors. To evaluate the proposed fuzzy chaos dimensions, we perform speaker recognition using them; that is, we examine their validity as speaker recognition parameters by estimating the recognition error from each speaker's discrimination error relative to the reference pattern.
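
The abstract does not define the fuzzy dimensions. For orientation only, the Python sketch below computes the ordinary (non-fuzzy) Grassberger-Procaccia correlation sum C(r), the fraction of point pairs closer than r in a time-delay embedding of the signal, and estimates the correlation dimension as the slope of log C(r) against log r. A fuzzy variant would presumably soften the hard distance threshold with a membership function; that part is not reproduced here, and the embedding dimension, lag, and radii are assumptions the caller must choose.

```python
import math
from typing import List, Sequence

Vector = List[float]


def delay_embed(signal: Sequence[float], dim: int, lag: int) -> List[Vector]:
    """Time-delay embedding: vectors (s[t], s[t+lag], ..., s[t+(dim-1)*lag])."""
    n = len(signal) - (dim - 1) * lag
    return [[signal[t + k * lag] for k in range(dim)] for t in range(n)]


def correlation_sum(points: Sequence[Vector], r: float) -> float:
    """Fraction of distinct point pairs closer than r (hard, non-fuzzy threshold)."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points: Sequence[Vector], r1: float, r2: float) -> float:
    """Two-point slope of log C(r) versus log r between radii r1 < r2."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    if c1 == 0.0 or c2 == 0.0:
        raise ValueError("radius too small: no point pairs fall within it")
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))
```

For points spread evenly along a line, the estimated slope comes out close to 1, as expected for a one-dimensional set; a speech signal would first be embedded with `delay_embed` before computing the sums.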

Speaker Recognition Using Optimal Path and Weighted Orthogonal Parameters

  • 박승규;배철수
    • The Journal of the Acoustical Society of Korea, Vol. 11, No. 2, pp. 68-72, 1992
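  • Recently, many researchers have performed speaker recognition with statistical processing based on the Karhunen-Loève transform (KLT). However, the degree to which statistical processing captures speaker individuality, together with the dynamic speaking rate of speech, lowers the speaker recognition rate. In this study, we investigated a speaker recognition method that combines weighted orthogonal parameters, in which each speaker's eigenvalues serve as weights to emphasize individuality in that speaker's orthogonal parameters, with the DTW optimal path, which normalizes the dynamic temporal characteristics of speech. To verify the method, we compared speaker recognition using conventional statistical processing, using the optimal path, and using the optimal path with weighted orthogonal parameters. The proposed method achieved a higher speaker recognition rate than the conventional method, confirming its effectiveness.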
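
The method described above combines two ingredients: KLT (orthogonal) parameters weighted by each speaker's eigenvalues, and the DTW optimal path for time normalization. The Python sketch below assumes the features are already KLT coefficients and treats the eigenvalue weighting as per-dimension weights in a Euclidean frame distance. That reading, the basic three-step DTW recurrence, and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean distance with per-dimension weights (assumed: eigenvalue weights)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def dtw(reference: Sequence[Vector], test: Sequence[Vector],
        weights: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, optimal warping path) using steps (1,0), (0,1), (1,1)."""
    n, m = len(reference), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_distance(reference[i - 1], test[j - 1], weights)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while True:
        path.append((i - 1, j - 1))  # store 0-based frame indices
        if i == 1 and j == 1:
            break
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

For identification, one would run `dtw` against each speaker's reference pattern and choose the speaker with the lowest total cost. The returned path shows which reference frame each test frame was aligned to, so a slower utterance simply maps several test frames onto one reference frame.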

Speaker Recognition Using Optimal Path and Weighted Orthogonal Parameters

  • 남기환;배철수
    • Journal of the Korea Institute of Information and Communication Engineering, Vol. 7, No. 7, pp. 1539-1544, 2003
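  • Recently, many researchers have performed speaker recognition with statistical processing based on the Karhunen-Loève transform (KLT). However, the degree to which statistical processing captures speaker individuality, together with the dynamic speaking rate of speech, lowers the speaker recognition rate. In this study, we investigated a speaker recognition method that combines weighted orthogonal parameters, in which each speaker's eigenvalues serve as weights to emphasize individuality in that speaker's orthogonal parameters, with the DTW optimal path, which normalizes the dynamic temporal characteristics of speech. To verify the method, we compared speaker recognition using conventional statistical processing with speaker recognition using the optimal path and weighted orthogonal parameters. The proposed method achieved a higher speaker recognition rate than the conventional method, confirming its effectiveness.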