• Title/Summary/Keyword: speaker features (화자 특징)

Search Result 299, Processing Time 0.027 seconds

Speaker recognition technique for offline conference recording system (오프라인 회의 기록 지원시스템을 위한 화자 인식 기법)

  • Park, Han-Mu;Son, Yun-Sik;Jeong, Jin-U
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.29-32 / 2007
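  • With recent advances in image processing, there have been attempts to incorporate image processing into a variety of application systems. In particular, advances in recognition techniques that treat faces in an image as objects have led to face information being used in diverse fields such as games and cameras. This paper presents a technique for distinguishing the speaker in an offline conference support system. The proposed technique derives feature values for speaker discrimination from face object information, and uses them to decide whether a person is speaking based on the variance of the pixels that form the edges around the mouth.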

GMM-based Emotion Recognition Using Speech Signal (음성 신호를 사용한 GMM기반의 감정 인식)

  • 서정태;김원구;강면구
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.235-241 / 2004
  • This paper studies pattern recognition algorithms and feature parameters for speaker- and context-independent emotion recognition. The KNN algorithm was used as the pattern matching technique for comparison, and VQ and GMM were used for speaker- and context-independent recognition. The speech features are pitch, energy, MFCC, and their first and second derivatives. Experimental results showed that the emotion recognizer using MFCC and its derivatives performed better than the one using the pitch and energy parameters. Among the pattern recognition algorithms, the GMM-based emotion recognizer was superior to the KNN- and VQ-based recognizers.
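
GMM-based classification, as described in the abstract above, can be illustrated with a minimal sketch: each class gets its own diagonal-covariance GMM, and an utterance is assigned to the class whose model gives the highest average frame log-likelihood. The function names, the class labels, and the hand-picked toy parameters are assumptions for illustration only; the paper's trained models and feature dimensions are not given in this listing.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(frames, models):
    """Return the label with the highest average frame log-likelihood, plus all scores."""
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-D models with hand-picked (not trained) parameters; labels are hypothetical.
models = {
    "neutral": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "angry": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[4.2, 3.9], [4.8, 5.1], [4.5, 4.4]]
label, scores = classify(frames, models)
print(label)
```

In practice the GMM parameters would be estimated from training data (typically with the EM algorithm) rather than written by hand, and the frames would be feature vectors such as MFCCs with their derivatives.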

Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination (Glottal flow 신호에서의 향상된 특징추출 및 다중 특징파라미터 결합을 통한 화자인식 성능 향상)

  • Kang, Jihoon;Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.12 / pp.2792-2799 / 2015
  • In this paper, we use source mel-frequency cepstral coefficients (SMFCCs), skewness, and kurtosis extracted from glottal flow signals to improve speaker recognition performance. Because the high-band magnitude response of glottal flow signals is generally somewhat flat, the SMFCCs are extracted using the response below a predefined cutoff frequency. The extracted SMFCCs, skewness, and kurtosis are concatenated with conventional feature parameters. Dimensionality reduction by principal component analysis (PCA) and linear discriminant analysis (LDA) is then applied so that performance can be compared with conventional systems under equivalent conditions. The proposed recognition system outperformed the conventional system in large-scale speaker recognition experiments. The performance improvement was especially noticeable for small numbers of Gaussian mixtures.
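
The concatenation step in the abstract above can be sketched as follows: compute the skewness and kurtosis of one frame and append them to an existing feature vector. This is only an illustration under stated assumptions. The function names are hypothetical, the input frame is assumed to be an already-estimated glottal flow frame, and the kurtosis is the plain fourth standardized moment (3 for a Gaussian), which may differ from the exact definition used in the paper. SMFCC extraction and the PCA/LDA stage are not shown.

```python
def skewness_kurtosis(frame):
    """Return (skewness, kurtosis) of a frame using population central moments."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((s - mean) ** 2 for s in frame) / n
    m3 = sum((s - mean) ** 3 for s in frame) / n
    m4 = sum((s - mean) ** 4 for s in frame) / n
    if m2 == 0.0:
        return 0.0, 0.0  # constant frame: moments are undefined, so fall back to zeros
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def augment(conventional, glottal_frame):
    """Append the two shape statistics to a conventional feature vector."""
    skew, kurt = skewness_kurtosis(glottal_frame)
    return list(conventional) + [skew, kurt]

# Hypothetical 2-D conventional feature vector and a toy 4-sample frame.
features = augment([0.1, 0.2], [-1.0, 1.0, -1.0, 1.0])
print(features)
```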

Speaker Identification Using Higher-Order Statistics In Noisy Environment (고차 통계를 이용한 잡음 환경에서의 화자식별)

  • Shin, Tae-Young;Kim, Gi-Sung;Kwon, Young-Uk;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.6 / pp.25-35 / 1997
  • Most speech analysis methods developed to date are based on second-order statistics, and one of their biggest drawbacks is that their performance degrades dramatically in noisy environments. In contrast, methods using higher-order statistics (HOS), which have the property of suppressing Gaussian noise, enable robust feature extraction in noisy environments. In this paper we propose a text-independent speaker identification system using higher-order statistics and compare its performance with that of the conventional second-order-statistics-based method in both white and colored noise environments. The proposed speaker identification system is based on the vector quantization approach and employs an HOS-based voiced/unvoiced detector to extract feature parameters for voiced speech only, which has a non-Gaussian distribution and is known to contain most of the speaker-specific characteristics. Experimental results on a 50-speaker database show that the higher-order-statistics-based method gives better identification performance than the conventional second-order-statistics-based method in noisy environments.

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

  • 김민정;석수영;김광수;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.1 / pp.8-14 / 2002
  • In this paper, we implemented a real-time text-independent speaker recognition system using Gaussian mixture models (GMMs) and applied the frame-level likelihood normalization method, which has proven effective in verification systems. The system has three parts: front-end, training, and recognition. In the front-end, cepstral mean normalization and silence removal were applied to account for variations in a speaker's speech. In training, a Gaussian mixture model was used to model each speaker's acoustic features, and maximum likelihood estimation was used to optimize the GMM parameters. In recognition, likelihood scores were calculated at the frame level from the speaker models and the test data. Text-independent sentences were used as test sentences. The ETRI 445 and KLE 452 databases were used for training and testing, and cepstral coefficients and regression coefficients were used as feature parameters. The experimental results show that the frame-level likelihood normalization method achieves a higher recognition rate than the conventional method, regardless of the number of registered speakers.
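
One possible form of frame-level likelihood normalization is sketched below. At every frame, each speaker's log-likelihood is normalized by the log of the mean likelihood of the competing speakers, and the normalized values are accumulated over the utterance. The abstract does not state which normalization the authors used, so the cohort choice (all other registered speakers) and all names here are assumptions. The per-frame log-likelihoods are taken as given inputs, and at least two registered speakers are required.

```python
import math

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def normalized_scores(frame_log_likes):
    """frame_log_likes[t][s] = log p(x_t | speaker s); returns accumulated normalized scores."""
    speakers = list(frame_log_likes[0])
    totals = {s: 0.0 for s in speakers}
    for frame in frame_log_likes:
        for s in speakers:
            others = [frame[o] for o in speakers if o != s]
            # Log of the mean likelihood of the competing speakers at this frame.
            log_norm = log_sum_exp(others) - math.log(len(others))
            totals[s] += frame[s] - log_norm
    return totals

# Toy example with two hypothetical speakers and two frames.
frames = [
    {"A": math.log(0.6), "B": math.log(0.2)},
    {"A": math.log(0.3), "B": math.log(0.3)},
]
totals = normalized_scores(frames)
best = max(totals, key=totals.get)
print(best)
```

Because the normalizer excludes the speaker being scored, it differs from speaker to speaker, so the ranking can differ from that of the plain summed log-likelihoods.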

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly influenced by outliers, which are introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice detection errors. Such outliers may severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector with reduced dimension is obtained by robust PCA based on M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the covariance matrix of the feature vectors. Second, a GMM with diagonal covariance matrices is trained on these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing the proposed method (RPCA-GMM) with PCA and with the conventional GMM with diagonal covariance matrices. For every 2% increase in the proportion of outliers, the proposed method maintains almost the same speaker identification rate, with only 0.03% degradation, while the conventional GMM and PCA degrade by 0.65% and 0.55%, respectively. This means that our method is more robust to outliers.

A Study on Speaker Recognition Algorithm Through Wire/Wireless Telephone (유무선 전화를 통한 화자인식 알고리즘에 관한 연구)

  • 김정호;정희석;강철호;김선희
    • The Journal of the Acoustical Society of Korea / v.22 no.3 / pp.182-187 / 2003
  • In this paper, we propose an algorithm that improves speaker verification performance by mapping feature parameters with an RBF neural network. Feature vectors from the same speaker differ greatly between the wired and wireless channels. To build wired and wireless speaker models, the speaker verification system must distinguish the wired channel from the wireless channel, which is done on the basis of a speech recognition system. The feature vectors of the untrained channel model are then mapped to the feature vectors (LPC cepstrum) of the trained channel model using the RBF neural network. In simulations, the proposed algorithm improved performance by 0.6% to 10.5% compared with a conventional method such as cepstral mean subtraction.
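
For context, the conventional baseline named in the abstract, cepstral mean subtraction, can be sketched in a few lines. A time-invariant channel shows up as an additive offset in the cepstral domain, so subtracting the per-dimension mean over an utterance removes it. The function name and toy values are assumptions for illustration. The proposed RBF mapping itself is not shown, since its parameters are not given in this listing.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

# Toy 2-D cepstral frames, and the same frames with a constant channel offset added.
clean = [[1.0, 2.0], [3.0, 6.0]]
shifted = [[c + o for c, o in zip(f, [0.5, -0.5])] for f in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_shifted = cepstral_mean_subtraction(shifted)
print(normalized_clean == normalized_shifted)
```

The two normalized outputs coincide because the constant offset is absorbed into the mean. Differences that vary over time or are nonlinear are not removed this way, which is the kind of mismatch a learned mapping is meant to address.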

A study on the Speaker Recognition using the Pitch (피치계수를 이용한 화자인식에 관한 연구)

  • 김에녹
    • Journal of the Korea Computer Industry Society / v.2 no.4 / pp.471-480 / 2001
  • In this paper, we perform speaker recognition experiments by identifying the vowels in each speaker's pronunciation using the Adaptive Resonance Theory 2 (ART2) model. Five adult males and five adult females pronounced the digits 0 to 9. We first extracted the vowels from each speaker's pronunciation, and then extracted characteristic coefficients through a pitch detection algorithm, LPC analysis, and LPC cepstral analysis to generate the input patterns for ART2. The experimental results showed that pitch coefficients performed somewhat better than LPC or LPC cepstral coefficients.

Speaker Recognition Based on Robust PCA (강인한 주성분 분석법을 갖는 화자인식)

  • Lee Youn Jeong;Lee Ki Yong
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.225-228 / 2002
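  • This paper proposes a speaker recognition method based on robust principal component analysis (Robust PCA). Robust PCA is used to build a robust speaker model while reducing the feature vectors to k dimensions when outliers are present among them. Because conventional PCA allows the pure speaker information to be corrupted by outliers such as noise, robust PCA is used to reduce the influence of outliers. When training the k-dimensional diagonal GMM for each speaker, the number of mixtures is adapted to minimize data storage. In experiments on isolated digit utterances from 200 speakers comparing the conventional diagonal GMM method with the proposed method, the proposed method achieved a verification rate about 1.5% higher.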

Speech Identification of Male and Female Speakers in Noisy Speech for Improving Performance of Speech Recognition System (음성인식 시스템의 성능 향상을 위한 잡음음성의 남성 및 여성화자의 음성식별)

  • Choi, Jae-seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.619-620 / 2017
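  • This paper proposes a gender recognition algorithm that uses a neural network to distinguish male and female speakers in noisy environments, since the speaker's gender provides very important information to speech recognition algorithms. The proposed neural network uses MFCC coefficients to recognize male and female speakers in each segment of speech. Experimental results show that, in a noisy environment with superimposed white noise, using MFCC feature vectors of the speech signal yields good gender recognition results for both male and female speakers.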
