• Title/Summary/Keyword: 화자검증 (speaker verification)

Search results: 63 (processing time: 0.03 seconds)

Development of Advanced Personal Identification System Using Iris Image and Speech Signal (홍채와 음성을 이용한 고도의 개인확인시스템)

  • Lee, Dae-Jong;Go, Hyoun-Joo;Kwak, Keun-Chang;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.3 / pp.348-354 / 2003
  • This paper proposes a new algorithm for an advanced personal identification system using iris patterns and speech signals. Because the algorithm adopts a fusion scheme that combines the strengths of iris recognition and speaker identification, it is robust in noisy environments. To evaluate the proposed scheme, we compared it with iris recognition and speaker identification individually. In the experiments, at a high security level, the proposed method showed a 56.7% improvement over the iris recognition method and a 10% improvement over the speaker identification method. In noisy environments, again at a high security level, it showed a 30% improvement over the iris recognition method and a 60% improvement over the speaker identification method.
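
A minimal sketch of the score-level fusion idea described in this entry. The weights, threshold, and function names are illustrative assumptions, not values from the paper:

```python
# Hypothetical score-level fusion of an iris matcher and a speaker matcher.
# The weights and threshold are illustrative; the paper's actual fusion rule
# is not specified here, so this is only a sketch of the general idea.
import numpy as np

def fused_decision(iris_score: float, speech_score: float,
                   w_iris: float = 0.6, w_speech: float = 0.4,
                   threshold: float = 0.5) -> bool:
    """Accept the identity claim if the weighted sum of normalized
    matcher scores (each in [0, 1]) exceeds the threshold."""
    fused = w_iris * iris_score + w_speech * speech_score
    return fused >= threshold

# Example: a noisy speech score is compensated by a strong iris score.
print(fused_decision(iris_score=0.9, speech_score=0.3))  # True
```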

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice-detection errors, and such outliers can severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a feature vector of reduced dimension is obtained by robust PCA derived from M-estimation; the robust PCA projects the original feature vector onto the lower-dimensional linear subspace spanned by the leading eigenvectors of the feature covariance matrix. Second, a GMM with diagonal covariance matrices is trained on the transformed feature vectors. We performed speaker identification experiments comparing the proposed method (RPCA-GMM) with standard PCA and a conventional diagonal-covariance GMM. For every 2% increase in the proportion of outliers, the proposed method loses only 0.03% of its identification rate, while the conventional GMM and the PCA degrade by 0.65% and 0.55%, respectively. This shows that our method is more robust to outliers.
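
The RPCA-GMM pipeline above can be sketched roughly as follows: a Huber-style M-estimated covariance supplies the leading eigenvectors for dimension reduction, and a diagonal-covariance GMM models the projected features. The weighting scheme, constants, and data shapes below are illustrative, not the paper's exact M-estimator:

```python
# A minimal sketch of the RPCA-GMM idea, assuming a Huber-style weighting;
# the constants and feature dimensions are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

def robust_pca_basis(X, n_components, n_iter=10, c=1.345):
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        d = np.linalg.norm(X - mu, axis=1)          # distances to current center
        s = np.median(d) + 1e-12
        w = np.minimum(1.0, c * s / (d + 1e-12))    # Huber-style down-weighting
        mu = np.average(X, axis=0, weights=w)
        Xc = (X - mu) * np.sqrt(w)[:, None]
        cov = Xc.T @ Xc / w.sum()                   # robustly weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mu, eigvecs[:, ::-1][:, :n_components]   # leading eigenvectors

# Train: project features robustly, then fit a diagonal-covariance GMM.
X_train = np.random.randn(500, 24)                  # stand-in for cepstral features
mu, V = robust_pca_basis(X_train, n_components=12)
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit((X_train - mu) @ V)
score = gmm.score((X_train - mu) @ V)               # average log-likelihood
```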

Human-Robot Interaction for Software Robots (소프트웨어 로봇을 위한 인간-로봇 상호작용)

  • Gwak Geun-Chang;Ji Su-Yeong;Jo Yeong-Jo
    • The Magazine of the IEIE / v.33 no.3 s.262 / pp.49-55 / 2006
  • We introduce human-robot interaction (HRI) technologies based on vision and speech for natural interaction between humans and robots. We review face recognition and verification, user identification using semi-biometrics, gesture recognition, speaker recognition and verification, and conversational speech recognition, all of which can run on a software robot with the server/client architecture of the URC concept. These HRI technologies serve as core technologies for intelligent service robots based on the URC (Ubiquitous Robotic Companion), which exploits IT infrastructure such as high-speed Internet.

A comparison study of the characteristics of pauses and breath groups during paragraph reading for normal female adults with and without voice disorders (정상성인 여성 화자와 음성장애 성인 여성 화자의 문단 낭독 시 휴지 및 호흡단락 특성의 비교)

  • Pyo, Hwa Young
    • Phonetics and Speech Sciences / v.11 no.4 / pp.109-116 / 2019
  • This study was conducted to identify the characteristics of pauses and breath groups produced by normal adults and patients with voice disorders while reading a paragraph. Forty normal female adults and forty female patients with a functional voice disorder (18-45 yrs.) read the "Gaeul" paragraph under the "Running Speech" protocol of the Phonatory Aerodynamic System (PAS), from which pauses (with or without inspiration; between or within syntactic words) and breath groups were analyzed. The number of pauses with inspiration was higher in the patient group, whereas the number of pauses without inspiration was higher in the normal group. The rate of syntactic word boundaries carrying pauses with inspiration was higher in the patient group, while the number of syllables per breath group was higher in the normal group. Since these results can be explained by the patients' poor breath support due to glottal insufficiency, whether voice disorder patients use pauses and breath groups properly should be considered carefully in evaluation and intervention.
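
The pause measurements above were made with the commercial PAS system; as a rough stand-in, a simple RMS-energy detector along the following lines could locate pause stretches in a recording. All thresholds and durations are illustrative assumptions:

```python
# A rough energy-based pause detector, sketched only as an illustration of
# how pauses in a read paragraph might be located; it does not reproduce the
# PAS "Running Speech" protocol.
import numpy as np

def find_pauses(signal, sr, frame_ms=25, hop_ms=10,
                silence_db=-40.0, min_pause_ms=150):
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    rms = np.array([np.sqrt(np.mean(signal[i*hop : i*hop+frame] ** 2) + 1e-12)
                    for i in range(n_frames)])
    db = 20 * np.log10(rms / (rms.max() + 1e-12))   # level relative to peak
    silent = db < silence_db
    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i                                # pause begins
        elif not s and start is not None:
            if (i - start) * hop_ms >= min_pause_ms:
                pauses.append((start * hop_ms / 1000, i * hop_ms / 1000))
            start = None
    return pauses                                    # list of (start_s, end_s)
```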

Speaker Verification Using Hidden LMS Adaptive Filtering Algorithm and Competitive Learning Neural Network (Hidden LMS 적응 필터링 알고리즘을 이용한 경쟁학습 화자검증)

  • Cho, Seong-Won;Kim, Jae-Min
    • The Transactions of the Korean Institute of Electrical Engineers D / v.51 no.2 / pp.69-77 / 2002
  • Speaker verification can be classified into two categories: text-dependent and text-independent. In this paper, we discuss text-dependent speaker verification, in which the system determines whether the speaker's vocal characteristics match those of a claimed person. We recorded speaker data with a sound card under various noisy conditions, applied a new hidden LMS (least mean square) adaptive algorithm to the signal, and extracted LPC (linear predictive coding) cepstrum coefficients as feature vectors. Finally, we used a competitive learning neural network for verification. The proposed hidden LMS adaptive filter, realized with a neural network, reduces noise and enhances features under various noisy conditions. We construct a separate neural network for each speaker, so enrolling a new speaker does not require retraining the whole network, which makes the system easy to expand. Experiments show that the proposed method improves speaker verification performance.
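
For reference, a textbook LMS adaptive filter is sketched below; the paper's hidden LMS variant embeds the adaptation inside a neural network and is not reproduced here:

```python
# A standard LMS adaptive filter, shown only to illustrate the kind of
# adaptive noise reduction this entry builds on. Tap count and step size
# are illustrative.
import numpy as np

def lms_filter(x, d, n_taps=16, mu=0.01):
    """x: input signal (e.g., noisy speech), d: desired signal.
    Returns the filter output y and the error e = d - y."""
    w = np.zeros(n_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps:n][::-1]        # most recent samples first
        y[n] = w @ u                     # filter output
        e[n] = d[n] - y[n]               # estimation error
        w += 2 * mu * e[n] * u           # LMS weight update
    return y, e
```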

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.85-94 / 2017
  • Conventional speaker verification systems using a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades performance. To compensate for the i-vector mismatch between long and short utterances, this paper proposes PLDA model adaptation with augmented data. A PLDA model is first trained on a vast amount of speech data, most of it long duration; the model is then adapted with i-vectors obtained from short-utterance data augmented by vocal tract length perturbation (VTLP). In experiments on the NIST SRE 2008 database, the proposed method achieves significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.
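
The VTLP augmentation mentioned above warps the frequency axis by a random factor. A sketch of one common piecewise-linear formulation (Jaitly & Hinton, 2013) follows; the boundary ratio and warp-factor range are typical choices, not necessarily those used in the paper:

```python
# Piecewise-linear VTLP frequency warping: frequencies below a boundary are
# scaled by alpha, and the remainder is mapped linearly so the Nyquist
# frequency stays fixed.
import numpy as np

def vtlp_warp(freqs, alpha, f_hi_ratio=0.9, sr=16000):
    """Warp frequency values by factor alpha (commonly drawn from
    [0.9, 1.1]), keeping the mapping continuous up to Nyquist."""
    nyq = sr / 2
    f_hi = f_hi_ratio * nyq
    boundary = f_hi * min(alpha, 1.0) / alpha
    return np.where(
        freqs <= boundary,
        freqs * alpha,
        nyq - (nyq - f_hi * min(alpha, 1.0)) / (nyq - boundary) * (nyq - freqs),
    )

# Example: warp mel filterbank center frequencies for one augmented copy.
centers = np.linspace(0, 8000, 26)
alpha = np.random.uniform(0.9, 1.1)
warped_centers = vtlp_warp(centers, alpha)
```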

Word-balloon effects on Video (비디오에 대한 말풍선 효과 합성)

  • Lee, Sun-Young;Lee, In-Kwon
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.332-334 / 2012
  • With the recent explosive growth of media data such as movies and dramas, subtitle data translated into various languages is also increasing. Such subtitles are usually displayed at a fixed position at the bottom or right of the screen, but this approach has limitations: when the subtitle is far from a character's face, the viewer's gaze is split and it becomes hard to focus on the video, and hearing-impaired viewers can be confused about who is speaking from the subtitles alone. In this paper, we propose a new subtitling system that presents video subtitles in word balloons, the device comics use to deliver dialogue. A word balloon points to the speaker with its tail and keeps the viewer's gaze near the speaker's face, mitigating the limitations of conventional subtitles. A user study conducted to validate the results showed that the proposed method outperforms conventional subtitling in gaze stability, interest, and accuracy.

VR Companion Animal Communion System for Pet Loss Syndrome (펫로스 증후군을 위한 VR 반려동물 교감 시스템)

  • Choi, Hyeong-Mun;Moon, Mikyeong;Lee, Gun-ho
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.563-564 / 2021
  • As the number of households with companion animals grows, more and more owners suffer from pet loss syndrome after losing a pet. To help heal pet loss syndrome, it is useful to let owners meet their pet, even virtually, and speak and act with it as they usually did, so that they can gradually come to terms with the parting. This paper describes a system in which an owner can commune directly with a 3D-modeled companion animal through VR. By helping the owner talk and behave with the departed pet as usual, the system supports a gradual catharsis of emotion.

AI-based stuttering automatic classification method: Using a convolutional neural network (인공지능 기반의 말더듬 자동분류 방법: 합성곱신경망(CNN) 활용)

  • Jin Park;Chang Gyun Lee
    • Phonetics and Speech Sciences / v.15 no.4 / pp.71-80 / 2023
  • This study aimed to develop an automated stuttering identification and classification method using artificial intelligence, specifically a deep learning model based on convolutional neural networks (CNNs) for Korean speakers who stutter. Speech data were collected from 9 adults who stutter and 9 normally fluent speakers. The data were automatically segmented at the phrasal level using Google Cloud speech-to-text (STT), and labels of 'fluent', 'blockage', 'prolongation', and 'repetition' were assigned. Mel-frequency cepstral coefficients (MFCCs) and a CNN-based classifier were used to detect and classify each type of stuttered disfluency. For prolongation, however, only five instances were found, so this type was excluded from the classifier model. The accuracy of the CNN classifier was 0.96, and the per-class F1-scores were: 'fluent' 1.00, 'blockage' 0.67, and 'repetition' 0.74. Although the CNN-based automatic classifier was validated for detecting stuttered disfluencies, its performance was inadequate for the blockage and prolongation types. Building a large speech database organized by disfluency type was therefore identified as a necessary foundation for improving classification performance.
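
A minimal sketch of the MFCC-plus-CNN pipeline described above, with librosa for feature extraction and a small PyTorch network; the layer sizes, input shape, and four-class head are illustrative guesses, not the paper's architecture:

```python
# MFCC extraction followed by a small 2D CNN over the (coefficients, frames)
# "image". The class list mirrors the paper's labels but the network itself
# is a placeholder.
import librosa
import torch
import torch.nn as nn

def mfcc_features(path, sr=16000, n_mfcc=13, max_frames=200):
    y, _ = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)       # (n_mfcc, frames)
    m = librosa.util.fix_length(m, size=max_frames, axis=1)   # pad/trim frames
    return torch.tensor(m, dtype=torch.float32).unsqueeze(0)  # (1, n_mfcc, T)

class DisfluencyCNN(nn.Module):
    def __init__(self, n_classes=4):   # fluent / blockage / prolongation / repetition
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.LazyLinear(n_classes),  # infers the flattened feature size
        )

    def forward(self, x):              # x: (batch, 1, n_mfcc, frames)
        return self.net(x)

model = DisfluencyCNN()
logits = model(mfcc_features("segment.wav").unsqueeze(0))     # (1, 4) class scores
```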

Implementation of the Timbre-based Emotion Recognition Algorithm for a Healthcare Robot Application (헬스케어 로봇으로의 응용을 위한 음색기반의 감정인식 알고리즘 구현)

  • Kong, Jung-Shik;Kwon, Oh-Sang;Lee, Eung-Hyuk
    • Journal of IKEEE / v.13 no.4 / pp.43-46 / 2009
  • This paper deals with recognizing emotions from the human voice by finding suitable feature vectors. Voice signals carry not only speaker-specific information but also the speaker's emotions and fatigue, so much research is under way on detecting emotions from the voice. In this paper, we analyze the Selectable Mode Vocoder (SMV), one of the standard 3GPP2 codecs, and, based on this analysis, propose voice features for emotion recognition. We then propose an emotion recognition algorithm based on a Gaussian mixture model (GMM) that uses the suggested feature vectors, and verify its performance while varying the number of mixture components.
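
The GMM-based recognition scheme above can be sketched as one mixture model per emotion class, with a test utterance assigned to the best-scoring model. Extraction of features from SMV codec parameters is not reproduced here; the inputs are generic frame-level feature vectors, and all names are illustrative:

```python
# One GMM per emotion class; a test utterance goes to the class whose model
# gives the highest average log-likelihood over its frames.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(features_by_emotion, n_mixtures=8):
    """features_by_emotion: dict mapping emotion name -> (frames, dims) array."""
    return {emo: GaussianMixture(n_components=n_mixtures).fit(X)
            for emo, X in features_by_emotion.items()}

def classify(models, X_test):
    """Return the emotion whose GMM best explains the test frames."""
    return max(models, key=lambda emo: models[emo].score(X_test))

# The paper verifies performance while varying the number of mixture
# components; the same experiment can be sketched by sweeping n_mixtures.
```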
