• Title/Abstract/Keywords: speaker detection

108 results (processing time: 0.026 s)

Optimization of State-Based Real-Time Speech Endpoint Detection Algorithm

  • 김수환;이영재;김영일;정상배
    • Phonetics and Speech Sciences / Vol. 2, No. 4 / pp.137-143 / 2010
  • In this paper, a speech endpoint detection algorithm is proposed. The proposed algorithm is a state transition-based method for speech detection. To reject short-duration acoustic pulses, which can be regarded as noise, it uses the duration information of all detected pulses. To optimize the parameters related to pulse lengths and the energy threshold for detecting speech intervals, an exhaustive search scheme is adopted, with speech recognition rates as the performance index. Experimental results show that the proposed algorithm outperforms the baseline state-based endpoint detection algorithm. At 5 dB input SNR for the beamforming input, the word recognition accuracies of its outputs were 78.5% for human voice noise and 81.1% for music noise.

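As a rough illustration of the idea in the abstract above, the following sketch scans per-frame energies with a two-state (silence/speech) machine, discards pulses shorter than a minimum duration, and merges pulses separated by short gaps. The function name `detect_speech` and the parameters `threshold`, `min_pulse_frames`, and `max_gap_frames` are assumptions made for this example, and the gap-merging step is an assumed detail; the paper selects its own parameters by exhaustive search against recognition rate.

```python
def detect_speech(energies, threshold, min_pulse_frames, max_gap_frames):
    """Return speech intervals as (start, end) frame indices, end exclusive.

    All parameter names are hypothetical; this is not the paper's algorithm.
    """
    # Pass 1: two-state scan (silence <-> pulse) over the frame energies.
    pulses = []
    in_pulse = False
    start = 0
    for i, energy in enumerate(energies):
        if not in_pulse and energy > threshold:
            in_pulse, start = True, i
        elif in_pulse and energy <= threshold:
            in_pulse = False
            pulses.append((start, i))
    if in_pulse:
        pulses.append((start, len(energies)))

    # Pass 2: reject pulses too short to be speech (treated as noise).
    pulses = [(s, e) for s, e in pulses if e - s >= min_pulse_frames]

    # Pass 3: merge surviving pulses separated by a short gap (assumed step).
    merged = []
    for s, e in pulses:
        if merged and s - merged[-1][1] <= max_gap_frames:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


energies = [0, 0, 9, 0, 0, 8, 8, 8, 0, 7, 7, 7, 0, 0, 0, 0]
print(detect_speech(energies, threshold=5, min_pulse_frames=3, max_gap_frames=2))
# [(5, 12)]
```

The single-frame spike at index 2 is rejected by its duration, while the two three-frame pulses are joined across the one-frame gap.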

RPCA-GMM for Speaker Identification

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / Vol. 22, No. 7 / pp.519-527 / 2003
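  • Speech signals are strongly affected by outliers arising from ambient noise, changes in the speaker's utterance patterns, and speech detection errors. Using such signals for speaker recognition lowers the recognition rate. To address outliers and the high dimensionality of training feature vectors in speaker identification, this paper proposes a robust principal component analysis-Gaussian mixture model (RPCA-GMM) method based on M-estimation. When the feature vectors contain outliers, the proposed method first re-estimates a robust covariance matrix by M-estimation, derives a transformation matrix from the resulting eigenvectors, and obtains new feature vectors of reduced dimension. A Gaussian mixture model of each speaker is then built from these linearly transformed feature vectors. To verify the performance of the proposed method, speaker identification experiments were carried out comparing the conventional Gaussian mixture model method, principal component analysis, and the proposed method. For every 2% increase in outliers, the speaker identification performance of the Gaussian mixture model method and of principal component analysis degraded by 0.65% and 0.55%, respectively, whereas that of the proposed method decreased by only about 0.03%, showing that it is more robust to outliers.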

Development of a Cost-Effective Tele-Robot System Delivering Speaker's Affirmative and Negative Intentions

  • 진용규;유수정;조혜경
    • The Journal of Korea Robotics Society / Vol. 10, No. 3 / pp.171-177 / 2015
  • A telerobot offers a more engaging and enjoyable interaction with people at a distance by communicating via audio, video, expressive gestures, body pose and proxemics. To deliver these benefits at a reasonable cost, this paper presents a telepresence robot system for video communication that conveys the speaker's head motion through its display stanchion. Head gestures such as nodding and head-shaking can give crucial information during conversation. A speaker's eye gaze, which is known as one of the key non-verbal signals for interaction, can also be inferred from his/her head pose. To develop an efficient head tracking method, a 3D cylinder-like head model is employed, and the Harris corner detector is combined with Lucas-Kanade optical flow, which is known to be suitable for extracting the 3D motion information of the model. In particular, a skin color-based face detection algorithm is proposed to achieve robust performance across varying directions while maintaining reasonable computational cost. The performance of the proposed head tracking algorithm is verified through experiments using BU's standard data sets. The design of the robot platform is also described, along with the design of supporting systems such as video transmission and robot control interfaces.

Invisible Messenger: A System to Whisper in a Person's Ear Remotely by integrating Visual Tracking and Speaker Array

  • Mizoguchi, Hiroshi;Kanamori, Tomohiko;Okabe, Kosuke;Hiraoka, Kazuyuki;Tanaka, Masaru;Shigehara, Takaomi;Mishima, Taketoshi
    • The Institute of Electronics Engineers of Korea: Conference Proceedings / The Institute of Electronics Engineers of Korea, 2002 ITC-CSCC -3 / pp.1897-1900 / 2002
  • This paper proposes a novel computer-human interface, named Invisible Messenger. It integrates face detection and tracking with speaker array signal processing. With the speaker array it is possible to form an acoustic focus at an arbitrary location, which is measured by the face tracking. Thus the proposed system can whisper in a person's ear as if an invisible virtual messenger were standing by the person. Beyond speculative discussion, the authors have implemented a working prototype system based on the proposed idea, which this paper also describes. To confirm the effectiveness of the proposed idea, the authors conduct experiments using the implemented system. Experimental results demonstrate the effectiveness of the proposed idea.

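The abstract above does not say how the acoustic focus is formed. One common approach, assumed here purely for illustration, is to delay each speaker so that all wavefronts arrive at the tracked position at the same instant. The function name `focusing_delays`, the coordinate layout, and the speed-of-sound constant are choices made for this sketch, not details from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def focusing_delays(speaker_positions, focus, speed=SPEED_OF_SOUND):
    """Per-speaker delays (seconds) that align arrival times at `focus`.

    The farthest speaker gets zero delay; nearer speakers wait so that
    every wavefront reaches the focus point simultaneously.
    """
    distances = [math.dist(p, focus) for p in speaker_positions]
    farthest = max(distances)
    return [(farthest - d) / speed for d in distances]


# Three speakers on a line (metres), listener's ear 1 m in front of the centre.
speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
delays = focusing_delays(speakers, focus=(0.0, 1.0))
print(delays)
```

In a system like the one described, the `focus` argument would presumably be updated from the face tracker as the person moves.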

Bidirectional Alarm Equipment for Protection for Trackside Worker using Bone-anchored Speaker

  • Hwang, Jong-Gyu;Jo, Hyun-Jeong
    • International Journal of Safety / Vol. 10, No. 1 / pp.36-40 / 2011
  • Personnel who maintain or repair railway tracks or trackside signaling facilities may experience sensory dulling when working at the trackside for long periods. In this state they may be struck by a train, because the sensory block caused by long hours of monotonous maintenance work prevents them from noticing a motor-car even when it approaches the vicinity of the workplace. To prevent such accidents, safety alarm equipment has been developed in which the approaching motor-car sends radio signals and a bidirectional detection mechanism operates between the approaching train and the trackside personnel. Using the same transmitting/receiving configuration, the equipment could be applied to various forms of worker safety equipment beyond the safety helmet worn by maintenance workers. This paper presents the new alarm equipment, a bone-anchored speaker-based safety helmet to be worn by maintenance workers.

Speech Interface with Echo Canceller and Barge-In Functionality for Telematic System

  • 김준;배건성
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 5 / pp.483-490 / 2009
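  • To improve speech recognition performance in a vehicle environment with background noise and echo, this paper implements a speech interface that combines an acoustic echo canceller, which applies a correlation coefficient-based double-talk detection algorithm, with a barge-in function. Double-talk detection based on the correlation coefficient suffers from detection errors caused by threshold setting and the influence of background noise. To compensate, the average power of the background noise and of the echo signal, estimated from the input signal at every sample, is used as a double-talk detection condition to reduce detection errors, and a post-processing stage with a time-varying threshold removes the time-varying residual noise components. In addition, a speech interface system with a barge-in function was implemented so that speech input is possible while a voice prompt is playing. Experiments confirmed that the proposed speech interface system efficiently resolves double-talk detection errors and the problems they cause.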
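A simplified sketch of correlation coefficient-based double-talk detection follows. It assumes the detector compares one frame of the far-end (loudspeaker) signal with the same frame of the microphone signal: when only echo is present the two are highly correlated, and near-end speech lowers the correlation. The function names and the fixed threshold of 0.9 are hypothetical, and the additional noise/echo power condition and time-varying post-processing described in the abstract are not modelled.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length, non-empty frames."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant frame is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def is_double_talk(far_end, mic, threshold=0.9):
    """Flag double talk when far-end and microphone frames decorrelate."""
    return abs(correlation(far_end, mic)) < threshold


far_end = [1, -1, 1, -1, 1, -1, 1, -1]
near_end = [3, 3, -3, -3, 3, 3, -3, -3]           # stand-in for a talker in the car
echo_only = [0.5 * f for f in far_end]            # microphone hears only the echo
both = [e + s for e, s in zip(echo_only, near_end)]

print(is_double_talk(far_end, echo_only))  # False
print(is_double_talk(far_end, both))       # True
```

A fixed threshold like this is exactly what the abstract identifies as error-prone under background noise, which is why the paper adds a power-based condition.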

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

  • Eng, Goh Kia;Ahmad, Abdul Manan
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, 2004 ICCAS / pp.1291-1295 / 2004
  • This paper proposes a new approach to endpoint detection of isolated speech that significantly improves endpoint detection performance. The proposed algorithm relies on the root mean square (RMS) energy, the zero crossing rate, and the spectral characteristics of the speech signal, where a Euclidean distance measure over cepstral coefficients is adopted to accurately detect the endpoints of isolated speech. The algorithm offers better performance than the traditional energy-based algorithm. The vocabulary for the experiment consists of the English digits from one to nine. The experiments were conducted on 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of this algorithm is low, since the cepstral coefficients are reused later in the feature extraction stage of the speech recognition procedure.

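Two of the time-domain features named in the abstract above, RMS energy and zero crossing rate, can be computed per frame as in the minimal sketch below. The function names and the normalisation of the zero crossing rate (crossings divided by the number of adjacent sample pairs) are choices made for this example; the paper's exact definitions and its three decision levels are not reproduced.

```python
import math


def rms_energy(frame):
    """Root mean square energy of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)


alternating = [1.0, -1.0, 1.0, -1.0]   # changes sign at every sample
constant = [0.5, 0.5, 0.5, 0.5]        # never changes sign

print(rms_energy(alternating), zero_crossing_rate(alternating))  # 1.0 1.0
print(rms_energy(constant), zero_crossing_rate(constant))        # 0.5 0.0
```

Roughly speaking, voiced speech tends to show high energy with a low zero crossing rate, while unvoiced fricatives show the opposite, which is why the two features are often used together.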

Some effects of audio-visual speech in perceiving Korean

  • Kim, Jee-Sun;Davis, Chris
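    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Society: Conference Proceedings (Hangul and Korean Information Processing) / Korean Institute of Information Scientists and Engineers, Language Engineering Research Society, 1999, 11th Conference on Hangul and Korean Information Processing / pp.335-342 / 1999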
  • The experiments reported here investigated whether seeing a speaker's face (visible speech) affects the perception and memory of Korean speech sounds. To exclude the possibility of top-down, knowledge-based influences on perception and memory, the experiments tested people with no knowledge of Korean. The first experiment examined whether visible speech (the auditory and visual, or AV, condition) assists native English speakers with no knowledge of Korean in detecting a syllable within a Korean speech phrase. It was found that a syllable was more likely to be detected within a phrase when the participants could see the speaker's face. The second experiment investigated whether native English speakers' judgments about the duration of a Korean phrase would be affected by visible speech. It was found that in the AV condition participants' estimates of phrase duration were highly correlated with the actual durations, whereas those in the auditory-only (AO) condition were not. The results are discussed with respect to the benefits of communication with multimodal information and future applications.

Enhancement of Authentication Performance based on Multimodal Biometrics for Android Platform

  • 최성필;정강훈;문현준
    • Journal of Korea Multimedia Society / Vol. 16, No. 3 / pp.302-308 / 2013
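  • This paper proposes a system with an improved false acceptance rate (FAR) for personal authentication through multimodal biometrics in a mobile environment. Face recognition and speaker recognition were selected for the multimodal biometrics, and the recognition scenario is as follows. For face recognition, the face region is preprocessed through modified census transform (MCT)-based face detection and eye detection based on the k-means cluster analysis algorithm, and a principal component analysis (PCA)-based face verification system is implemented. For speaker recognition, speech endpoints are detected and Mel frequency cepstral coefficient (MFCC) features are extracted, and a dynamic time warping (DTW)-based speaker verification system is implemented. The two biometric modalities are then fused using the method proposed in this paper to improve the recognition rate. The experiments were performed in an Android environment, comparing the FAR of the implemented multimodal biometric system with that of the single-modality systems. The FAR was 4.6% for face recognition alone and 6.7% for speaker recognition alone, whereas the FAR of the proposed multimodal biometric system decreased markedly to 1.8%.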
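The DTW matching step mentioned in the abstract above can be sketched with the textbook dynamic-programming recurrence. This is a generic DTW distance over sequences of feature vectors, using Euclidean distance as the local cost; it is not the authors' implementation, and the function name `dtw_distance` and the toy one-dimensional "features" standing in for MFCC vectors are assumptions of this example.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b frame repeated
                cost[i][j - 1],      # seq_b advances, seq_a frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


enrolled = [(1.0,), (2.0,), (3.0,)]
same_but_slower = [(1.0,), (2.0,), (2.0,), (3.0,)]
different = [(2.0,), (3.0,), (4.0,)]

print(dtw_distance(enrolled, same_but_slower))  # 0.0
print(dtw_distance(enrolled, different))        # 2.0
```

In a verification setting the distance would typically be compared against a threshold to accept or reject the claimed speaker; how that decision is fused with the face score is specific to the paper and is not shown here.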

Development of a Real-time Voice Recognition Dialing System

  • 이세웅;최승호;이미숙;김흥국;오광철;김기철;이황수
    • Information and Communications Magazine / Vol. 10, No. 10 / pp.22-29 / 1993
  • This paper describes the development of a real-time voice recognition dialing system that can recognize a vocabulary of around one hundred words in speaker-independent mode. The voice recognition algorithm is implemented on a DSP board with a telephone interface plugged into an IBM PC AT/486. On the DSP board, the procedures for feature extraction, vector quantization (VQ), and end-point detection are performed simultaneously in every 10 ms frame interval after the word starting point is detected, in order to satisfy real-time constraints. In addition, the VQ codebook size and the end-point detection procedure are optimized to reduce recognition time and memory requirements. The demonstration system is being displayed at MOBILAB of Korea Mobile Telecom at the Taejon EXPO '93.

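The per-frame vector quantization step mentioned in the abstract above amounts to replacing each feature vector with the index of its nearest codeword. The sketch below shows that lookup with Euclidean distance; the function names and the tiny two-dimensional codebook are invented for illustration, and the codebook training and DSP implementation of the actual system are not represented.

```python
import math


def quantize(vector, codebook):
    """Index of the codeword closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))


def quantize_frames(frames, codebook):
    """Quantize a sequence of per-frame feature vectors, one index per frame."""
    return [quantize(frame, codebook) for frame in frames]


codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]   # hypothetical 3-entry codebook
frames = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0)]    # one feature vector per frame

print(quantize_frames(frames, codebook))  # [0, 1, 2]
```

Because the lookup cost grows with the number of codewords, a smaller codebook shortens the work done in each frame, which is consistent with the abstract's mention of optimizing codebook size.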