• Title/Summary/Keyword: speaker detection


Development of a Real-time Voice Dialing System Using Discrete Hidden Markov Models (이산 HM을 이용한 실시간 음성인식 다이얼링 시스템 개발)

  • Lee, Se-Woong;Choi, Seung-Ho;Lee, Mi-Suk;Kim, Hong-Kook;Oh, Kwang-Cheol;Kim, Ki-Chul;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea / v.13 no.1E / pp.89-95 / 1994
  • This paper describes the development of a real-time voice dialing system that can recognize a vocabulary of around one hundred words in speaker-independent mode. The voice recognition algorithm is implemented on a DSP board with a telephone interface, plugged into an IBM PC AT/486. On the DSP board, feature extraction, vector quantization (VQ), and end-point detection are performed simultaneously in every 10 msec frame interval, once the word starting point has been detected, to satisfy real-time constraints. In addition, we optimize the VQ codebook size and the end-point detection procedure to reduce recognition time and memory requirements. The demonstration system was displayed in MOBILAB of the Korean Mobile Telecom at the Taejon EXPO'93.

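
The per-frame vector quantization step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' DSP implementation: the Euclidean distance measure, the function names `quantize_frame` and `quantize_utterance`, and the toy 2-D codebook are all assumptions made for illustration. The idea is only that each frame's feature vector is replaced by the index of its nearest codeword, which yields the discrete symbols a discrete HMM consumes.

```python
import math

def quantize_frame(features, codebook):
    """Return the index of the codeword closest to one frame's feature vector.

    Euclidean distance is an assumption; the abstract does not name the measure.
    """
    return min(range(len(codebook)),
               key=lambda k: math.dist(features, codebook[k]))

def quantize_utterance(frames, codebook):
    """Map each frame's feature vector to a discrete codeword index."""
    return [quantize_frame(f, codebook) for f in frames]

# Toy 2-D codebook and frames, made up for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]]
symbols = quantize_utterance(frames, codebook)
print(symbols)
```

A smaller codebook means fewer distance computations per frame and less memory, which is the trade-off the abstract refers to when it mentions optimizing the codebook size.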

Development of Animal Tracking Method Based on Edge Computing for Harmful Animal Repellent System (엣지컴퓨팅 기반 유해조수 퇴치 드론의 동물 추적기법 개발)

  • Lee, Seul;Kim, Jun-tae;Lee, Sang-Min;Cho, Soon-jae;Jeong, Seo-hoon;Kim, Hyung Hoon;Shim, Hyun-min
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.224-227 / 2020
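  • The harmful-animal tracking technology of the edge-computing-based repellent drone uses a Doppler sensor to detect harmful animals that intrude on private land and then notifies the user of the hazard. The user then pilots the drone while watching the farmland in real time through the drone's camera and a dedicated application. The camera applies TensorFlow Object Detection deep learning to learn, identify, and track harmful animals. The drone then uses a speaker and a NeoPixel LED ring to stimulate the animals' sight and hearing, inducing them to flee. TensorFlow object detection was integrated into the drone as the core component, and a dedicated application was developed for this purpose.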

Pitch Period Detection Algorithm Using Rotation Transform of AMDF (AMDF의 회전변환을 이용한 피치 주기 검출 알고리즘)

  • Seo, Hyun-Soo;Bae, Sang-Bum;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.1019-1022 / 2005
  • With the rapid development of information and communication technology, much research on speech signal processing has been carried out. The pitch period is an important factor in many application fields such as speech recognition, speaker identification, and speech analysis and synthesis. Many pitch detection algorithms have therefore been proposed in the time domain and the frequency domain. AMDF (average magnitude difference function), one of the time-domain pitch detection algorithms, takes the time interval from valley to valley as the pitch period. However, selecting the valley points used to detect the pitch period increases the complexity of the algorithm. In this paper, we propose a pitch detection algorithm using a rotation transform of the AMDF that takes the global minimum valley point as the pitch period and establishes a threshold for the phoneme in the beginning portion so that it is excluded from pitch period selection. We compare the proposed method with existing methods through simulation.

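
For readers unfamiliar with the AMDF referred to in the abstract above, the following is a minimal sketch of the basic function and of picking its global minimum as the pitch period. It is the plain textbook AMDF, not the rotation transform the paper proposes (the abstract does not give enough detail to reproduce that). The function names, the lag search range, and the synthetic 100 Hz test tone are assumptions for illustration.

```python
import math

def amdf(frame, max_lag):
    """Average magnitude difference function for lags 1..max_lag.

    The returned list is indexed so that values[k] belongs to lag k + 1.
    """
    n = len(frame)
    values = []
    for lag in range(1, max_lag + 1):
        diffs = [abs(frame[i] - frame[i + lag]) for i in range(n - lag)]
        values.append(sum(diffs) / len(diffs))
    return values

def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the smallest AMDF value in [min_lag, max_lag]."""
    d = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: d[lag - 1])

# Synthetic voiced frame: a 100 Hz sine sampled at 8000 Hz (assumed values).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(400)]
period = estimate_pitch_period(frame, min_lag=20, max_lag=150)
print(period, "samples ->", sample_rate / period, "Hz")
```

On real speech the AMDF has several valleys of similar depth, so the search range matters: a range wide enough to include twice the true period can return a multiple of it.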

A Development of Wireless Sensor Networks for Collaborative Sensor Fusion Based Speaker Gender Classification (협동 센서 융합 기반 화자 성별 분류를 위한 무선 센서네트워크 개발)

  • Kwon, Ho-Min
    • Journal of the Institute of Convergence Signal Processing / v.12 no.2 / pp.113-118 / 2011
  • In this paper, we develop a speaker gender classification technique using collaborative sensor fusion for use in a wireless sensor network. The distributed sensor nodes remove unwanted input data using BER (Band Energy Ratio) based voice activity detection, process only the relevant data, and transmit hard-labeled decisions to the fusion center, where a global decision fusion is carried out. This offers advantages in power consumption and network resource management. Bayesian sensor fusion and global weighting decision fusion methods are proposed to achieve the gender classification. As the number of sensor nodes varies, the Bayesian sensor fusion yields the best classification accuracy using the optimal operating points of the ROC (Receiver Operating Characteristic) curves. For the weights used in the global decision fusion, the BER and MCL (Mutual Confidence Level) are employed so that the decisions are combined effectively at the fusion center. The simulation results show that as the number of sensor nodes increases, the classification accuracy improves further in low SNR (Signal-to-Noise Ratio) conditions.
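
The weighted decision fusion described in the abstract above can be sketched as a weighted vote over the nodes' hard labels. This is only an illustration of the general idea: the +1/-1 label encoding, the function name `fuse_hard_decisions`, and the example weights are assumptions, and the paper's actual weights are derived from BER and MCL in a way the abstract does not specify.

```python
def fuse_hard_decisions(decisions, weights):
    """Combine per-node hard labels (+1 or -1) into one global label.

    Each node's vote is scaled by its weight; the sign of the weighted sum
    is the fused decision. Ties are resolved as +1 (an arbitrary choice).
    """
    if len(decisions) != len(weights):
        raise ValueError("one weight is required per decision")
    score = sum(w * d for d, w in zip(decisions, weights))
    return 1 if score >= 0 else -1

# Three nodes; the first is trusted more than the other two (made-up weights).
node_labels = [1, -1, -1]
weighted = fuse_hard_decisions(node_labels, [0.9, 0.3, 0.4])
unweighted = fuse_hard_decisions(node_labels, [1.0, 1.0, 1.0])
print(weighted, unweighted)
```

With equal weights this reduces to a majority vote; unequal weights let one reliable node outvote several unreliable ones, which is the purpose of weighting at the fusion center.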

A Study on the Improvement of Speaker Recognition System by Voice Activity Detection (음성구간검출을 통한 화자식별 시스템의 성능개선에 관한 연구)

  • 신동성;정영훈;배명진
    • Proceedings of the IEEK Conference / 2001.09a / pp.789-792 / 2001
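  • Experiments on performance improvement were conducted. Speaker identification compares enrolled speech with test speech and identifies the speaker by decision logic. In such a system, the recognition rate is strongly affected by how the preprocessing is done. In this paper, among the preprocessing steps, we carried out experiments on speech segment detection and compared the performance. Voiced speech was detected by exploiting the fact that, when the Normalized AMDF is applied to stationary regions and transition regions in the time domain, the slope of the valley at the pitch point is large. Unvoiced speech was then detected using the ratio of the autocorrelation of adjacent samples before and after the detected voiced segment. As a result, the processing time was similar, but the overall recognition rate improved by about 2%.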


Common Speech Database Collection and Validation for Communications (한국어 공통 음성 DB구축 및 오류 검증)

  • Lee Soo-jong;Kim Sanghun;Lee Youngjik
    • MALSORI / no.46 / pp.145-157 / 2003
  • In this paper, we briefly introduce the Korean common speech database, a project started in 2002 to construct a large-scale speech database. The project aims to support the R&D environment of speech technology for industry. It encourages domestic speech industries and stimulates the domestic speech technology market. In the first year, the resulting common speech database consists of 25 kinds of databases covering various recording conditions such as telephone, PC, and VoIP. The speech database will be widely used for speech recognition, speech synthesis, and speaker identification. However, although the database was originally corrected manually, it still retains unknown errors and human errors. To minimize the errors in the database, we tried to find errors based on recognition errors and classified them into several kinds. To be more effective than the typical recognition technique, we will develop an automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.


Binary clustering network for recognition of keywords in continuous speech (연속음성중 키워드(Keyword) 인식을 위한 Binary Clustering Network)

  • 최관선;한민홍
    • 제어로봇시스템학회:학술대회논문집 / 1993.10a / pp.870-876 / 1993
  • This paper presents a binary clustering network (BCN) and a heuristic pitch detection algorithm for recognition of keywords in continuous speech. To classify nonlinear patterns, BCN separates patterns into binary clusters hierarchically and links identical patterns at the root level using supervised and unsupervised learning. BCN has many desirable properties such as a flexible dynamic structure, high classification accuracy, short learning time, and short recall time. The pitch detection algorithm is a heuristic model that can solve difficulties such as scaling invariance, time warping, time-shift invariance, and redundancy. This recognition algorithm has shown recognition rates as high as 95% for speaker-dependent as well as multispeaker-dependent tests.


A Study on Design and Implementation of Embedded System for Speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.201-206 / 2004
  • This study attempted to develop a speech recognition module for a wheelchair for the physically handicapped. In the proposed speech recognition module, a TMS320C32 was used as the main processor, and a 12th-order mel-cepstrum was applied in the pre-processor step to increase the recognition rate in a noisy environment. DTW (Dynamic Time Warping) was used for the speaker-dependent recognition part and proved to give excellent results. To use this algorithm more effectively, the reference data was compressed to 1/12 using vector quantization to reduce memory. In this paper, the various necessary techniques (end-point detection, DMA processing, etc.) were handled so that the speech recognition system can be used in real time.
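
The DTW matching used for the speaker-dependent recognizer in the abstract above can be sketched as follows. This is the textbook dynamic-programming formulation with a Euclidean local distance and no path constraints, not the embedded TMS320C32 implementation; the function names `dtw_distance` and `recognize`, the command words, and the toy one-dimensional templates are assumptions for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the reference template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Toy reference patterns (hypothetical command words, 1-D features).
templates = {
    "go": [[0.0], [1.0], [2.0]],
    "stop": [[2.0], [1.0], [0.0]],
}
# A slower rendition of "go": same shape, different timing.
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]
print(recognize(utterance, templates))
```

Because the cost table has one cell per pair of frames, memory and time grow with the length of the stored references, which is why compressing the reference data matters on a small processor.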

Speech Recognition in the Car Noise Environment (자동차 소음 환경에서 음성 인식)

  • 김완구;차일환;윤대희
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.2 / pp.51-58 / 1993
  • This paper describes the development of a speaker-dependent isolated word recognizer as applied to voice dialing in a car noise environment. For this purpose, several methods to improve performance under such conditions are evaluated using a database collected in a small car moving at 100 km/h. The main features of the recognizer are as follows. The endpoint detection error can be reduced by using the magnitude of the signal that is inverse filtered by the AR model of the background noise, and it can be compensated by using variants of the DTW algorithm. To remove the noise, an autocorrelation subtraction method is used with the constraint that the residual energy obtainable by linear predictive analysis should be positive. By using the noise-robust distance measure, distortion of the feature vector is minimized. The speech recognizer is implemented using the Motorola DSP56001 (24-bit general purpose digital signal processor). The recognition database is composed of 50 Korean names spoken by 3 male speakers. The recognition error rate of the system is reduced to 4.3% using a single reference pattern for each word and 1.5% using 2 reference patterns for each word.


Modified SNR-Normalization Technique for Robust Speech Recognition

  • Jung, Hoi-In;Shim, Kab-Jong;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.3E / pp.14-18 / 1997
  • One of the major problems in speech recognition is the mismatch between training and testing environments. Recently, an SNR normalization technique, which normalizes the dynamic range of frequency channels in a mel-scaled filterbank, was proposed [1]. While it showed improved robustness against additive noise, it requires a reliable speech detection mechanism and several adaptation parameters to be optimized. In this paper, we propose a modified SNR normalization technique. In this technique, we simply take the maximum of the filterbank output and a predetermined masking constant for each frequency band. In speaker-independent isolated word recognition experiments in car noise environments, the proposed modification yields better recognition performance than the original SNR normalization method, with rather reduced complexity.

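
The per-band maximum operation stated in the abstract above is simple enough to show directly. The sketch below is a hedged illustration only: the function name `apply_masking_floor`, the example values, and the choice to treat the inputs as plain per-band numbers (the abstract does not say whether they are linear or log energies) are assumptions.

```python
def apply_masking_floor(filterbank_outputs, masking_constants):
    """Take, for each frequency band, the larger of the filterbank output
    and that band's predetermined masking constant."""
    if len(filterbank_outputs) != len(masking_constants):
        raise ValueError("one masking constant is required per band")
    return [max(e, c) for e, c in zip(filterbank_outputs, masking_constants)]

# One frame with three bands; values are made up for illustration.
frame = [0.2, 5.0, 1.0]
floors = [1.0, 1.0, 1.0]
masked = apply_masking_floor(frame, floors)
print(masked)
```

Bands whose output falls below the constant are clamped to it, so low-level differences between training and test conditions in those bands no longer reach the recognizer. No speech detector is needed for this step, which is consistent with the reduced complexity the abstract claims.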