Search | Korea Science

Speaker Adaptation Performance Evaluation in Keyword Spotting System (500단어급 핵심어 검출기에서 화자적응 성능 평가)

Seo Hyun-Chul;Lee Kyong-Rok;Kim Jin-Young;Choi Seung-Ho
- MALSORI / no.43 / pp.151-161 / 2002
This study analyzes the performance of speaker adaptation in a keyword spotting system. We implemented the MLLR (Maximum Likelihood Linear Regression) method on our medium-vocabulary keyword spotting system, which was developed for university and college directory services. The experimental results show that speaker adaptation reduces the false alarm rate to one third while preserving the mis-detection rate. This improvement is achieved when speaker adaptation is applied to both the keyword models and the non-keyword models.

Antigenicity Test of Porcine Pancreas-Derived Elastase (돼지 췌장 유래 엘라스타제의 항원성 시험)

김순희;백남기;이상득;김원배;양중익;안병옥;이순복
- Environmental Mutagens and Carcinogens / v.10 no.2 / pp.113-118 / 1990
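Guinea pigs were sensitized with elastase and then subjected to antibody detection tests. Sensitization was carried out by two routes: oral administration and subcutaneous injection. For oral sensitization, elastase was dissolved in physiological saline immediately before dosing and administered by gavage; for subcutaneous sensitization, elastase was suspended in Freund's complete adjuvant and injected into the back of the neck. Antibody production was judged positive or negative by active systemic anaphylaxis (ASA) and immunodiffusion (ID) in the orally sensitized group, and by passive cutaneous anaphylaxis (PCA) in the subcutaneously sensitized group. In the orally sensitized group, the clinical dose (180 Unit/kg) was negative in both the ASA and ID tests; at 10 times the clinical dose, some animals were positive in the ASA test, but without statistical significance (p<0.1), and the ID test was negative. The subcutaneously sensitized group was PCA-positive in all dose groups. In conclusion, elastase is antigenic, but no antigenicity was observed when the clinical dose was administered orally.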

Voice Activity Detection Based on Signal Energy and Entropy-difference in Noisy Environments (엔트로피 차와 신호의 에너지에 기반한 잡음환경에서의 음성검출)

Ha, Dong-Gyung;Cho, Seok-Je;Jin, Gang-Gyoo;Shin, Ok-Keun
- Journal of Advanced Marine Engineering and Technology / v.32 no.5 / pp.768-774 / 2008
In many areas of speech signal processing, such as automatic speech recognition and packet-based voice communication, VAD (voice activity detection) plays an important role in overall system performance. In this paper, we present a new feature parameter for VAD: the product of the signal energy and the difference between two types of entropy. To this end, we first define a Mel filter-bank based entropy and calculate its difference from the conventional entropy in the frequency domain. The difference is then multiplied by the spectral energy of the signal to yield the final feature parameter, which we call PEED (product of energy and entropy difference). Experiments verified that the proposed VAD parameter is more efficient than the conventional spectral entropy based parameter across various SNRs and noisy environments.
https://doi.org/10.5916/jkosme.2008.32.5.768
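The PEED feature described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact entropy definitions, so the sketch assumes normalized Shannon entropies, an absolute difference, a Hann window, and a hand-built triangular Mel filter bank. All function names (`mel_filterbank`, `normalized_entropy`, `peed`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def normalized_entropy(values, eps=1e-12):
    """Shannon entropy of a non-negative vector, scaled to [0, 1] (assumption)."""
    p = values / (values.sum() + eps)
    return -np.sum(p * np.log(p + eps)) / np.log(len(p))

def peed(frame, bank, n_fft):
    """Energy times the difference between spectral and Mel-band entropy."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft)) ** 2
    energy = power.sum()
    h_spectral = normalized_entropy(power)    # conventional spectral entropy
    h_mel = normalized_entropy(bank @ power)  # Mel filter-bank based entropy
    # Assumption: the sign convention is unknown, so the magnitude is used.
    return energy * abs(h_spectral - h_mel)
```

A frame would then be labeled as speech when `peed(frame, bank, n_fft)` exceeds a threshold tuned on held-out data; the abstract does not state how that threshold is chosen.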

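Comparative Study of Conductive and Sensorineural Hearing Loss Groups Using Functional MRI and Diffusion Tensor Imaging: Preliminary Results (기능적 자기공명영상 및 확산텐서영상을 이용한 전음성 난청과 감각신경성 난청군의 비교 연구: 예비 결과)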

이재준;황문정;이영주;김인성;배성진;장용민;이상흔;우성구;강덕식
- Proceedings of the KSMRM Conference / 2003.10a / pp.94-94 / 2003
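Purpose: To compare brain activation patterns and differences along the auditory neural pathway between conductive and sensorineural hearing loss, using functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI). Subjects and Methods: fMRI and DTI data were acquired from a conductive hearing loss group (n=4), a sensorineural hearing loss group (n=5), and a normal group (n=5). For fMRI, brain regions activated by a 500 Hz pure-tone auditory stimulus were detected with the BOLD technique on a 1.5T Siemens MR scanner, and a specially built auditory stimulator was used to shield the mechanical noise generated during scanning. DTI, which images the white-matter tracts of the brain, was acquired on a 3.0T GE whole body MR scanner using EPI, an ultrafast imaging technique, to detect microscopic diffusion motion. To improve image quality, diffusion gradients were applied in 25 different spatial directions. The acquired diffusion images were post-processed to produce anisotropy images and tract direction images of the auditory pathway.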

Segmentation of the Korean speech signals into phonetic units using the super resolution pitch determination (고해상 피치검출을 이용한 한국어 음성신호의 음소분리)

이응구;이두수
- The Journal of Korean Institute of Communications and Information Sciences / v.18 no.2 / pp.270-278 / 1993
This paper presents a phonetic segmentation algorithm for Korean speech signals that finds the exact pitch using super resolution pitch determination and compares the cross-correlation of each pitch period against a threshold. The proposed algorithm features infinite resolution and high reliability, and can also separate transient and silent segments. The algorithm is useful for speech processing applications that require vector quantization and speech recognition. It was implemented in 386-MATLAB on a PC 386/DX, and both the exact pitch period and the phonetic segmentation of speech signals were verified.

Audio-Visual Integration based Multi-modal Speech Recognition System (오디오-비디오 정보 융합을 통한 멀티 모달 음성 인식 시스템)

Lee, Sahng-Woon;Lee, Yeon-Chul;Hong, Hun-Sop;Yun, Bo-Hyun;Han, Mun-Sung
- Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.707-710 / 2002
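This paper proposes a multi-modal speech recognition system based on the fusion of audio and video information. By fusing speech features with visual features, the system recognizes human speech efficiently in noisy environments. The speech features are Mel Frequency Cepstrum Coefficients (MFCC), and the visual features are feature vectors obtained through principal component analysis. In addition, to improve the recognition rate of the visual information itself, the face region is located using a skin color model and face shape information, and the lip region is then detected with a robust lip region extraction method. Audio-visual fusion is performed as early fusion using a modified time-delay neural network. Experiments show that fusing audio and visual information improves performance by roughly 5%-20% over using audio information alone.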

A Single-End-Point DTW Algorithm for Keyword Spotting (핵심어 검출을 위한 단일 끝점 DTW알고리즘)

최용선;오상훈;이수영
- Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.209-219 / 2004
To implement real-time hardware for keyword spotting, we propose a Single-End-Point DTW (SEP-DTW) algorithm that is simple and computationally inexpensive. The SEP-DTW algorithm needs only a single end point, which enables efficient applications, and it requires a small amount of computation because the global search area is divided into successive local search areas. We also adopt new local constraints and a new distance measure to improve the performance of the SEP-DTW algorithm. In addition, we normalize the feature vectors so that they have the same variance in each frequency bin and each frame has the same energy level. To construct several reference patterns for each keyword, we apply a clustering algorithm to all training patterns and take the mean vector of each cluster as a reference pattern. To detect a keyword in an input speech stream, we measure the distances between the reference patterns and the input pattern and decide whether the distances are smaller than a pre-defined threshold value. Isolated speech recognition and keyword spotting experiments verify that the proposed algorithm performs better than other methods.
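The final decision step described above (measure the distance to each reference pattern, then compare against a threshold) can be sketched as follows. Note that this uses classic full-matrix DTW with the standard three-way local constraint and a Euclidean frame distance, not the SEP-DTW variant, whose local constraints and distance measure the abstract does not specify. The names `dtw_distance` and `spot_keyword` and the path-length normalization are assumptions made for illustration.

```python
import numpy as np

def dtw_distance(ref, inp):
    """Classic DTW between two feature sequences of shape (frames, dims)."""
    n, m = len(ref), len(inp)
    # Pairwise Euclidean distance between every reference and input frame.
    local = np.linalg.norm(ref[:, None, :] - inp[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Assumption: normalize by n + m so the threshold is length independent.
    return acc[n, m] / (n + m)

def spot_keyword(references, inp, threshold):
    """Return (detected, best_distance) over all reference patterns."""
    best = min(dtw_distance(ref, inp) for ref in references)
    return bool(best < threshold), float(best)

# Usage: a time-stretched copy of the reference should be detected.
reference = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 2.0]])
stretched = np.repeat(reference, 2, axis=0)
detected, distance = spot_keyword([reference], stretched, threshold=0.1)
```

In the setup the abstract describes, `references` would hold the cluster mean patterns for one keyword, and the threshold would be pre-defined per keyword.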

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
- Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
Because speech recognition degrades in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in AVSR systems. This study aims to improve the recognition rate of uttered words using only lip shape detection, for an efficient AVSR system. After preprocessing for lip region detection, Convolution Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. Utterance period detection achieved a 91% success rate, which is higher than general threshold methods. In lip reading recognition, the user-dependent experiment recorded a recognition rate of 88.5% and the user-independent experiment 80.2%, both improvements over previous studies.
https://doi.org/10.14400/JDC.2016.14.8.233

Double Talk Detection before the Convergence of Echo Canceller (반향제거기의 수렴전 동시통화검출)

Yoo, Jae-Ha;Kim, Soo-Chan;Kim, Dong-Yon
- The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.203-208 / 2013
In this paper, we propose a method for improving the performance of a double talk detector that can operate before the echo canceller converges. The microphone input signal is filtered by a linear prediction filter, and the filtered signal is used for detection. The coefficients of the linear prediction filter are derived from the far-end talker signal. During single talk, the filtered signal has low power because the characteristics of the echo signal are similar to those of the far-end talker signal. During double talk, however, the filtered signal does not have low power because the microphone signal contains a signal with different characteristics. Double talk is detected from this difference. Simulations using real speech signals verified that the proposed method outperforms conventional methods.
https://doi.org/10.7236/JIIBC.2013.13.5.203
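The detection idea in the abstract above can be sketched as follows: estimate linear prediction coefficients from the far-end signal, run the microphone signal through the resulting prediction-error filter, and look at how much power survives. This is a minimal sketch under assumptions, not the authors' detector. The autocorrelation method for the coefficients, the prediction order, the use of a residual-to-input power ratio as the statistic, and the names `lpc_error_filter` and `residual_ratio` are all choices made for illustration.

```python
import numpy as np

def lpc_error_filter(signal, order=10):
    """Prediction-error filter A(z) = 1 - sum(a_k z^-k), autocorrelation method."""
    full = np.correlate(signal, signal, mode="full")
    r = full[len(signal) - 1 : len(signal) + order]  # lags 0..order
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    # Small ridge term keeps the Toeplitz system well conditioned.
    R = r[lags] + 1e-8 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))

def residual_ratio(far_end, mic, order=10):
    """Power of the LP-filtered microphone signal relative to its input power."""
    error_filter = lpc_error_filter(far_end, order)
    residual = np.convolve(mic, error_filter)[: len(mic)]
    return float(np.sum(residual ** 2) / (np.sum(mic ** 2) + 1e-12))

def is_double_talk(far_end, mic, threshold=0.5, order=10):
    # Assumption: a fixed threshold on the ratio; the paper's rule may differ.
    return residual_ratio(far_end, mic, order) > threshold
```

During single talk the microphone carries only echo, which shares the spectral envelope of the far-end signal, so the ratio stays small; near-end speech has a different envelope and pushes the ratio up. In practice the statistic would be computed frame by frame.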

Performance Comparison of Filler Models and Word Spotting Ratio for Sentence Rejection in Phoneme-based Recognition Networks (문장 거부를 위한 음소기반 인식 네트워크에서의 필러 모델 비율과 단어 검출률의 성능비교)

Kim Hyung-Tai;Lee Byung-Hyuk;Ha Jin-Young
- Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.856-858 / 2005
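In a speech recognition system, the ability to reject input speech that is not a recognition target is very important for guaranteeing reliability; raising reliability requires rejection of inappropriate input patterns in addition to plain recognition. To address this reliability problem, we ran experiments with the filler model method and the word spotting ratio method in a phoneme-based recognition network, and compared and analyzed the sentence rejection performance of the two methods as a function of the number of words in the sentence by finding, for each method, the value that minimizes the average of FAR and FRR. The filler model method showed somewhat better rejection performance, while the word spotting ratio method was effective in execution speed and memory savings because it does not need to traverse the entire recognition network.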
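The comparison criterion mentioned above, choosing the operating point that minimizes the average of FAR and FRR, can be illustrated with a short sketch. It assumes each sentence yields one scalar confidence score, with higher meaning "accept"; the abstract does not say which quantity is swept, and the function name `best_operating_point` is hypothetical.

```python
import numpy as np

def best_operating_point(valid_scores, invalid_scores):
    """Sweep thresholds and return (threshold, avg_error, far, frr).

    valid_scores:   scores of sentences that should be accepted
    invalid_scores: scores of sentences that should be rejected
    A sentence is accepted when its score is >= threshold.
    """
    valid = np.asarray(valid_scores, dtype=float)
    invalid = np.asarray(invalid_scores, dtype=float)
    best = None
    for threshold in np.unique(np.concatenate((valid, invalid))):
        frr = float(np.mean(valid < threshold))     # valid input falsely rejected
        far = float(np.mean(invalid >= threshold))  # invalid input falsely accepted
        avg = (far + frr) / 2.0
        if best is None or avg < best[1]:
            best = (float(threshold), avg, far, frr)
    return best

# Usage with made-up scores (illustrative values only, not from the paper).
threshold, avg_error, far, frr = best_operating_point(
    [0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.5]
)
```

Running the sweep once per method and per sentence length gives one minimum average error for each, which is the kind of figure the abstract says was compared.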

Search Result 726, Processing Time 0.043 seconds
