• Title/Summary/Keyword: 화자검증 (speaker verification)


Noise Robust Speaker Verification Using Subband-Based Reliable Feature Selection (신뢰성 높은 서브밴드 특징벡터 선택을 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • MALSORI / no.63 / pp.125-137 / 2007
  • Recently, many techniques have been proposed to improve noise robustness in speaker verification. In this paper, we consider the feature recombination technique in a multi-band approach. In conventional feature recombination for speaker verification, all feature components are used to compute the likelihoods of the speaker models or the universal background model, which is ineffective from the viewpoint of the multi-band approach. To address this, we introduce a subband likelihood computation and propose a modified feature recombination that uses subband likelihoods. In the decision step of a speaker verification system in noisy environments, a few very low likelihood scores from a speaker model or the universal background model can cause the system to make a wrong decision. To overcome this problem, a reliable feature selection method is proposed: the low likelihood scores of unreliable features are replaced with likelihood scores from an adaptive noise model. This adaptive noise model is estimated by maximum a posteriori adaptation using noise features obtained directly from the noisy test speech. The proposed subband-based reliable feature selection outperforms the conventional feature recombination system, with an error reduction rate of more than 31% compared with the feature recombination-based speaker verification system.
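The reliable feature selection idea above can be sketched as follows: per-subband log-likelihoods are computed (here under simple one-dimensional Gaussians), and any subband score falling below a reliability threshold is replaced by the adaptive noise model's score before recombination. This is a minimal illustration, not the paper's implementation; the Gaussian models, threshold, and function names are assumptions.

```python
import math

def gauss_loglik(x, mean, var):
    # log-likelihood of x under a 1-D Gaussian
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def subband_loglik(frame, model):
    # per-subband log-likelihoods: each subband has its own (mean, var)
    return [gauss_loglik(x, m, v) for x, (m, v) in zip(frame, model)]

def reliable_recombination(frame, speaker_model, noise_model, threshold):
    """Replace unreliable (very low) subband scores with the
    adaptive-noise-model score before recombining the frame score."""
    spk = subband_loglik(frame, speaker_model)
    noi = subband_loglik(frame, noise_model)
    selected = [s if s >= threshold else n for s, n in zip(spk, noi)]
    return sum(selected)  # recombined frame log-likelihood
```

A subband corrupted by noise (here the second component) no longer drags the whole frame score down, because its score is backed off to the noise model's.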


Real-Time Implementation of Speaker Dependent Speech Recognition Hardware Module Using the TMS320C32 DSP : VR32 (TMS320C32 DSP를 이용한 실시간 화자종속 음성인식 하드웨어 모듈(VR32) 구현)

  • Chung, Ik-Joo;Chung, Hoon
    • The Journal of the Acoustical Society of Korea / v.17 no.4 / pp.14-22 / 1998
  • In this study, we developed a real-time speaker-dependent speech recognition hardware module (VR32) using the TMS320C32, a low-cost floating-point digital signal processor (DSP) from Texas Instruments. The hardware module consists of a 40 MHz TMS320C32 DSP, a 14-bit codec TLC32044 (or an 8-bit μ-law PCM codec), memory including EPROM and SRAM, and logic circuitry for the host interface. In addition, a PC interface board and software were developed to evaluate the hardware module on a PC. The recognition algorithm performs endpoint detection based on energy and zero-crossing rate (ZCR) and 10th-order weighted LPC cepstrum analysis in real time; Dynamic Time Warping (DTW) then determines the most similar word, and a verification step produces the final recognition result. For endpoint detection, an adaptive threshold enables noise-robust detection, and the DTW algorithm was optimized in C and assembly to greatly improve computation speed. The recognition rate is very high, above 95% for a 30-word vocabulary typically used for speed dialing in an ordinary office environment, and the module also works well in noisy environments such as background music or car noise.
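The DTW-based word matching described above can be sketched in a few lines: fill a cumulative-cost table and pick the template with the smallest warped distance. This is a generic textbook DTW over 1-D feature sequences, not the module's optimized C/assembly implementation, and the function names are illustrative.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic Time Warping distance between two sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(test_seq, templates):
    """Pick the template word with the smallest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(test_seq, templates[w]))
```

In a real recognizer each sequence element would be a cepstral vector and `dist` a vector distance; the warping logic is unchanged.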


A Study on Out-of-Vocabulary Rejection Algorithms using Variable Confidence Thresholds (가변 신뢰도 문턱치를 사용한 미등록어 거절 알고리즘에 대한 연구)

  • Bhang, Ki-Duck;Kang, Chul-Ho
    • Journal of Korea Multimedia Society / v.11 no.11 / pp.1471-1479 / 2008
  • In this paper, we propose a technique to improve Out-Of-Vocabulary (OOV) rejection in variable-vocabulary recognition systems, which are widely used in Automatic Speech Recognition (ASR). Rejection systems fall into two categories by implementation: the keyword spotting method and the utterance verification method. The utterance verification method uses the likelihood ratio of each phoneme's Viterbi score relative to the anti-phoneme score to decide whether an input is OOV. In this paper, we add a speaker verification stage before utterance verification and compute a speaker verification probability, which is then used to determine the proposed variable confidence threshold. With the proposed method, we achieve a significant performance improvement: CA (Correctly Accepted keywords) of 94.23% and CR (Correctly Rejected out-of-vocabulary words) of 95.11% in an office environment, and CA of 91.14% and CR of 92.74% in a noisy environment.
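The variable-threshold idea can be sketched as: the utterance-verification confidence is the average phoneme-vs-anti-phoneme log-likelihood ratio, and the rejection threshold is shifted by the speaker verification probability. The exact combination rule below (a linear shift around 0.5) is an assumption for illustration, not the paper's formula.

```python
def utterance_confidence(phone_logliks, anti_logliks):
    # average per-phoneme log-likelihood ratio (phoneme vs anti-phoneme model)
    llrs = [p - a for p, a in zip(phone_logliks, anti_logliks)]
    return sum(llrs) / len(llrs)

def variable_threshold(base_threshold, sv_prob, weight=1.0):
    """Shift the OOV-rejection threshold by the speaker verification
    probability: a well-verified speaker gets a more lenient threshold."""
    return base_threshold - weight * (sv_prob - 0.5)

def accept_keyword(phone_logliks, anti_logliks, sv_prob, base_threshold=0.0):
    conf = utterance_confidence(phone_logliks, anti_logliks)
    return conf >= variable_threshold(base_threshold, sv_prob)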


Impact of face masks on spectral and cepstral measures of speech: A case study of two Korean voice actors (한국어 스펙트럼과 캡스트럼 측정시 안면마스크의 영향: 남녀 성우 2인 사례 연구)

  • Wonyoung Yang;Miji Kwon
    • The Journal of the Acoustical Society of Korea / v.43 no.4 / pp.422-435 / 2024
  • This study examined the effects of face masks on Korean speech in terms of acoustic, aerodynamic, and formant parameters. We selected all types of face masks available in Korea, classified by filter performance and folding type. Two professional voice actors (one male, one female), native speakers of standard Korean with more than 20 years of experience, provided the voice data. Face masks attenuated the high-frequency range, resulting in decreased Vowel Space Area (VSA) and Vowel Articulation Index (VAI) scores and an increased Low-to-High spectral ratio (L/H ratio) in all voice samples, which can lower speech intelligibility. However, the degree of increase or decrease depended on the voice characteristics. For the female speaker, the Speech Level (SL) and Cepstral Peak Prominence (CPP) increased with increasing face mask thickness. The presence and filter performance of a face mask were found to affect speech acoustic parameters according to the speech characteristics. Face masks provoked vocal effort when vocal intensity was not sufficiently strong or the environment was less reverberant. Further research is needed on the vocal effort induced by face masks to overcome the acoustic modifications caused by wearing them.
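The Low-to-High spectral ratio (L/H ratio) mentioned above is conventionally the ratio of spectral energy below a cutoff to the energy above it, expressed in dB. A minimal sketch assuming a precomputed power spectrum; the 4 kHz cutoff and the function name are illustrative choices, not necessarily those of the study.

```python
import math

def lh_ratio_db(freqs, power, cutoff_hz=4000.0):
    """Low-to-High spectral ratio: 10*log10 of the power below the
    cutoff over the power at or above it."""
    low = sum(p for f, p in zip(freqs, power) if f < cutoff_hz)
    high = sum(p for f, p in zip(freqs, power) if f >= cutoff_hz)
    return 10.0 * math.log10(low / high)
```

Mask attenuation of high frequencies raises this ratio, matching the increase reported in the abstract.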

Optimal Feature Parameters Extraction for Speech Recognition of Ship's Wheel Orders (조타명령의 음성인식을 위한 최적 특징파라미터 검출에 관한 연구)

  • Moon, Serng-Bae;Chae, Yang-Bum;Jun, Seung-Hwan
    • Journal of the Korean Society of Marine Environment & Safety / v.13 no.2 s.29 / pp.161-167 / 2007
  • The goal of this paper is to develop a speech recognition system that can control a ship's autopilot. Feature parameters predicting the speaker's intention were extracted from sample wheel orders listed in the SMCP (IMO Standard Marine Communication Phrases), and we designed a post-recognition procedure based on these parameters that makes a final decision from the list of candidate words. To evaluate the effectiveness of the parameters and the procedure, a basic experiment was conducted with a total of 525 wheel orders. The experimental results show that the proposed pattern recognition procedure improved performance by about 42.3% over the pre-recognition procedure.


Performance Comparison of Deep Feature Based Speaker Verification Systems (깊은 신경망 특징 기반 화자 검증 시스템의 성능 비교)

  • Kim, Dae Hyun;Seong, Woo Kyeong;Kim, Hong Kook
    • Phonetics and Speech Sciences / v.7 no.4 / pp.9-16 / 2015
  • In this paper, several experiments are performed with deep neural network (DNN)-based features to compare the performance of speaker verification (SV) systems. To this end, input features for a DNN, such as the mel-frequency cepstral coefficient (MFCC), linear-frequency cepstral coefficient (LFCC), and perceptual linear prediction (PLP), are first compared in terms of SV performance. Next, the effect of the DNN training method and the structure of the hidden layers on SV performance is investigated for each feature type. The performance of the SV system is then evaluated using i-vector or probabilistic linear discriminant analysis (PLDA) scoring. The SV experiments show that a tandem feature combining the DNN bottleneck feature with the MFCC feature gives the best performance when the DNNs are configured with rectangular hidden layers and trained with a supervised training method.
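As an illustration of the pipeline above, a tandem feature can be formed by concatenating per-frame MFCC and DNN bottleneck vectors, and a simple verification decision made by cosine scoring of utterance-level vectors. This is a hedged sketch: the paper uses i-vector/PLDA scoring, whereas plain cosine scoring and the threshold here are stand-in assumptions.

```python
import math

def tandem_feature(mfcc, bottleneck):
    """Frame-level tandem feature: MFCC concatenated with the DNN
    bottleneck vector."""
    return list(mfcc) + list(bottleneck)

def cosine_score(enroll, test):
    # cosine similarity between enrollment and test vectors
    dot = sum(e * t for e, t in zip(enroll, test))
    norm = math.sqrt(sum(e * e for e in enroll)) * math.sqrt(sum(t * t for t in test))
    return dot / norm

def verify(enroll_vec, test_vec, threshold=0.5):
    # accept the claimed speaker if the score clears the threshold
    return cosine_score(enroll_vec, test_vec) >= threshold
```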

Recognize the Emotional state of the Speaker by using HMM (HMM을 이용한 화자의 감정 상태 인식)

  • Lee, Na-Ra;Han, Ki-Hong;Kim, Hyun-jung;Won, Il-Young
    • Annual Conference of KIPS / 2013.11a / pp.1517-1520 / 2013
  • Automated emotion recognition from speech is an important research area for providing a variety of user-centered services. Previous work combined supervised and unsupervised learning but did not achieve satisfactory performance, because the learning method did not take the temporal nature of speech into account. In this study, we trained a Hidden Markov Model (HMM) and validated it experimentally. The experimental results show improved performance over the existing methods.
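HMM-based emotion classification of this kind typically scores the observation sequence against one HMM per emotion with the forward algorithm and picks the most likely model. A minimal discrete-HMM sketch under that assumption (the abstract does not give the model details):

```python
import math

def forward_loglik(obs, pi, A, B):
    """Forward algorithm for a discrete HMM: log P(obs | model),
    with initial probabilities pi, transitions A, and emissions B."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * A[p][s] for p in range(n)) * B[s][o]
                 for s in range(n)]
    return math.log(sum(alpha))

def classify_emotion(obs, models):
    """Score the sequence against each emotion's (pi, A, B) HMM and
    return the most likely emotion."""
    return max(models, key=lambda e: forward_loglik(obs, *models[e]))
```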

α-feature map scaling for raw waveform speaker verification (α-특징 지도 스케일링을 이용한 원시파형 화자 인증)

  • Jung, Jee-weon;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.441-446 / 2020
  • In this paper, we propose the α-Feature Map Scaling (α-FMS) method, which extends the FMS method designed to enhance the discriminative power of feature maps of deep neural networks in Speaker Verification (SV) systems. FMS derives a scale vector from a feature map and then adds it to or multiplies it with the features, or applies both operations sequentially. However, FMS not only uses an identical scale vector for both addition and multiplication, but is also limited to adding values between zero and one. To overcome these limitations, we propose α-FMS, which adds a trainable parameter α to the feature map element-wise and then multiplies by a scale vector. We compare two variants: one where α is a scalar and one where it is a vector. Both α-FMS methods are applied after each residual block of the deep neural network. The proposed systems are trained using RawNet2 and tested on the VoxCeleb1 evaluation set, achieving equal error rates of 2.47% and 2.31% for the two α-FMS variants, respectively.
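The α-FMS operation itself is simple: add a trainable α element-wise, then multiply by a per-channel scale vector derived from the feature map. A minimal sketch with plain Python lists, where `fms_scale` stands in for the FC-plus-sigmoid layer of FMS; shapes, weights, and names are illustrative, not the RawNet2 implementation.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fms_scale(feature_map, weights, bias):
    """Derive a per-channel scale vector: sigmoid of a linear map of the
    globally averaged feature map (stand-in for the FMS FC layer)."""
    pooled = [sum(ch) / len(ch) for ch in feature_map]  # global average pool
    return [sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
            for row, b in zip(weights, bias)]

def alpha_fms(feature_map, alpha, scale):
    """α-FMS: add trainable α element-wise, then multiply the scale
    vector. alpha may be a scalar or a per-channel vector."""
    out = []
    for c, ch in enumerate(feature_map):
        a = alpha[c] if isinstance(alpha, list) else alpha
        out.append([(x + a) * scale[c] for x in ch])
    return out
```

Because α is unconstrained (unlike the sigmoid-bounded additive term in plain FMS), the additive shift can take any trainable value.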

CASA Based Approach to Estimate Acoustic Transfer Function Ratios (CASA 기반의 마이크간 전달함수 비 추정 알고리즘)

  • Shin, Minkyu;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.54-59 / 2014
  • Identification of the RTF (Relative Transfer Function) between sensors is essential to multichannel speech enhancement systems. In this paper, we present an approach for estimating the relative transfer function of a speech signal that adapts a CASA (Computational Auditory Scene Analysis) technique to the conventional OM-LSA (Optimally Modified Log-Spectral Amplitude)-based approach. The proposed approach is evaluated under simulated stationary and nonstationary WGN (White Gaussian Noise), and the experimental results confirm its advantages.
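For context, a common least-squares RTF estimate divides the averaged cross-power spectrum of the two channels by the power spectrum of the reference channel, per frequency bin. The sketch below shows only that baseline definition, not the paper's CASA/OM-LSA method; the input is assumed to be paired STFT frames as lists of complex bins.

```python
def estimate_rtf(x1_frames, x2_frames):
    """Least-squares RTF estimate per frequency bin:
    H(k) = E[X2(k) X1*(k)] / E[|X1(k)|^2], from paired STFT frames."""
    bins = len(x1_frames[0])
    rtf = []
    for k in range(bins):
        num = sum(x2[k] * x1[k].conjugate()
                  for x1, x2 in zip(x1_frames, x2_frames))
        den = sum(abs(x1[k]) ** 2 for x1 in x1_frames)
        rtf.append(num / den)
    return rtf
```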

Development of Korean Consonant Perception Test (자음지각검사 (KCPT)의 개발)

  • Kim, Jin-Sook;Shin, Eun-Yeong;Shin, Hyun-Wook;Lee, Ki-Do
    • The Journal of the Acoustical Society of Korea / v.30 no.5 / pp.295-302 / 2011
  • The purpose of this study was to develop the Korean Consonant Perception Test (KCPT), a phoneme-level test providing basic data for qualitatively and quantitatively evaluating the speech and consonant perception ability of normal-hearing and hearing-impaired listeners. The KCPT was built from meaningful monosyllabic words selected from all possible Korean monosyllabic words, considering articulation characteristics, degree of difficulty, and frequency of phonemic occurrence. Tentative initial- and final-consonant test items were first assembled in a four-alternative multiple-choice format, applying the seven-final-consonant rule and controlling for the familiarity of the target words. After evaluation with 20 normal-hearing adults, a final set of three hundred items was developed: two hundred initial-consonant items and one hundred final-consonant items. The final KCPT was then composed according to colloquial frequency, after confirming that there was no statistically significant speaker variance and eliminating the most difficult items. Thirty hearing-impaired listeners were tested with the KCPT; the half-lists A and B did not differ statistically, and the initial and final test items were found appropriate for evaluating initial and final consonants, respectively.