Search | Korea Science

A New Statistical Voice Activity Detector Based on UMP Test (UMP 테스트에 근거한 새로운 통계적 음성검출기)

Jang, Keun-Won;Chang, Joon-Hyuk;Kim, Dong-Kook
- The Journal of the Acoustical Society of Korea, v.26 no.1, pp.16-24, 2007
Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) is derived from statistical models, and speech or noise is then decided by comparing the value of that expression with a threshold. We propose a new method with a modified decision rule based on the Gaussian distribution and the uniformly most powerful (UMP) test. This method requires the distribution of the absolute value of the incoming speech signal; the final decision is then obtained through the relation between the Rayleigh distributions. The method can detect speech without the a priori signal-to-noise ratio (SNR) required by conventional VAD algorithms. In addition, various VAD performance tests show the proposed method to be more effective than the traditional scheme.
https://doi.org/10.7776/ASK.2007.26.1.016
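The conventional scheme described in this abstract (a likelihood ratio compared with a threshold) can be sketched as follows. This is a minimal illustration, not the authors' UMP-based method: it assumes the widely used complex-Gaussian model in which the per-bin log likelihood ratio is $\gamma_k \xi_k / (1 + \xi_k) - \log(1 + \xi_k)$, with $\gamma_k$ the a posteriori SNR and $\xi_k$ the a priori SNR. The function names, the averaging over bins, and the default threshold are assumptions made for the example.

```python
import math


def log_likelihood_ratio(power, noise_var, xi):
    """Per-bin log likelihood ratio under an assumed complex-Gaussian model.

    power     : observed spectral power |X_k|^2 of the bin
    noise_var : noise variance of the bin
    xi        : a priori SNR of the bin (must be supplied or estimated)
    """
    gamma = power / noise_var  # a posteriori SNR
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def lrt_vad(frame_powers, noise_vars, xis, threshold=0.0):
    """Return True (speech) if the mean per-bin log LR exceeds the threshold.

    The threshold value is a hypothetical default, not taken from the paper.
    """
    llrs = [
        log_likelihood_ratio(p, n, x)
        for p, n, x in zip(frame_powers, noise_vars, xis)
    ]
    return sum(llrs) / len(llrs) > threshold


if __name__ == "__main__":
    noise_vars = [1.0, 1.0, 1.0]
    xis = [1.0, 1.0, 1.0]
    print(lrt_vad([1.0, 1.0, 1.0], noise_vars, xis))     # noise-like frame
    print(lrt_vad([10.0, 10.0, 10.0], noise_vars, xis))  # speech-like frame
```

Note that `xis` (the a priori SNR) has to be provided to this decision rule, which is the dependency the abstract says the proposed method avoids.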

Voice Activity Detection employing the Generalized Normal-Laplace Distribution (일반화된 정규-라플라스 분포를 이용한 음성검출기)

Kim, Sang-Kyun;Kwon, Jang-Woo;Lee, Sangmin
- Journal of Korea Multimedia Society, v.17 no.3, pp.294-299, 2014
In this paper, we propose a novel algorithm, based on the generalized normal-Laplace (GNL) distribution, to improve the performance of voice activity detection (VAD). In our algorithm, the probability density function (PDF) of the noisy speech signal is represented by the GNL distribution, and the variances of the speech and noise under the GNL distribution are estimated using higher-order moments. Experimental results show that the proposed algorithm yields better results than conventional VAD algorithms.
https://doi.org/10.9717/kmms.2014.17.3.294

Comparison of Two Speech Estimation Algorithms Based on Generalized-Gamma Distribution Applied to Speech Recognition in Car Noisy Environment (자동차 잡음환경에서의 음성인식에 적용된 두 종류의 일반화된 감마분포 기반의 음성추정 알고리즘 비교)

Kim, Hyoung-Gook;Lee, Jin-Ho
- The Journal of The Korea Institute of Intelligent Transport Systems, v.8 no.4, pp.28-32, 2009
This paper compares two speech estimators under a generalized Gamma distribution for DFT-based single-microphone speech enhancement. For speech enhancement, noise estimation based on recursive averaging of spectral values with spectral minimum noise is applied to two speech estimators based on the generalized Gamma distribution with $\kappa = 1$ or $\kappa = 2$. The performance of the two speech enhancement algorithms is measured by the recognition accuracy of automatic speech recognition (ASR) in a noisy car environment.

On Detecting the Steady State Segments of Phonemes by Using the Magnitude Distribution of Speech Waveforms (음성파형의 진폭분포를 이용한 음소의 정상상태 구간 검출)

정덕조;배명진;안수길
- The Journal of the Acoustical Society of Korea, v.10 no.6, pp.5-11, 1991
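To recognize continuous speech, the connected acoustic signal must be segmented into phoneme units. This paper proposes a method that uses the magnitude distribution as a parameter for detecting steady-state segments in continuous speech. The proposed magnitude distribution accurately represents how the speech signal changes, and by comparing the difference in magnitude distribution between frames, the stable and transition segments could be distinguished.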
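The idea of comparing magnitude distributions between adjacent frames can be illustrated with a short sketch. It is only an assumption-laden example, not the authors' procedure: the histogram binning, the L1 distance, the threshold value, and all function names are hypothetical choices made here for illustration.

```python
def magnitude_histogram(frame, n_bins=8, max_abs=1.0):
    """Normalized histogram of absolute sample values in one frame."""
    counts = [0] * n_bins
    for x in frame:
        # Map |x| in [0, max_abs] to a bin index; clip values at the top bin.
        idx = min(int(abs(x) / max_abs * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(frame)
    return [c / total for c in counts]


def label_frames(frames, threshold=0.5, n_bins=8, max_abs=1.0):
    """Label each frame pair (t-1, t) as 'steady' or 'transition'.

    The L1 distance between consecutive histograms and the threshold
    are assumptions for this sketch.
    """
    hists = [magnitude_histogram(f, n_bins, max_abs) for f in frames]
    labels = []
    for prev, cur in zip(hists, hists[1:]):
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        labels.append("transition" if dist > threshold else "steady")
    return labels


if __name__ == "__main__":
    frames = [
        [0.10, -0.10, 0.10, -0.10],  # quiet frame
        [0.10, -0.10, 0.12, -0.10],  # similar quiet frame
        [0.90, -0.90, 0.90, -0.90],  # loud frame: distribution changes
    ]
    print(label_frames(frames))
```

A small distance between consecutive histograms marks a steady-state stretch, while a large one marks a transition; in practice the threshold would have to be tuned on real speech.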

Speech Estimators Based on Generalized Gamma Distribution and Spectral Gain Floor Applied to an Automatic Speech Recognition (잡음에 강인한 음성인식을 위한 Generalized Gamma 분포기반과 Spectral Gain Floor를 결합한 음성향상기법)

Kim, Hyoung-Gook;Shin, Dong;Lee, Jin-Ho
- The Journal of The Korea Institute of Intelligent Transport Systems, v.8 no.3, pp.64-70, 2009
This paper presents a speech enhancement technique based on the generalized Gamma distribution in order to obtain robust speech recognition performance. For robust speech enhancement, noise estimation based on recursive averaging of spectral values controlled by a spectral noise floor is applied to speech estimation under the generalized Gamma distribution and a spectral gain floor. The proposed speech enhancement technique is based on the spectral component, the spectral amplitude, and the log spectral amplitude. The performance of the three methods is measured by the recognition accuracy of automatic speech recognition (ASR).

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

Oh, Sang-Yeob
- Journal of Digital Convergence, v.13 no.10, pp.313-318, 2015
Traditional speech enhancement methods distort the sound spectrum depending on how the residual noise is estimated, or leave noise unremoved, which lowers speech recognition performance. In this paper, we propose a speech detection method that combines sound energy distribution processing with sound energy parameters. The proposed method reduces the influence of noise so as to maximize the voice energy. In addition, for intervals where the log-energy feature of the speech signal is small, the log-energy value is raised relative to high-energy regions so that it resembles the log-energy feature of noisy speech, which reduces the mismatch between the training and recognition environments. Recognition experiments confirmed improved recognition performance compared to the conventional method. In a car noise environment, the pause hit rate was 97.1% and 97.3% in the low-SNR region (0 dB and 5 dB) and 98.3% and 98.6% in the high-SNR region (10 dB and 15 dB).
https://doi.org/10.14400/JDC.2015.13.10.313

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

Chung, Kyungyong;Oh, SangYeob
- Journal of Digital Convergence, v.16 no.11, pp.335-340, 2018
Commercial speech recognition systems with high recognition accuracy use a learning model built from speaker-dependent isolated data. However, their speech recognition performance decreases depending on the quantity of data in noisy environments. In this paper, we propose a vector-quantization-based method for improving speech recognition performance using the maximum log likelihood under a Gaussian distribution. The proposed method configures the learning model so as to increase recognition accuracy for similar speech, using vector quantization and maximum log likelihood together with a speech feature extraction method. Speech features are extracted based on the hidden Markov model. The method can correct inaccurate speech models produced by the existing system, and the proposed system can constitute a robust model for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
https://doi.org/10.14400/JDC.2018.16.11.335

Investigating the Relationship Between Vehicle Front Images and Voice Assistants (자동차 전면부와 음성 어시스턴트의 스타일 관계 분석)

Min-Jung Park;So-Yeong Min;Tae-Su Kim;Hyeon-Jeong Suk
- Science of Emotion and Sensibility, v.25 no.4, pp.129-138, 2022
In the context of the increasing applications of voice assistants in vehicles, we focused on the association between the visual appeal of cars and the acoustic characteristics of voice assistants. This study aimed to investigate the relationship between the visual appeal of the vehicle and the voice assistant based on their emotional characteristics. A total of 15 adjectives were used to assess the emotional characteristics of 12 types of cars and six types of voices. An online interview was carried out, instructing participants to match three adjectives with the presented car images or voices. This was followed by a brief interview to allow the participants to reflect on the adjective matches. Based on the assessments, we performed principal component analysis (PCA) to determine factors, then placed the cars and voices in the factor space and analyzed the patterns of clustering. The PCA revealed two factors profiled as "Light-Heavy" and "Comfortable-Radical." Both car and voice stimuli were placed in a two-dimensional space showing the relationships within and between the two stimulus types. Based on the coordinate data, hierarchical clustering grouped the 18 stimuli into four groups labeled challenge, elegance, majesty, and vigor. This study thus identified two latent factors describing the emotional characteristics of both car images and voice types, which cluster into four groups. The coherent matches between car style and voice type are expected to address the design concept more successfully.
https://doi.org/10.14695/KJSOS.2022.25.4.129

On the Use of a Parallel-Branch Subunit Model in Continuous HMM for improved Word Recognition (연속분포 HMM에서 평행분기 음성단위를 사용한 단어인식율 향상연구)

Park, Yong-Kyuo;Un, Chong-Kwan
- The Journal of the Acoustical Society of Korea, v.14 no.2E, pp.25-32, 1995
In this paper, we propose to use a parallel-branch subunit model for improved word recognition. The model is obtained by splitting off each subunit branch based on the mixture components of a continuous hidden Markov model (continuous HMM). According to simulation results, the proposed model yields a higher recognition rate than the single-branch subunit model or the parallel-branch subunit model proposed by Rabiner et al. [1]. We show that a proper combination of the number of mixture components and the number of branches for each subunit results in an increased recognition rate. To study the recognition performance of the proposed algorithms, we used speech material consisting of a vocabulary of 1036 Korean words.

Auto-Segmentation of Unsegmented Speech based on HMM and Time-Synchronous Viterbi Algorithm (시간동기형 Viterbi 알고리즘과 HMM에 기반한 음성의 자동 세그멘테이션)

오세진;황철준;김범국;정호열;정현열
- Proceedings of the Korean Information Science Society Conference, 2001.04b, pp.592-594, 2001
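In this study, to improve the precision of acoustic models for speech recognition, we investigated automatic segmentation of unsegmented speech based on HMM, a statistical method, and the time-synchronous Viterbi algorithm. We first build continuous-density HMM baseline models from a small amount of segmented speech and use them as reference patterns. Then, for the feature parameters of the unsegmented input speech, the point that is maximal at each frame of the time-synchronous Viterbi algorithm is set as the optimal boundary, and the speech is segmented using this optimal boundary information together with pronunciation dictionary information as linguistic knowledge. For comparison, the same procedure was carried out using HTK. Using the segmentation information obtained in this way, we built continuous-density HMM baseline models and HTK CHMM baseline models, respectively, and evaluated word recognition performance on word data from the Korean Language Engineering center (KLE). In the experiments on KLE 452 for male and female speakers, our laboratory's recognition system achieved speaker-independent word recognition rates of 89.4% and 85.1%, and HTK achieved 85.1% and 81.9%, respectively.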
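Viterbi-based segmentation of the kind this abstract describes can be sketched as a forced alignment over a left-to-right state sequence. The code below is a generic illustration under stated assumptions, not the authors' system or HTK: it assumes the per-frame log-likelihoods are already computed, uses a single shared stay/advance transition score, and all names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi_align(log_emis, log_stay, log_next):
    """Best left-to-right state path for a sequence of frames.

    log_emis[t][s] : log-likelihood of frame t under state s
    log_stay       : log-probability of staying in the same state
    log_next       : log-probability of advancing to the next state
    The path is forced to start in state 0 and end in the last state.
    """
    n_frames, n_states = len(log_emis), len(log_emis[0])
    delta = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    delta[0][0] = log_emis[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = delta[t - 1][s] + log_stay
            move = delta[t - 1][s - 1] + log_next if s > 0 else NEG_INF
            if move > stay:
                delta[t][s], back[t][s] = move + log_emis[t][s], s - 1
            else:
                delta[t][s], back[t][s] = stay + log_emis[t][s], s
    # Backtrace from the final state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path):
    """Frame indices at which the aligned state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


if __name__ == "__main__":
    # Three frames that fit state 0, then two frames that fit state 1.
    log_emis = [[-1, -5], [-1, -5], [-1, -5], [-5, -1], [-5, -1]]
    path = viterbi_align(log_emis, math.log(0.5), math.log(0.5))
    print(path, boundaries(path))
```

In a real system the state sequence would come from the pronunciation dictionary and the emission scores from the trained HMMs; the sketch only shows how the boundary frames fall out of the best path.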

Search Result 412, Processing Time 0.021 seconds
