Search | Korea Science

A New Statistical Voice Activity Detector Based on UMP Test (UMP 테스트에 근거한 새로운 통계적 음성검출기)

Jang, Keun-Won;Chang, Joon-Hyuk;Kim, Dong-Kook
- The Journal of the Acoustical Society of Korea / v.26 no.1 / pp.16-24 / 2007
Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) is derived from statistical models, and speech or noise is then decided by comparing the value of this expression with a threshold. We propose a new method with a modified decision rule based on the Gaussian distribution and the uniformly most powerful (UMP) test. This method requires the distribution of the absolute value of the incoming speech signal; the final decision is then obtained through the relation between the Rayleigh distributions. The method can detect speech without the a priori signal-to-noise ratio (SNR), which is required in conventional VAD algorithms. Additionally, in various VAD performance tests, the proposed method is shown to be more effective than the traditional scheme.
https://doi.org/10.7776/ASK.2007.26.1.016
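
For orientation, the conventional LRT-based decision that this abstract contrasts against can be sketched as follows. This is a minimal sketch of the widely used Gaussian statistical-model rule, not the UMP-based method proposed in the paper; the function names, the per-bin inputs, and the default threshold are assumptions made for illustration.

```python
import math

def log_likelihood_ratio(gamma, xi):
    """Per-bin log likelihood ratio under complex Gaussian speech and noise models.

    gamma: a posteriori SNR (noisy power / noise variance)
    xi:    a priori SNR (speech variance / noise variance)
    """
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)

def lrt_vad(noisy_power, noise_var, xi, threshold=0.0):
    """Return (is_speech, mean_log_lr) for one frame.

    All inputs are per-frequency-bin sequences of equal length
    (a hypothetical interface, not taken from the paper).
    """
    llrs = [log_likelihood_ratio(p / nv, s)
            for p, nv, s in zip(noisy_power, noise_var, xi)]
    mean_llr = sum(llrs) / len(llrs)
    # Speech is declared when the averaged log LR exceeds the threshold.
    return mean_llr > threshold, mean_llr
```

Note that this rule takes `xi`, the a priori SNR, as an input; that is the quantity the abstract says the proposed method does not need.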

Analysis of Association Relationship Between A16 Acupuncture Point and Heart Function Using Voice Signals (음성신호를 이용한 A16 혈자리와 심장 기능의 연관관계 분석)

Kim, Bong-Hyun;Cho, Dong-Uk
- The Journal of Korean Institute of Communications and Information Sciences / v.35 no.11B / pp.1651-1658 / 2010
As indicators of quality of life have recently improved greatly, early-stage medical examinations and health care are increasingly performed before diseases occur. Hand acupuncture, an alternative medicine that reflects this trend toward prevention and health care, is therefore widely used these days. In this paper, we measured the change in heart-related voice signal elements caused by stimulating the A16 acupuncture point, which is associated with the heart, and investigated possible improvements in cardiac function through cross-comparison of the measurements. To this end, we collected heart-related voice samples before and after stimulating the A16 acupuncture point and performed an experiment using the second formant bandwidth and jitter. As a result, stimulating the A16 acupuncture point lowered the second formant bandwidth and jitter. The result has proven that using voice signal processing technology can help in improving heart function.

Voice Activity Detection employing the Generalized Normal-Laplace Distribution (일반화된 정규-라플라스 분포를 이용한 음성검출기)

Kim, Sang-Kyun;Kwon, Jang-Woo;Lee, Sangmin
- Journal of Korea Multimedia Society / v.17 no.3 / pp.294-299 / 2014
In this paper, we propose a novel algorithm to improve the performance of a voice activity detector (VAD) based on the generalized normal-Laplace (GNL) distribution. In our algorithm, the probability density function (PDF) of the noisy speech signal is represented by the GNL distribution, and the variances of the speech and noise under the GNL distribution are estimated using higher-order moments. Experimental results show that the proposed algorithm yields better results than conventional VAD algorithms.
https://doi.org/10.9717/kmms.2014.17.3.294

Music/Voice Separation Based on Kernel Back-Fitting Using Weighted β-Order MMSE Estimation

Kim, Hyoung-Gook;Kim, Jin Young
- ETRI Journal / v.38 no.3 / pp.510-517 / 2016
Recent developments in the field of separation of mixed signals into music/voice components have attracted the attention of many researchers. Recently, iterative kernel back-fitting, also known as kernel additive modeling, was proposed to achieve good results for music/voice separation. To obtain minimum mean square error (MMSE) estimates of short-time Fourier transforms of sources, generalized spatial Wiener filtering (GW) is typically used. In this paper, we propose an advanced music/voice separation method that utilizes a generalized weighted β-order MMSE estimation (WbE) based on iterative kernel back-fitting (KBF). In the proposed method, WbE is used for the step of mixed music signal separation, while KBF permits kernel spectrogram model fitting at each iteration. Experimental results show that the proposed method achieves better separation performance than GW and existing Bayesian estimators.
https://doi.org/10.4218/etrij.16.0115.0256

Robust Voice Activity Detection Using the Spectral Peaks of Vowel Sounds

Yoo, In-Chul;Yook, Dong-Suk
- ETRI Journal / v.31 no.4 / pp.451-453 / 2009
This letter proposes the use of vowel sound detection for voice activity detection. Vowels have distinctive spectral peaks. These are likely to remain higher than their surroundings even after severe corruption. Therefore, by developing a method of detecting the spectral peaks of vowel sounds in corrupted signals, voice activity can be detected as well even in low signal-to-noise ratio (SNR) conditions. Experimental results indicate that the proposed algorithm performs reliably under various noise and low SNR conditions. This method is suitable for mobile environments where the characteristics of noise may not be known in advance.
https://doi.org/10.4218/etrij.09.0209.0104

Voice Color Conversion Based on the Formants and Spectrum Tilt Modification (포먼트 이동과 스펙트럼 기울기의 변환을 이용한 음색 변환)

Son Song-Young;Hahn Min-Soo
- MALSORI / no.45 / pp.63-77 / 2003
The purpose of voice color conversion is to change the speaker identity perceived from the speech signal. In this paper, we propose a new voice color conversion algorithm based on formant shifting and spectrum-tilt modification in the frequency domain. The basic idea of this technique is to convert the positions of the source speaker's formants into those of the target speaker's formants through interpolation and decimation, and to modify the spectrum tilt by utilizing the information in both speakers' spectral envelopes. The LPC spectrum is adopted to evaluate the formant positions and the spectrum-tilt information. Our algorithm converts the speaker identity rather successfully while maintaining good speech quality, since it modifies speech waveforms directly in the frequency domain.

Wearable Computing System for the bland persons (시각 장애우를 위한 Wearable Computing System)

Kim, Hyung-Ho;Choi, Sun-Hee;Jo, Tea-Jong;Kim, Soon-Ju;Jang, Jea-In
- Proceedings of the KIEE Conference / 2006.04a / pp.261-263 / 2006
Nowadays, technologies such as RFID and sensor networks make our lives increasingly comfortable. In this paper, we propose a wearable computing system for blind and deaf people, who can easily be overlooked by such technology. The system consists of an embedded board for processing data, ultrasonic sensors for acquiring distance data, and motors that vibrate as a signal for a deaf person to look at the screen. The system provides environmental information by text and voice. For example, the distance from an obstacle to the person is calculated by a data compounding module using the sensed ultrasonic reflection time. This data is converted to text or voice by the main processing module and delivered to the user. Furthermore, we will extend this system with a voice recognition module and a text-to-voice converter module to aid communication between blind and deaf people.

Boll's Spectral Subtraction Algorithm by New Voice Activity Detection (새로운 음성 활동 검출법에 의한 Boll의 스펙트럼 차감 알고리즘)

류종훈;김대경;박장식;손경식
- Journal of Korea Multimedia Society / v.4 no.1 / pp.46-55 / 2001
In this paper, a new voice activity detection method that estimates the SNR of speech enhanced with extended spectral subtraction (ESS) is proposed. Voice activity detection is performed by placing a second Wiener filter behind the Wiener filter used in the ESS to estimate the speech and noise power of the output signal of the first Wiener filter. The proposed voice activity detection method has a low computational load and performs well under severe input SNR. Boll's spectral subtraction algorithm with the proposed voice activity detection was compared to ESS under several noise environments having different time-frequency distributions. During both speech and non-speech activity, the performance of Boll's spectral subtraction algorithm with the proposed voice activity detection is superior to that of ESS.
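
As background for the abstract above, magnitude spectral subtraction driven by a VAD decision can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's ESS or its two-stage Wiener-filter VAD: the function names are hypothetical, the inputs are per-bin magnitude spectra of one frame, and the recursive noise averaging with factor `alpha` is a simplification chosen for illustration.

```python
def spectral_subtract(noisy_mag, noise_mag):
    """Subtract the noise magnitude estimate bin by bin.

    Negative results are clamped to zero (half-wave rectification).
    """
    return [max(y - n, 0.0) for y, n in zip(noisy_mag, noise_mag)]

def update_noise(noise_mag, frame_mag, is_speech, alpha=0.9):
    """Update the noise estimate only on frames the VAD marks as non-speech.

    alpha is a hypothetical smoothing factor for recursive averaging.
    """
    if is_speech:
        return list(noise_mag)  # keep the estimate unchanged during speech
    return [alpha * n + (1.0 - alpha) * f
            for n, f in zip(noise_mag, frame_mag)]
```

The VAD decision matters here because a frame wrongly marked as non-speech lets speech energy leak into the noise estimate, which is then subtracted from later frames.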

Artificial Intelligence for Clinical Research in Voice Disease (후두음성 질환에 대한 인공지능 연구)

Jungirl, Seok;Tack-Kyun, Kwon
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.33 no.3 / pp.142-155 / 2022
Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can be used as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms, such as machine learning, led by the latest deep learning technology, began with a binary classification that distinguishes normal and pathological voices; consequently, it has contributed to improving the accuracy of multi-classification to classify various types of pathological voices. However, no conclusions that can be applied in the clinical field have yet been achieved. Most studies on pathological speech classification have used the continuous short vowel /ah/, which is relatively easier than using continuous or running speech. However, continuous speech has the potential to derive more accurate results, as additional information can be obtained from the change in the voice signal over time. In this review, explanations of terms related to artificial intelligence research, and the latest trends in machine learning and deep learning algorithms are reviewed; furthermore, the latest research results and limitations are introduced to provide future directions for researchers.
https://doi.org/10.22469/jkslp.2022.33.3.142

Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy in Noisy Environments (잡음환경에서 Teager Energy 기반의 전역 음성부재확률을 이용하는 음성검출)

Park, Yun-Sik;Lee, Sang-Min
- Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.1 / pp.97-103 / 2012
In this paper, we propose a novel voice activity detection (VAD) algorithm to effectively distinguish speech from nonspeech in various noisy environments. The global speech absence probability (GSAP), derived from the likelihood ratio (LR) based on a statistical model, is widely used as the feature parameter for VAD. However, the feature parameter based on the conventional GSAP is not sufficient to distinguish speech from noise at low signal-to-noise ratios (SNRs). The presented VAD algorithm uses a GSAP based on Teager energy (TE) as the feature parameter to improve decision performance for speech segments in noisy environments. The performance of the proposed VAD algorithm is evaluated by objective tests under various environments, and better results than those of the conventional methods are obtained.
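
The Teager energy mentioned in the abstract above is commonly computed with the discrete Teager energy operator, sketched below. This shows only the standard operator on a sample sequence; how the paper combines it with the GSAP is not described in the abstract, and the function name is an assumption.

```python
import math

def teager_energy(x):
    """Discrete Teager energy for interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for n = 1 .. len(x) - 2,
    so the output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

For a pure tone A·cos(Ωn + φ), the operator returns the constant A²·sin²(Ω), so the value depends on both the amplitude and the frequency of the signal rather than on amplitude alone.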
