Search | Korea Science

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

Shin, Min-Hwa; Park, Ji-Hun; Kim, Hong-Kook; Lee, Yeon-Woo; Lee, Seong-Ro
- Phonetics and Speech Sciences / v.2 no.3 / pp.141-148 / 2010
In this paper, a voice activity detection (VAD) method that employs spatial cues is proposed for dual-channel noisy speech recognition. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from the dual-channel input signal, and speech intervals are detected using this probability model. Specifically, the spatial cues consist of the interaural time differences (ITDs) and interaural level differences (ILDs) of the dual-channel speech signals, and the speech presence/absence probability model is based on Gaussian kernel density estimation. To evaluate the proposed VAD method, speech recognition is performed on segments containing only the speech intervals it detects. Its performance is compared with that of an SNR-based method, a direction-of-arrival (DOA)-based method, and a phase-vector-based method. Speech recognition experiments show that the proposed method outperforms these conventional methods, yielding relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase-vector-based methods, respectively.
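
The speech presence/absence decision described above can be sketched as a likelihood-ratio test between two Gaussian kernel density estimates, one built from speech frames and one from noise-only frames. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-frame (ITD, ILD) features are assumed to be already extracted and scaled to comparable ranges, and the isotropic `bandwidth`, the decision `threshold`, and the function names `kde_2d` and `is_speech_frame` are all hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical per-frame spatial feature: (ITD, ILD), assumed already extracted
# and scaled so both dimensions have comparable ranges.
Feature = Tuple[float, float]


def kde_2d(x: Feature, samples: Sequence[Feature], bandwidth: float) -> float:
    """Isotropic 2-D Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(samples))
    total = 0.0
    for s in samples:
        dist2 = (x[0] - s[0]) ** 2 + (x[1] - s[1]) ** 2
        total += math.exp(-dist2 / (2.0 * bandwidth ** 2))
    return norm * total


def is_speech_frame(
    x: Feature,
    speech_train: Sequence[Feature],
    noise_train: Sequence[Feature],
    bandwidth: float = 0.5,
    threshold: float = 1.0,
) -> bool:
    """Declare speech present if p(x | speech) > threshold * p(x | noise)."""
    p_speech = kde_2d(x, speech_train, bandwidth)
    p_noise = kde_2d(x, noise_train, bandwidth)
    return p_speech > threshold * p_noise
```

Raising `threshold` trades missed speech frames for fewer false alarms; a per-dimension bandwidth would be a natural extension if ITD and ILD were left unnormalized.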

Aerodynamic Study in Korean Western Classical Singers (서양음악을 전공으로 하는 성악인에서의 공기역학적 검사)

정성민
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.9 no.2 / pp.109-114 / 1998
Background and Objectives: Aerodynamic investigation provides valuable information about how efficiently the larynx converts air pressure into an acoustic signal. Normative data for Koreans have been reported, but no baseline data exist for professional Western classical singers, who have learned to control expiratory airflow for singing. The purpose of this study was to investigate the aerodynamic data of Korean professional Western classical singers and compare them with normative Korean data. Materials and Methods: Fifty Korean Western classical singers were studied. Expiratory lung pressure, mean airflow rate, voice frequency, and voice intensity were measured with an aerodynamic test using the airway interruption method, and these data were compared with normative data from untrained adults. Results and Conclusions: Voice frequency and voice intensity were higher in the Western classical singers, but their mean airflow rate and expiratory air pressure were within the same range as those of untrained adults. This result suggests that Western classical singers can change loudness and pitch with only slight increases or decreases in mean airflow and expiratory air pressure.

Voice Conversion Using Linear Multivariate Regression Model and LP-PSOLA Synthesis Method (선형다변회귀모델과 LP-PSOLA 합성방식을 이용한 음성변환)

권홍석; 배건성
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.15-23 / 2001
This paper presents a voice conversion technique that modifies the utterance of a source speaker so that it sounds as if it were spoken by a target speaker. Feature-parameter conversion methods for transforming the vocal tract and prosodic characteristics from the source speaker to the target speaker are described. The vocal tract characteristics are transformed by modifying the LPC cepstral coefficients using linear multivariate regression (LMR). Prosodic transformation is performed by changing the average pitch period from that of the source speaker to that of the target speaker, and it is applied to the residual signal using the LP-PSOLA scheme. Experimental results show that speech transformed by the LMR and LP-PSOLA synthesis method retains many of the target speaker's characteristics.
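
The LMR step above amounts to fitting an affine map from source cepstral vectors to target cepstral vectors by least squares. The sketch below assumes the source and target frames are already time-aligned (for example by dynamic time warping, which the abstract does not mention) and uses hypothetical names (`fit_lmr`, `convert`, `pitch_period_ratio`); it solves the normal equations in pure Python for clarity rather than numerical robustness. The pitch-period ratio is one simple reading of "changing the average pitch period"; the LP-PSOLA resynthesis itself is not shown.

```python
from typing import List

Matrix = List[List[float]]


def _solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A W = B for W by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular normal equations")
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_lmr(src: Matrix, tgt: Matrix) -> Matrix:
    """Least-squares fit of tgt ~ [src, 1] @ W; W has shape (d_src + 1, d_tgt)."""
    x = [row + [1.0] for row in src]  # append a bias term to each source frame
    d = len(x[0])
    k = len(tgt[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[j] for r, t in zip(x, tgt)) for j in range(k)] for i in range(d)]
    return _solve(xtx, xty)


def convert(frame: List[float], w: Matrix) -> List[float]:
    """Map one source cepstral frame to the target speaker's space."""
    x = frame + [1.0]
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def pitch_period_ratio(src_periods: List[float], tgt_periods: List[float]) -> float:
    """Scale factor that moves the source's mean pitch period to the target's."""
    src_mean = sum(src_periods) / len(src_periods)
    tgt_mean = sum(tgt_periods) / len(tgt_periods)
    return tgt_mean / src_mean
```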

VoIP Receiver Structure for Enhancing Speech Quality Based on Telematics (텔레메틱스 기반의 VoIP 음성 통화품질 향상을 위한 수신단 구조)

Kim, Hyoung-Gook; Seo, Kwang-Duk
- The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.3 / pp.48-54 / 2012
The quality of real-time voice communication over telematics-based Internet Protocol (IP) networks is degraded by network impairments such as delay, jitter, and packet loss. To address this issue, this paper proposes a receiver-based method for enhancing VoIP speech quality. The proposed method delivers high-quality voice through playout control and signal reconstruction, which comprise concealment of lost packets, adaptive playout-buffer scheduling based on active jitter estimation, and smooth interpolation between two signals in a transition region. The proposed algorithm achieves higher Perceptual Evaluation of Speech Quality (PESQ) scores and lower buffering delay than the reference algorithm.
https://doi.org/10.12815/kits.2012.11.3.048
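
The abstract does not specify its jitter estimator, so the sketch below swaps in a well-known one: the RFC 3550 interarrival-jitter recursion (J += (|D| - J) / 16), combined with an exponentially smoothed transit delay. The target playout delay `delay + k * jitter`, the smoothing factor `alpha`, the multiplier `k`, and the class name `PlayoutEstimator` are illustrative assumptions, not the paper's scheduling rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlayoutEstimator:
    """Adaptive playout-delay estimate from per-packet send/arrival times (same units).

    jitter: RFC 3550 interarrival-jitter recursion with gain 1/16.
    delay:  exponentially smoothed one-way transit time (assumed heuristic).
    """
    alpha: float = 0.9
    k: float = 4.0
    delay: Optional[float] = None
    jitter: float = 0.0
    _prev_transit: Optional[float] = field(default=None, repr=False)

    def update(self, send_time: float, arrival_time: float) -> None:
        transit = arrival_time - send_time
        if self._prev_transit is not None:
            # D = difference in transit time between consecutive packets
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._prev_transit = transit
        if self.delay is None:
            self.delay = transit
        else:
            self.delay = self.alpha * self.delay + (1.0 - self.alpha) * transit

    def target_delay(self) -> float:
        """Hypothetical playout target: smoothed delay plus a jitter safety margin."""
        if self.delay is None:
            raise ValueError("no packets observed yet")
        return self.delay + self.k * self.jitter
```

A larger `k` lowers the late-loss rate at the cost of buffering delay, which is the trade-off an adaptive playout scheduler has to balance.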

The VoIP System on Chip Design and the Test Board Development for the Function Verification (VoIP 시스템 칩 설계 및 기능 검증용 보드 개발)

소운섭; 황대환; 김대영
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.990-994 / 2003
This paper describes the design of a VoIP (Voice over Internet Protocol) SoC (System on Chip) and the development of a test board for verifying its functions, to support voice communication services over the Internet. To keep the system configuration simple, we designed a VoIP SoC that integrates a 32-bit ARM922T microprocessor, an IP network interface, a voice signal interface, and various user interface functions. We also developed a test program and a communication protocol to verify the chip's functions. Using several design and simulation tools, we developed and tested a test board based on Excalibur, which includes an ARM922T microprocessor and an FPGA.

Vocal Exercise System Using Electroglottography (성문전도를 이용한 발성훈련 시스템)

Lee, Je-Hyun; Kim, Ji-Hye; Kang, Gu-Tae; Jung, Dong-Keun
- Journal of Sensor Science and Technology / v.22 no.2 / pp.156-161 / 2013
This study aimed to implement an electroglottography (EGG) system for analyzing the fundamental frequency of phonation. The EGG was recorded as the conductance between ring electrodes attached to the neck skin near the thyroid cartilage, using a high-frequency carrier signal during vocalization, while the voice signal was recorded simultaneously with a microphone. The EGG and voice signals were fed to the PC audio port and recorded with a stereo sound recording program. From the digitized vowel data, parameters such as pitch, jitter, shimmer, contact quotient (CQ), and speed quotient (SQ) were analyzed. For voice training, the fundamental frequency derived from the EGG was displayed during vocalization and while singing a song. The system implemented in this study could be used for vocal exercise.
https://doi.org/10.5369/JSST.2013.22.2.156
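
Jitter and shimmer, as listed above, are commonly computed as "local" perturbation measures: the mean absolute difference between consecutive cycles divided by the mean value. The sketch assumes cycle periods and peak amplitudes have already been extracted from the EGG. The criterion-level contact quotient (threshold at a fixed fraction of the cycle's peak-to-peak amplitude, 25% here) is only one of several CQ definitions and is an assumption, as are all function names; speed quotient is not shown.

```python
from typing import Sequence


def f0_from_periods(periods: Sequence[float]) -> float:
    """Mean fundamental frequency (Hz) from cycle periods in seconds."""
    return len(periods) / sum(periods)


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values relative to their mean, in %.

    Applied to cycle periods this gives local jitter; applied to cycle peak
    amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_val = sum(values) / len(values)
    return 100.0 * mean_diff / mean_val


def contact_quotient(cycle: Sequence[float], level: float = 0.25) -> float:
    """Fraction of one EGG cycle above a criterion level (assumed CQ definition).

    Assumes larger EGG values mean greater vocal-fold contact.
    """
    lo, hi = min(cycle), max(cycle)
    threshold = lo + level * (hi - lo)
    return sum(1 for v in cycle if v > threshold) / len(cycle)
```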

A Study on the underwater communication system of ultrasonic transducer (압전 초음파 센서를 이용한 수중통신에 관한 연구)

Kim, Dong-Hyun; Woo, Hyoung-Gwan; Hwang, Hyun-Suk; Jin, Hong-Bum; Song, Joon-Tae
- Proceedings of the KIEE Conference / 2000.07c / pp.1658-1660 / 2000
Underwater communication has traditionally relied on exchanging simple signs. As people increasingly need more information during underwater activities, the need for underwater communication systems that carry the human voice is growing. This paper aims to understand the general characteristics of underwater communication and to identify the conditions required for a good underwater communication system by building a basic communication module. The experiment applies amplitude modulation (AM), which is mainly used in underwater communication systems, with common ultrasonic transducers. Ultrasonic transducers usually have a narrow bandwidth for converting electrical energy into mechanical energy. To improve sound reconstruction, transducers need a bandwidth that covers the voice frequency range and good linearity within that range. Because many factors distort signals in underwater transmission, amplitude modulation is not well suited to underwater communication. Transmitting a digital signal obtained by sampling the human voice should work well for such systems, because digital communication simplifies signal transmission.
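
As a minimal illustration of the AM scheme used in the experiment, the sketch below implements conventional double-sideband AM with a carrier, s[n] = (1 + m·x[n])·cos(2π f_c n / f_s), and recovers the message by coherent demodulation followed by a moving-average low-pass filter spanning one carrier period. The demodulator choice, the modulation depth, and the function names are assumptions; the carrier period is assumed to be an integer number of samples (at least 3), and the message is assumed to vary slowly relative to the carrier.

```python
import math
from typing import List, Sequence


def am_modulate(message: Sequence[float], carrier_hz: float,
                sample_rate: float, depth: float = 0.8) -> List[float]:
    """Conventional AM: (1 + depth * x[n]) * cos(2*pi*fc*n/fs), with |x| <= 1."""
    w = 2.0 * math.pi * carrier_hz / sample_rate
    return [(1.0 + depth * x) * math.cos(w * n) for n, x in enumerate(message)]


def am_demodulate(signal: Sequence[float], carrier_hz: float,
                  sample_rate: float, depth: float = 0.8) -> List[float]:
    """Coherent demodulation: mix with 2*cos, average over one carrier period, undo scaling."""
    w = 2.0 * math.pi * carrier_hz / sample_rate
    period = round(sample_rate / carrier_hz)  # assumed integer number of samples
    mixed = [2.0 * s * math.cos(w * n) for n, s in enumerate(signal)]
    out = []
    for n in range(len(mixed) - period + 1):
        # Mixing gives (1 + depth*x) * (1 + cos(2wn)); averaging over a full
        # carrier period cancels the double-frequency term.
        avg = sum(mixed[n:n + period]) / period
        out.append((avg - 1.0) / depth)
    return out
```

The abstract's point about transducer bandwidth shows up here as a requirement that the channel pass the carrier plus and minus the voice bandwidth without distortion, which narrow-band ultrasonic transducers struggle to do.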

A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting (음성 하모닉스 스펙트럼의 피크-피팅을 이용한 피치검출에 관한 연구)

Kim, Jong-Kuk; Jo, Wang-Rae; Bae, Myung-Jin
- Speech Sciences / v.10 no.2 / pp.85-95 / 2003
In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. If the pitch is detected accurately, it can be used in analysis to obtain the vocal tract parameters properly; in speech synthesis, to change the pitch easily or to preserve naturalness and intelligibility; and in speech recognition, to remove speaker-specific characteristics for speaker independence. In this paper, we propose a new pitch detection algorithm. First, positive center clipping is performed using the slope of the speech signal, to emphasize the pitch period in the time domain through the glottal component with the vocal tract characteristics removed. Next, a rough formant envelope is computed in the frequency domain by peak-fitting the spectrum of the original speech signal, and a smoothed formant envelope is obtained from it by linear interpolation. A flattened harmonics waveform is then obtained as the difference between the spectrum of the original speech signal and the smoothed formant envelope, and the inverse fast Fourier transform (IFFT) of these flattened harmonics is computed. The result is a residual signal from which the vocal tract component has been removed. The performance was compared with that of the LPC, cepstrum, and autocorrelation function (ACF) methods. The proposed algorithm improved the accuracy of pitch detection and reduced the gross error rate in voiced speech regions and in transition regions between phonemes.
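
The proposed peak-fitting method is too involved for a short example, but the autocorrelation (ACF) baseline it is compared against, combined with positive center clipping, can be sketched as follows. This is a classic center-clipped ACF pitch estimator, not the authors' algorithm: it clips at a fixed fraction of the frame peak (30% here) rather than using the slope-based rule the abstract describes, and the 60-400 Hz search range and the function names are assumptions.

```python
import math
from typing import List, Sequence


def positive_center_clip(frame: Sequence[float], ratio: float = 0.3) -> List[float]:
    """Keep only the part of each sample above ratio * peak; everything else becomes 0."""
    clip = ratio * max(frame)
    return [v - clip if v > clip else 0.0 for v in frame]


def acf_pitch(frame: Sequence[float], sample_rate: float,
              f_min: float = 60.0, f_max: float = 400.0, ratio: float = 0.3) -> float:
    """Pitch (Hz) from the autocorrelation peak of a positively center-clipped frame."""
    y = positive_center_clip(frame, ratio)
    n = len(y)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(y[i] * y[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

Center clipping suppresses the formant ripple within each period so that the autocorrelation peak at the pitch lag stands out, which is also the motivation for removing the vocal tract component in the proposed method.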

Perturbation and Perceptual Analysis of Pathological Sustained Vowels according to Signal Typing

Lee, Ji-Yeoun; Choi, Seong-Hee; Jiang, Jack J.; Hahn, Min-Soo; Choi, Hong-Shik
- Phonetics and Speech Sciences / v.2 no.2 / pp.109-115 / 2010
In this paper, we investigate signal typing based on the visual impression of distinctive spectrograms. Pathological voices are classified into signal types 1, 2, 3, and 4; perturbation parameters are then estimated, and perceptual ratings are assigned using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). The results suggest that perturbation analysis can be applied only to type 1 and 2 signals and that, overall, the perceptual ratings of overall grade increase with signal type. Good inter-rater reliability was found among the three raters. We recommend that pathological voices be labeled with both signal type and CAPE-V ratings to describe their characteristics definitively.

Polyphase Representation of the Relationships Among Fullband, Subband, and Block Adaptive Filters

Tsai, Chimin
- Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.1435-1438 / 2005
In hands-free telephone systems, the received speech signal played through the loudspeaker is picked up again by the microphone, producing the so-called echo. To cancel the effect of this time-varying echo path, an adaptive filter must be placed between the receiving and transmitting ends. In a typical FIR realization, the length of the fullband adaptive filter leads to high computational complexity and a slow convergence rate, so subband adaptive filtering schemes have been proposed to improve performance. In this work, we use a deterministic approach to analyze the relationship between fullband and subband adaptive filtering structures. Using the block adaptive filtering structure as an intermediate stage, the analysis is divided into two parts. First, we find that to avoid aliasing, the matrix of block adaptive filters must take a pseudocirculant form whose elements are the polyphase components of the fullband adaptive filter. Second, to transmit the near-end voice signal faithfully, the analysis and synthesis filter banks in the subband adaptive filtering structure must form a perfect reconstruction pair. Using the polyphase representation, the relationship between the block and subband adaptive filters is derived.
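
The first result, that a block filter equivalent to a fullband FIR filter is a pseudocirculant matrix of its polyphase components, can be checked numerically for a fixed (non-adaptive) filter. The sketch below uses one common convention: blocks ordered [x[kM], ..., x[kM+M-1]], with the z^-1 delays above the diagonal. Other texts order the blocks differently and place the delays below the diagonal instead. The function names are hypothetical, and neither the adaptation nor the subband filter banks are modeled.

```python
from typing import List

Poly = List[float]  # polynomial in z^-1: [c0, c1, ...] means c0 + c1 z^-1 + ...


def polyphase(h: List[float], m: int) -> List[Poly]:
    """Type-1 polyphase components E_p[l] = h[l*m + p], zero-padding h to a multiple of m."""
    padded = list(h) + [0.0] * (-len(h) % m)
    return [padded[p::m] for p in range(m)]


def pseudocirculant(h: List[float], m: int) -> List[List[Poly]]:
    """M x M block-filter matrix: E_{i-j} on/below the diagonal, z^-1 E_{M+i-j} above it."""
    e = polyphase(h, m)
    mat = []
    for i in range(m):
        row = []
        for j in range(m):
            if j <= i:
                row.append(list(e[i - j]))
            else:
                row.append([0.0] + e[m + i - j])  # z^-1 factor: delay by one block
        mat.append(row)
    return mat


def block_filter(mat: List[List[Poly]], x: List[float]) -> List[float]:
    """Filter x block-wise with the polynomial matrix; x is truncated to whole blocks."""
    m = len(mat)
    nblocks = len(x) // m
    streams = [[x[k * m + j] for k in range(nblocks)] for j in range(m)]
    y = [0.0] * (nblocks * m)
    for i in range(m):
        for j in range(m):
            for k in range(nblocks):
                acc = 0.0
                for l, c in enumerate(mat[i][j]):
                    if k - l >= 0:
                        acc += c * streams[j][k - l]
                y[k * m + i] += acc
    return y
```

Filtering with `block_filter(pseudocirculant(h, M), x)` reproduces direct convolution of `x` with `h` sample for sample, which is the sense in which the block structure is equivalent to the fullband filter.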
