Integrated Search | Korea Science

Microphone Array Based Speech Enhancement Using Independent Vector Analysis

왕씽양;전성일;배건성
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 4, No. 4 / pp. 87-92 / 2012
Speech enhancement aims to improve speech quality by removing background noise from noisy speech. Independent vector analysis (IVA) is a frequency-domain independent component analysis method that is known to avoid the frequency-bin permutation problem in blind source separation of multi-channel inputs. This paper proposes a new microphone-array-based speech enhancement method that combines independent vector analysis with beamforming. Independent vector analysis is used to separate the speech and noise components of multi-channel noisy speech, and delay-and-sum beamforming is used to identify the enhanced speech among the separated signals. To verify the effectiveness of the proposed method, experiments were carried out on computer-simulated multi-channel noisy speech at various signal-to-noise ratios (SNRs), and both PESQ and output SNR were measured as objective speech quality measures. Experimental results show that the proposed method outperforms conventional microphone-array-based noise removal approaches such as generalized sidelobe canceller (GSC) beamforming.
https://doi.org/10.13064/KSSS.2012.4.4.087
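As a rough illustration of the delay-and-sum step named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes the per-channel steering delays are already known as non-negative integer sample counts relative to the earliest-arriving microphone, and it uses plain Python lists; the function name `delay_and_sum` is hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align each channel by advancing it delays[m] samples, then average.

    Assumption: delays are non-negative integers (samples) relative to the
    earliest-arriving microphone; samples shifted past the end count as 0.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for t in range(n):
            # Advance channel x by d samples so the target wavefront lines up.
            if t + d < n:
                out[t] += x[t + d]
    m = len(channels)
    # Averaging keeps the aligned target intact while uncorrelated parts partly cancel.
    return [v / m for v in out]


# A pulse reaching mic 0 at sample 1 and mic 1 at sample 3 is re-aligned with delays [0, 2].
print(delay_and_sum([[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]], [0, 2]))
```

In the paper the beamformer serves to pick the speech output among the IVA-separated signals; the sketch only shows the alignment-and-average operation itself.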

Perception of Korean stops with a three-way laryngeal contrast

Kong, Eun-Jong
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 4, No. 1 / pp. 13-20 / 2012
The Korean lax stop, one of the three stops with a laryngeal contrast, has undergone a sound change in its acoustic properties. Prior production studies described the lax stop in its current form as differentiated from tense and aspirated stops primarily by fundamental frequency (f0), while voice onset time (VOT) further separates tense stops from lax and aspirated stops. The current research explores how these two major acoustic parameters, f0 and VOT, cue the three stop categories in the perception of Korean adult listeners. Thirty-one native speakers of Korean participated in two experimental tasks: categorization judgment and within-category goodness rating. Two sets of audio stimuli were prepared by synthesizing CV productions by English and Korean male speakers. The findings showed that while f0 cued listeners to lax stops as production patterns would predict, VOT was closely related to listeners' categorization and goodness ratings of lax stops. This suggests that an accurate characterization of the current lax stop category needs to be based on Korean speakers' perceptual behavior as well as their production patterns.
https://doi.org/10.13064/KSSS.2012.4.1.013

Harmonic Peak Picking-based MVF Estimation for Improvement of HMM-based Speech Synthesis System Using TBE Model

박지훈;한민수
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 4, No. 4 / pp. 79-86 / 2012
In the two-band excitation (TBE) model, the maximum voiced frequency (MVF) is the most important excitation parameter, because the quality of the synthetic speech depends on it. This paper therefore proposes an enhanced MVF estimation scheme based on peak picking. In the proposed scheme, local peaks and peak lobes are picked from the spectrum of the linear predictive (LP) residual signal. The normalized distance between neighboring peak lobes is calculated and used as a feature to estimate the MVF. Results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with the conventional scheme.
https://doi.org/10.13064/KSSS.2012.4.4.079

Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech

임창송;배건성
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 2, No. 1 / pp. 71-75 / 2010
PESQ is an objective speech quality measure that is known to correlate highly with subjective speech quality measures such as MOS. To examine whether it is also useful as an objective quality measure for synthetic speech, we carried out subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ on HMM-based Korean synthetic speech, and analyzed the correlation between them. Experimental results show that PESQ has correlations of 0.87 with MOS and 0.92 with DMOS, which suggests that PESQ holds much promise for evaluating the quality of synthetic Korean speech.
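The correlation figures above are, presumably, Pearson correlation coefficients between paired objective and subjective scores; the abstract does not name the coefficient, so that is an assumption. A minimal sketch of that computation follows. The helper name `pearson_r` is hypothetical and the score lists are made-up placeholders, not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-condition scores (illustrative only, not from the paper).
pesq = [2.1, 2.6, 3.0, 3.4, 3.9]
mos = [2.0, 2.8, 2.9, 3.6, 4.1]
print(round(pearson_r(pesq, mos), 3))
```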

Multi-Channel Speech Enhancement Algorithm Using DOA-based Learning Rate Control

김수환;이영재;김영일;정상배
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 3, No. 3 / pp. 91-98 / 2011
In this paper, a multi-channel speech enhancement method using the linearly constrained minimum variance (LCMV) algorithm with variable learning rate control is proposed. To control the learning rate of the LCMV algorithm's adaptive filters, the direction of arrival (DOA) is measured for each short-time input frame, and the likelihood of target speech presence is estimated. Based on this likelihood, the learning rate is increased during noise-only intervals and decreased during target speech intervals. The parameter of the function that maps the likelihood value to the corresponding learning rate is optimized by exhaustive search, using Bark spectral distortion (BSD) as the performance index. Experimental results show that the proposed algorithm outperforms the conventional LCMV with a fixed learning rate by around 1.5 dB in BSD.
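The abstract does not give the form of the likelihood-to-learning-rate mapping, so the sketch below assumes a decreasing logistic curve and pairs it with a standard normalized LMS (NLMS) weight update. A low speech-presence score gives a step size near `mu_max` (fast adaptation on noise), and a high score gives a step near `mu_min` (adaptation frozen on target speech). The names `learning_rate`, `nlms_update`, `mu_max`, `mu_min`, `slope` and `center` are all hypothetical; in the paper the mapping parameter was tuned by exhaustive search on BSD.

```python
import math
from typing import List, Sequence


def learning_rate(score: float, mu_max: float = 0.1, mu_min: float = 0.001,
                  slope: float = 4.0, center: float = 0.0) -> float:
    """Map a target-speech-presence score to an adaptive-filter step size.

    Assumed form: decreasing logistic curve. score could be, e.g., a
    log-likelihood ratio; center and slope are hypothetical tuning knobs.
    """
    s = 1.0 / (1.0 + math.exp(-slope * (score - center)))
    return mu_max - (mu_max - mu_min) * s


def nlms_update(w: Sequence[float], x: Sequence[float], err: float, mu: float,
                eps: float = 1e-8) -> List[float]:
    """One normalized-LMS step: w <- w + mu * err * x / (||x||^2 + eps)."""
    power = sum(v * v for v in x) + eps
    return [wi + mu * err * xi / power for wi, xi in zip(w, x)]


# Noise-only frame (low score): large step; target-speech frame (high score): tiny step.
print(learning_rate(-5.0), learning_rate(5.0))
```

The logistic shape is only one convenient monotone choice; any decreasing mapping with tunable parameters would fit the description in the abstract.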

Extraction of Speech Features for Emotion Recognition

권철홍;송승규;김종열;김근호;장준수
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 4, No. 2 / pp. 73-78 / 2012
Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish the relationship between emotional groups and their corresponding voice characteristics by investigating various speech features, including features related to both the voice source and the vocal tract filter. Experimental results show that the speech parameters that are statistically significant for classifying the emotional groups are mainly related to the voice source, such as jitter, shimmer, F0 (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
https://doi.org/10.13064/KSSS.2012.4.2.073
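Jitter and shimmer, two of the voice-source parameters listed above, have several variants. The sketch below assumes the common "local" definitions: the mean absolute difference between consecutive cycles divided by the mean value. It takes pitch periods for jitter and per-cycle peak amplitudes for shimmer, and assumes these were already extracted by a pitch tracker. The abstract does not say which variants the authors used, and the function names are hypothetical.

```python
from typing import Sequence


def _local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_local(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle variation of glottal periods (in seconds), as a ratio."""
    return _local_perturbation(periods_s)


def shimmer_local(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of per-cycle peak amplitudes, as a ratio."""
    return _local_perturbation(peak_amplitudes)


# Three cycles of roughly 100 Hz voicing with a small period perturbation.
print(jitter_local([0.010, 0.011, 0.010]))
```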

Exclusion of Non-similar Candidates using Positional Accuracy based on Levenstein Distance from N-best Recognition Results of Isolated Word Recognition

윤영선;강점자
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 1, No. 3 / pp. 109-115 / 2009
Many isolated word recognition systems generate non-similar words among their recognition candidates because they rely only on acoustic information. In this paper, we investigate several techniques for excluding non-similar words from the N-best candidate words by applying the Levenshtein distance measure. First, word distance methods based on phone and syllable distances are considered. These methods apply the Levenshtein distance to the phones of candidates, or a double Levenshtein distance algorithm to their syllables. Next, word similarity approaches that use the positional information of characters in the word candidates are presented. After the source and target strings are aligned, each character position is labeled as inserted, deleted, or correct. Word similarities are then obtained from the characters' positional probabilities, i.e., the relative frequency with which the same character is observed at that position. Experimental results show that the proposed methods effectively remove non-similar words from the systems' N-best recognition candidates without degrading system performance.
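The alignment-and-labeling step described above can be sketched with the textbook dynamic-programming Levenshtein distance plus a backtrace that labels each aligned position. This is a minimal sketch, not the authors' code. The function name is hypothetical, and the separate "substituted" label is an addition for clarity, since the abstract mentions only inserted, deleted and correct positions. It works on any Python string, including Hangul syllables or phone symbols.

```python
from typing import List, Tuple


def levenshtein_align(src: str, tgt: str) -> Tuple[int, List[Tuple[str, str, str]]]:
    """Return (edit distance, alignment) between src and tgt.

    Alignment entries are (label, src_char, tgt_char), with label in
    {"correct", "substituted", "deleted", "inserted"}; "" marks a gap.
    """
    n, m = len(src), len(tgt)
    # d[i][j] = edit distance between src[:i] and tgt[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace one optimal path and label each aligned position.
    i, j, path = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            label = "correct" if src[i - 1] == tgt[j - 1] else "substituted"
            path.append((label, src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            path.append(("deleted", src[i - 1], ""))
            i -= 1
        else:
            path.append(("inserted", "", tgt[j - 1]))
            j -= 1
    path.reverse()
    return d[n][m], path


print(levenshtein_align("ac", "abc"))
```

Positional statistics such as the per-position character frequencies mentioned in the abstract could then be accumulated from these labeled alignments over a training set.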

A Study of the English Pronunciation of Korean Exchange Students

박희석
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 1, No. 3 / pp. 87-93 / 2009
The purpose of this experimental study is to investigate and compare the vowel lengths of English diphthongs and low vowels produced by native English-speaking Americans and Korean college exchange students. To this end, eight words and sixteen sentences were uttered and recorded by nine subjects: five Korean and four American. The results showed that the lengths of English low vowels differed between the American and Korean subjects, which may contribute to Korean speakers' foreign accent. A comparison of average low-vowel lengths shows that American subjects tend to pronounce English low vowels longer than Korean subjects do. For the diphthongs /eI/ and /ou/, Korean subjects produced longer durations than American subjects, whereas for /au/, /aI/, and /ɔI/, American subjects produced longer durations than Korean subjects.

Probabilistic Target Speech Detection and Its Application to Multi-Input-Based Speech Enhancement

이영재;김수환;한승호;한민수;김영일;정상배
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 1, No. 3 / pp. 95-102 / 2009
In this paper, an efficient target speech detection algorithm is proposed to improve the performance of multi-input speech enhancement. The proposed algorithm computes the normalized cross-correlation between two selected channels and estimates the probability distribution function of this value from a noise-only interval. Log-likelihoods are then calculated from this function and the normalized cross-correlation value to detect target speech intervals precisely. The detection results are applied to a generalized sidelobe canceller (GSC)-based algorithm. Experimental results show that the proposed algorithm significantly improves speech recognition performance and signal-to-noise ratios.
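A minimal sketch of the detection idea described above follows. It assumes several things the abstract does not specify: a zero-lag normalized cross-correlation, which implies the two channels are already time-aligned for the target; a Gaussian model for the noise-only NCC distribution; and a simple rule that flags a frame as target speech when its log-likelihood under the noise model falls below a threshold. All names and the threshold are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def normalized_cross_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


class NoiseNccModel:
    """Gaussian fit to NCC values from a noise-only interval (assumed PDF form)."""

    def __init__(self, noise_nccs: Sequence[float]):
        self.mu = mean(noise_nccs)
        self.sigma = max(pstdev(noise_nccs), 1e-6)  # floor avoids division by zero

    def log_likelihood(self, ncc: float) -> float:
        z = (ncc - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2 * math.pi))


def is_target_speech(model: NoiseNccModel, ncc: float, threshold: float) -> bool:
    # Frames that are very unlikely under the noise-only model are flagged as speech.
    return model.log_likelihood(ncc) < threshold


# Hypothetical NCC values measured during a noise-only interval.
model = NoiseNccModel([0.1, 0.2, 0.15, 0.05])
print(is_target_speech(model, 0.9, threshold=-10.0))
```

The resulting per-frame decisions would then gate adaptation in a GSC-style enhancer, as the abstract describes.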

Measuring Correlation between Mental Fatigues and Speech Features

김정인;권철홍
- Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 6, No. 2 / pp. 3-8 / 2014
This paper examines how mental fatigue affects the human voice. To this end, a monotonous task designed to increase the feeling of fatigue and a subjective questionnaire for rating fatigue were developed. The questionnaire responses confirmed that the designed task was monotonous. To investigate the statistical relationship between fatigue and the speech features extracted from the collected speech data, a paired-samples t-test (T test for two related samples) was used. Statistical analysis shows that the speech parameters most closely related to fatigue are the first formant bandwidth, jitter, H1-H2, cepstral peak prominence, and harmonics-to-noise ratio. The experimental results suggest that the voice becomes breathier as mental fatigue progresses.
https://doi.org/10.13064/KSSS.2014.6.2.003
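The paired-samples t-test mentioned above compares each speaker's feature value before and after the task. A minimal sketch of the test statistic using only the standard library follows. The function name and the before/after framing are assumptions, since the abstract does not describe how the samples were paired. The sketch computes only t and its degrees of freedom; a p-value needs the t distribution, which a statistics package such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for (after - before).

    Raises ZeroDivisionError if every paired difference is identical.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(before, after)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical per-speaker feature values before and after the monotonous task.
print(paired_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]))
```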
