Integrated Search | Korea Science

A Multi-speaker Speech Synthesis System Using X-vector

조민수;권철홍
- 문화기술의 융합 / Vol. 7, No. 4 / pp. 675-681 / 2021
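As the AI speaker market has grown in recent years, demand has increased for speech synthesis technology that enables natural conversation with users. A multi-speaker speech synthesis system that can generate voices of various timbres is therefore needed. Synthesizing natural speech requires training on a large, high-quality speech database. However, collecting a large, high-quality speech database uttered by many speakers is very difficult in terms of recording time and cost. It is therefore necessary to train the speech synthesis system on a speech database that contains a very large number of speakers but only a small amount of training data per speaker, and from this to express the timbre and prosody of multiple speakers naturally. In this paper, we construct a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition, and propose a technique for synthesizing the timbre of a new speaker from a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module that synthesizes a mel-spectrogram from text input is Tacotron2, and the vocoder that generates the synthesized speech is a WaveNet with a mixture of logistic distributions. The x-vector extracted from the trained speaker embedding network is added as an input to Tacotron2 to express the timbre of the desired speaker.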
https://doi.org/10.17703/JCCT.2021.7.4.675

Forensic Automatic Speaker Identification System for Korean Speakers

김경화;소병민;유하진
- 말소리와 음성과학 / Vol. 4, No. 3 / pp. 95-101 / 2012
In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is a GMM (Gaussian mixture model)-UBM (universal background model) based automatic speaker recognition system developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics. It also lets users manage reference models with utterances from various environments to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.
https://doi.org/10.13064/KSSS.2012.4.3.095
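
The EER mentioned in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it can be estimated from verification scores, assuming that a higher score means a more likely match. The function name `equal_error_rate` and the toy scores are illustrative assumptions and do not come from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine:  scores from same-speaker trials (hypothetical toy data below)
    impostor: scores from different-speaker trials
    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: fraction of impostor trials accepted.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: fraction of genuine trials rejected.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, made up for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With so few trials the estimate is coarse; evaluations on real data typically interpolate between thresholds rather than picking the nearest one.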

Performance Comparison and Duration Model Improvement of Speaker Adaptation Methods in HMM-based Korean Speech Synthesis

이혜민;김형순
- 말소리와 음성과학 / Vol. 4, No. 3 / pp. 111-117 / 2012
In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system with small amounts of adaptation data. According to objective and subjective evaluations, a hybrid method of constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation performs better than the other methods when only five minutes of adaptation data are available for the target speaker. During the objective evaluation, we find that the duration models are not adapted to the target speaker as well as the spectral envelope and pitch models are. To alleviate this problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations reveal that incorporating the two proposed methods into the conventional speaker adaptation method effectively improves duration model adaptation.
https://doi.org/10.13064/KSSS.2012.4.3.111

Implementation of Speaker Verification Security System Using DSP Processor (TMS320C32)

함영준;권혁재;최수영;정익주
- 산업기술연구 / Vol. 21, No. B / pp. 107-116 / 2001
Speech carries various kinds of information when a person communicates with others: linguistic content, speaker identity, emotion, health condition, utterance environment, and so on. The technologies that process speech for use in real life are collectively called speech technology. Among them, the technology that uses the speaker information contained in speech is known as speaker recognition. DTW (Dynamic Time Warping) is a speaker recognition technique that uses dynamic programming to measure the similarity between a reference speech pattern and an input speech signal. In this study, we implement DTW on a TMS320C32 DSP processor and build a security system.
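
The DTW matching described in this abstract can be illustrated with a short dynamic-programming sketch. It is not the paper's DSP implementation: it assumes scalar frames and an absolute-difference local cost for brevity, and the names `dtw_distance` and `verify` and the threshold value are hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences.

    Local cost is |a[i] - b[j]|; allowed steps are match, insertion, deletion.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def verify(template, utterance, threshold):
    """Accept the speaker if the warped distance to the enrolled template is small.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    return dtw_distance(template, utterance) <= threshold


# The second sequence is a time-stretched copy of the first, so the cost is 0.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(verify([1, 2, 3], [1, 2, 2, 3], threshold=0.5))
```

A practical system would compare sequences of feature vectors rather than scalars, replacing the absolute difference with a vector distance.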

A Study for the Improvement of the Plane Speaker by Time-Averaged Holographic Interferometry

이기백;김수광;안경면;이병훈
- 대한기계학회논문집 / Vol. 10, No. 3 / pp. 285-291 / 1986
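The aim of this study is to improve the performance of the plane speaker. The vibration of the plane speaker's diaphragm is analyzed theoretically and examined with time-averaged holographic interferometry. Based on this, we present a method for selecting the support positions, a method for widening the piston vibration mode, and a method for designing a diaphragm with little twisting.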
https://doi.org/10.22634/KSME.1986.10.3.285

A Study on the Improving Diaphragm for Micro Speaker Performance using Taguchi Method

홍도관;우병철;안찬우
- 대한기계학회:학술대회논문집 / 대한기계학회 2004년도 추계학술대회 / pp. 534-538 / 2004
In this study, we improved the performance of a diaphragm for a micro speaker using the Taguchi method in a discrete design space. Design factors of the diaphragm, such as its thickness and shape, affect the performance of the micro speaker. This study therefore determined the shape and thickness of the diaphragm that minimize its 2nd natural frequency using the Taguchi method. We present improved design factors that minimize the 2nd natural frequency of the diaphragm. The 2nd natural frequency of the diaphragm was reduced by up to 37 percent while maintaining the twist mode shape. The ANOVA results indicate that the position of the outer curved shape and the thickness of the diaphragm affect the 2nd natural frequency.

A Study on Formants of Vowels for Speaker Recognition

안병섭;신지영;강선미
- 대한음성학회지:말소리 / No. 51 / pp. 1-16 / 2004
The aim of this paper is to analyze vowels in voice imitation and disguised voice, and to find the invariable phonetic features of the speaker. We examined the formants of the monophthongs /a, u, i, o, $\omega$, $\varepsilon$, $\Lambda$/. The results of the present study are as follows: (1) Speakers change their vocal tract features. (2) The vowels /a, $\varepsilon$, i/ appear to be suitable for speaker recognition since they show invariable acoustic features during voice modulation. (3) F1 does not change easily compared to higher formants. (4) F3-F2 appears to be a useful cue for speaker identification in the vowels /a/ and /$\varepsilon$/, and F4-F2 in the vowel /i/. (5) According to the F-ratio, differences between formants were more useful for speaker recognition than individual formants of a vowel.
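
The F-ratio referred to in this abstract is commonly defined as the variance of the per-speaker means divided by the average within-speaker variance, so a larger value suggests a feature that separates speakers better. The sketch below assumes that definition; the function name `f_ratio` and the formant values are made up for illustration and are not data from the paper.

```python
from statistics import mean, pvariance


def f_ratio(values_by_speaker):
    """F-ratio of one feature: between-speaker variance / mean within-speaker variance.

    values_by_speaker maps a speaker label to that speaker's measurements
    of a single feature (for example F1, or the difference F3 - F2).
    """
    speaker_means = [mean(v) for v in values_by_speaker.values()]
    between = pvariance(speaker_means)
    within = mean([pvariance(v) for v in values_by_speaker.values()])
    return between / within


# Hypothetical measurements in Hz for two speakers.
feature = {
    "speaker_A": [700.0, 720.0],
    "speaker_B": [800.0, 820.0],
}
print(f_ratio(feature))
```

Computing this value once for an individual formant and once for a formant difference gives the kind of comparison the abstract describes.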

A Robust Speaker Identification Using Optimized Confidence and Modified HMM Decoder

;김진영;나승유
- 대한음성학회지:말소리 / No. 64 / pp. 121-135 / 2007
Speech signals are distorted by channel characteristics or additive noise, which severely degrades the performance of speaker and speech recognition. To cope with the noise problem, we propose a modified HMM decoder algorithm using SNR-based observation confidence, an approach that was successfully applied to GMMs in a speaker identification task. The modification weights the observation probabilities with reliability values obtained from the SNR. We also apply PSO (particle swarm optimization) to the confidence function to maximize speaker identification performance. To evaluate the proposed method, we used the ETRI database for speaker recognition. The experimental results showed that the modified HMM decoder algorithm clearly improved performance.

A Study on Performance Improvement of Diaphragm for Micro Speaker using Table of Orthogonal Array

홍도관;우병철;안찬우
- 한국정밀공학회:학술대회논문집 / 한국정밀공학회 2004년도 추계학술대회 논문집 / pp. 162-165 / 2004
In this study, we improved the performance of a diaphragm for a micro speaker using the Taguchi method in a discrete design space. Design factors of the diaphragm, such as its thickness and shape, affect the performance of the micro speaker. This study therefore determined the shape and thickness of the diaphragm that minimize its 2nd natural frequency using the Taguchi method. We present improved design factors that minimize the 2nd natural frequency of the diaphragm. The 2nd natural frequency of the diaphragm was reduced by up to 37 percent while maintaining the twist mode shape. The ANOVA results indicate that the position of the outer curved shape and the thickness of the diaphragm affect the 2nd natural frequency.

Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training

김태진;최재길;권철홍
- 대한음성학회지:말소리 / No. 57 / pp. 165-174 / 2006
In this paper we use a training algorithm, MCE (Minimum Classification Error), to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small set speaker identification task show that the discriminant training method using MCE can reduce identification errors by up to 54% over a baseline system trained using Bayesian adaptation to derive GMM (Gaussian Mixture Models) speaker models from a UBM (Universal Background Model).
