Search | Korea Science

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
- Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, to the point where it can closely imitate natural human speech. In particular, TTS models that offer varied voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach covers not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS systems obtained prediction MOS (P-MOS) values of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067

Multi speaker speech synthesis system (다화자 음성 합성 시스템)

Lee, Jun-Mo;Chang, Joon-Hyuk
- Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.338-339 / 2018
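This paper proposes a multi-speaker speech synthesis system that uses speaker embeddings. The model is built on Tacotron, a neural-network-based single-speaker speech synthesis system [1]. The proposed model is implemented in a simple way, by feeding the speaker embedding into the model as additional data alongside the input data, and it generates speech successfully without significant performance degradation compared with the single-speaker model.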
https://doi.org/10.3745/PKIPS.y2018m05a.338

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

Kim, Kwang Hyeon;Kwon, Chul Hong
- The Journal of the Convergence on Culture Technology / v.8 no.3 / pp.469-475 / 2022
Training a deep learning-based single-speaker TTS model requires a speech DB of tens of hours and a long training time, which makes this approach inefficient in time and cost for training multi-speaker or personalized TTS models. The voice cloning method uses a speaker encoder model to build the TTS model of a new speaker. Through the trained speaker encoder model, a speaker embedding vector representing the timbre of the new speaker is created from a small amount of that speaker's speech data, which is not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed TTS system consists of a speaker encoder, a synthesizer, and a vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field. The timbre of the new speaker is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. The MOS and timbre similarity listening tests show that the performance of the proposed TTS system is excellent.
https://doi.org/10.17703/JCCT.2022.8.3.469

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

Jo, Min Su;Kwon, Chul Hong
- The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.675-681 / 2021
With the recent growth of the AI speaker market, the demand for speech synthesis technology that enables natural conversation with users is increasing. There is therefore a need for a multi-speaker speech synthesis system that can generate voices of various tones. Synthesizing natural speech requires training with a large-capacity, high-quality speech DB. However, collecting a high-quality, large-capacity speech database uttered by many speakers is very difficult in terms of recording time and cost. It is therefore necessary to train the speech synthesis system using the speech DB of a very large number of speakers with a small amount of training data for each speaker, and a technique for naturally expressing the tone and prosody of multiple speakers is required. In this paper, we propose a technology for constructing a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition, and for synthesizing a new speaker's tone from a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module that synthesizes the mel-spectrogram from input text is Tacotron2, and the vocoder that generates the synthesized speech is WaveNet with a mixture of logistic distributions applied. The x-vector extracted from the trained speaker embedding neural network is added to Tacotron2 as an input to express the desired speaker's tone.
https://doi.org/10.17703/JCCT.2021.7.4.675

Localization of Multiple Speakers Using Microphone Array System (마이크로폰 어레이 시스템을 이용한 다화자 방향검지)

Hung, Vu Viet;Lee, Chang-Hoon
- The Journal of Engineering Research / v.8 no.1 / pp.59-65 / 2006
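This paper describes the development of a technique that uses a microphone array system to estimate the direction of each speaker from the speech of multiple speakers. As preprocessing to improve performance, it includes a step that uses a nonlinear amplifier to minimize the effect of distance and a step that detects voice activity regions to gain robustness to noise. The method builds on an algorithm that inversely estimates the speaker's position from the relationship between the sound source position and the time delay differences of the signals, as determined by the geometry of a uniformly spaced microphone array. It computes a likelihood measure, clusters the high-likelihood estimates to select plausible candidates, and detects the speakers' directions from them. To minimize false detections in this process, a negative inference method is applied to suppress estimates in regions where a speaker is unlikely. Experiments with the speech signals of two speakers as input were used to examine the feasibility of multi-speaker direction detection with the proposed method.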

The Slip-Wall Boundary Conditions Effects and the Entropy Characteristics of the Multi-Species GH Solver (다화학종 GH 방정식의 정확성 향상을 위한 벽면 경계조건 연구 및 GH 방정식의 엔트로피 특성 고찰)

Ahn, Jae-Wan;Kim, Chong-Am
- Journal of the Korean Society for Aeronautical & Space Sciences / v.37 no.10 / pp.947-954 / 2009
Starting from Eu's generalized hydrodynamic (GH) theory, a multi-species GH numerical solver is developed in this research, and its computational behavior is examined for hypersonic rarefied flow over an axisymmetric body. To improve the accuracy of the developed multi-species GH solver, various slip-wall boundary conditions are tested and the computed results are compared. Additionally, to validate the entropy characteristics of the GH equation, its entropy production and entropy generation rates are investigated in a one-dimensional normal shock structure test at a high Knudsen number.
https://doi.org/10.5139/JKSAS.2009.37.10.947

Effects of Cognitive Impairment on Self-reported Hearing Handicap in Older Adults with Early-stage Presbycusis (초기 노인성 난청자에서 인지장애가 일상생활 듣기 어려움에 미치는 영향)

Lee, Soo Jung
- 한국노년학 / v.38 no.1 / pp.1-14 / 2018
Everyday hearing handicap caused by presbycusis ultimately reduces quality of life in older adults. The aim of this study was to explore the effects of cognitive impairment on self-reported hearing handicap in older adults with early-stage presbycusis. We compared K-HHIE scores between 40 elderly subjects with mild cognitive impairment (MCI) and 40 age- and hearing-threshold-matched cognitively normal elderly (CNE) subjects. The results are as follows: 1) The MCI group scored significantly higher than the CNE group on the social/situational and emotional sections, and in total. 2) The MCI group scored significantly higher than the CNE group on all four subscales, and the most significant group difference was on the first subscale, relating to interpersonal relationships and social handicaps. 3) Both groups scored highest on item 8 (problems hearing whispering sounds) and item 15 (problems hearing TV or radio sounds). Besides those two items, the MCI group also scored high on item 21 (problems hearing in a restaurant), item 6 (problems hearing when attending a party), item 3 (avoiding groups of people), and item 20 (personal or social restrictions). Our findings suggest that, among older adults with early-stage presbycusis, those with cognitive impairment tend to report greater everyday hearing handicap than their peers with normal cognitive function. In particular, they show significant problems hearing in background noise or multi-talker situations, which cause social restrictions and social/emotional loneliness.

The Noise Effect on Stuttering and Overall Speech Rate: Multi-talker Babble Noise (다화자잡음이 말더듬의 비율과 말속도에 미치는 영향)

Park, Jin;Chung, In-Kie
- Phonetics and Speech Sciences / v.4 no.2 / pp.121-126 / 2012
This study examines how the frequency of stuttering changes when adults who stutter are exposed to one type of background noise, namely multi-talker babble noise. Eight American English-speaking adults who stutter participated in this study. Each subject read sentences aloud under each of three speaking conditions: typical solo reading (TSR), typical choral reading (TCR), and multi-talker babble noise reading (BNR). Speech fluency was computed as the percentage of syllables stuttered (%SS), and speaking rate was also assessed, as a measure of vocal change, to examine whether it changed significantly under each of the speaking conditions. The study found that participants read more fluently during both BNR and TCR than during TSR. The study also found that participants did not show significant changes in speaking rate across the three speaking conditions. The effect of multi-talker babble noise on the frequency of stuttering is discussed, along with further speculation about it.
https://doi.org/10.13064/KSSS.2012.4.2.121

Lilium longiflorum 'Charm' as a F₁ Hybrid for Pot Plant (종자번식 일대잡종 분화용 나팔나리(Lilium longiflorum) 'Charm' 육성)

Song, Cheon Young
- FLOWER RESEARCH JOURNAL / v.16 no.4 / pp.304-308 / 2008
Lilium longiflorum 'Charm', an F₁ hybrid cultivar, was released by crossing the inbred lines 'L₂-14' and 'L₂-21', which were obtained from 5 self crosses originating from 'Nellie White', 'Ace' and 'Hinomoto'. The growth and flowering characteristics were evaluated in a greenhouse maintained at a minimum of 13°C at night during the winters of 2006 and 2007. The flower of 'Charm' is white and horizontal-facing. The number of flowers per plant is 7.4 and the flower diameter is 16.5 cm, with 24.5 ornamental (flowering) days. The plant height is 60.3 cm, with 70.3 leaves. The stem diameter and internode length are 1.2 cm and 1.1 cm, respectively, meaning the plant is compact and sturdy. The number of seeds per capsule is 251.1. These evaluations therefore suggest that seedling Lilium longiflorum 'Charm' can be used as a pot plant because of its short stems, large number of flowers per plant, long ornamental period, and strong growth habit with many leaves and a thick stem.

Studies of Eri-Silk Culturing in Korea (한국피마잠사개발에 대한 연구)

최병희;김재두;박창준
- Journal of Sericultural and Entomological Science / v.9 / pp.49-66 / 1969
The eri-silkworm is known as a tropical insect and is polyvoltine in tropical regions. It feeds on the leaves of the castor oil plant, which is cultivated in Korea as an annual oilseed crop, although the plant grows for many years in tropical countries. Farmers are therefore free to cultivate it in any year they wish, so feed for eri-silkworm rearing can be procured flexibly, as the farmer wishes. Unlike mulberry cocoon production, rearing can thus respond to fluctuations in eri-cocoon prices or in economic conditions. Fortunately, the eri-silkworm also eats cynthia tree leaves. Given these favorable conditions, the authors have studied this problem since 1963 to develop a method of rearing eri-silkworms in Korea, and they identified the issues involved in production on an industrial scale. Here, the authors summarize the results of their work, covering thirty-three tables. The results obtained are as follows.

Search Result 12, Processing Time 0.027 seconds
