Search | Korea Science

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length (기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환)

Yoo, Hyogeun;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
- Phonetics and Speech Sciences / v.9 no.1 / pp.41-47 / 2017
The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system with a voice transformation system, and such systems are widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be modified independently to convert voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, a hidden Markov model (HMM)-based speech synthesis system (HTS) using the STRAIGHT vocoder is constructed, and voice transformation is conducted by modifying the fundamental frequency and the spectral envelope. The fundamental frequency is transformed by scaling, and the spectral envelope is transformed by frequency warping to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method that uses the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for the naturalness of the synthetic speech. Experimental results showed that the proposed method achieved higher preference than the baseline systems while maintaining the naturalness of the speech.
https://doi.org/10.13064/KSSS.2017.9.1.041
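
The two transformations named in the abstract above, F0 scaling and frequency warping of the spectral envelope, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of simple linear warping, and the convention that 0.0 marks an unvoiced frame are all assumptions made for illustration. The abstract does not say how the two factors are coupled through the F0/vocal tract length correlation, so they are left as independent parameters here.

```python
def scale_f0(f0_contour, factor):
    """Scale voiced F0 values (Hz) by `factor`.

    Assumption: 0.0 marks an unvoiced frame and is left unchanged.
    """
    return [f * factor if f > 0 else 0.0 for f in f0_contour]


def warp_envelope(envelope, alpha):
    """Linearly warp a spectral envelope along the frequency axis.

    Output bin k reads the input at position k / alpha (linear
    interpolation), so alpha > 1 moves spectral peaks upward, which
    corresponds to a shorter vocal tract. This is a hypothetical
    warping function, not the one used in the paper.
    """
    n = len(envelope)
    warped = []
    for k in range(n):
        src = k / alpha
        if src >= n - 1:
            # Past the last bin: hold the final value.
            warped.append(envelope[-1])
        else:
            lo = int(src)
            frac = src - lo
            warped.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped
```

With `alpha = 2.0`, a peak at bin 1 of a five-bin envelope moves to bin 2; with `alpha = 1.0` the envelope is returned unchanged.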

Development of a 3-D Visualization Application for Management of Substation Equipment

Park, Chang-Hyun
- Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.23 no.3 / pp.38-44 / 2009
This paper presents a new Windows application based on 3-D graphics and text-to-speech (TTS) for effective management of substation equipment. When problems occur in a power system, inexperienced operators may have difficulty understanding the situation and quickly finding suitable countermeasures. This paper describes an effective scheme for visualizing power system equipment under normal and abnormal conditions using 3-D graphics and animations. In addition, the state variations and the maintenance priority order of substation equipment are conveyed by TTS and intuitive methods. The proposed system can help power system operators understand the state of power system equipment more quickly, and it can provide operators with suitable countermeasures for minimizing damage caused by equipment problems.
https://doi.org/10.5207/JIEIE.2009.23.3.038

A Unit Selection Methods using Variable Break in a Japanese TTS (일본어 TTS의 가변 Break를 이용한 합성단위 선택 방법)

Na, Deok-Su;Bae, Myung-Jin
- Proceedings of the IEEK Conference / 2008.06a / pp.983-984 / 2008
This paper proposes a variable break that can offset prediction errors, together with a pre-selection method based on the variable break, for enhanced unit selection. In Japanese, a sentence consists of several accentual phrases (APs) and major phrases (MPs), and the breaks between these phrases must be predicted to realize text-to-speech systems. An MP itself consists of several APs and plays a decisive role in making synthetic speech natural and understandable, because short pauses appear at its boundaries. The variable break is defined as a break that can change easily from an AP boundary to an MP boundary, or from an MP boundary to an AP boundary. The variable break is modeled stochastically using CART (Classification and Regression Trees), and candidate units are then pre-selected in the unit-selection process. The experimental results show that this approach can compensate for break prediction errors and improve the naturalness of synthetic speech.

Arabic-Numerals to Korean Transliteration Disambiguation using BERT (BERT를 이용한 숫자-한국어 음역 모호성 해소)

Park, Jeong Yeon;Yuk, Dae Bum;Lee, Jae Sung
- Annual Conference on Human and Language Technology / 2020.10a / pp.42-44 / 2020
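A text-to-speech (TTS) system needs to convert strings written in characters other than Hangul into Hangul. Such strings include numerals and special characters. Numerals in particular are problematic because their pronunciation depends on the context in which they are used. This paper introduces a deep-learning method that resolves the ambiguity of numeral transliteration, where pronunciation varies with context, in place of existing rule-based methods that can exploit only limited context information.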
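
The ambiguity described above can be made concrete with a small rule-based sketch of the kind of baseline the paper aims to improve on. This is not the paper's BERT model. The tables, the counter set, and the function name `read_numeral` are hypothetical and cover only a handful of cases: Korean reads the same digit with Sino-Korean or native Korean numerals depending on the word that follows.

```python
# Sino-Korean and native Korean (prenominal) readings for 1-5.
SINO = {1: "일", 2: "이", 3: "삼", 4: "사", 5: "오"}
NATIVE = {1: "한", 2: "두", 3: "세", 4: "네", 5: "다섯"}

# Illustrative subset of counters that take native Korean numerals:
# 시 (o'clock), 개 (items), 명 (people), 살 (years of age).
NATIVE_COUNTERS = {"시", "개", "명", "살"}


def read_numeral(digit, counter):
    """Choose a reading for `digit` from the counter that follows it.

    A one-word lookup like this is exactly the limited context that a
    rule-based system relies on; it fails when the counter is missing
    or itself ambiguous.
    """
    table = NATIVE if counter in NATIVE_COUNTERS else SINO
    return f"{table[digit]} {counter}"
```

For example, `read_numeral(3, "시")` gives "세 시" (three o'clock) while `read_numeral(3, "분")` gives "삼 분" (three minutes), even though the written digit is identical.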

The Interactive Voice Services based on VoiceXML (VoiceXML 기반 음성인식시스템을 이용한 서비스 개발)

Kim Hak-Gyoon;Kim Eun-Hyang;Kim Jae-In;Koo Myoung-Wan
- MALSORI / no.43 / pp.113-125 / 2002
To meet the need to search Web information via wired or wireless telephones, the VoiceXML Forum was established to develop and promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML simplifies the creation of personalized interactive voice response services on the Web, and allows voice and phone access to information on Web sites and in call center databases. It can also make use of Web-based technologies such as CGI (Common Gateway Interface) scripts. In this paper, we developed a VoiceXML-based voice portal service platform called TeleGateway. It enables integration of voice services with data services using Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines. We also present various services built on the voice portal.

An emotional speech synthesis markup language processor for multi-speaker and emotional text-to-speech applications (다음색 감정 음성합성 응용을 위한 감정 SSML 처리기)

Ryu, Se-Hui;Cho, Hee;Lee, Ju-Hyun;Hong, Ki-Hyung
- The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.523-529 / 2021
In this paper, we designed and developed an Emotional Speech Synthesis Markup Language (SSML) processor. Multi-speaker emotional speech synthesis technology that can express multiple voice colors and emotions has been developed, and we designed Emotional SSML by extending SSML to cover multiple voice colors and emotional expressions. The Emotional SSML processor has a graphical user interface and consists of the following four components. First, a multi-speaker emotional text editor that makes it easy to mark specific voice colors and emotions at desired positions. Second, an Emotional SSML document generator that automatically creates an Emotional SSML document from the output of the multi-speaker emotional text editor. Third, an Emotional SSML parser that parses the Emotional SSML document. Last, a sequencer that controls a multi-speaker emotional text-to-speech (TTS) engine based on the output of the Emotional SSML parser. Because it is based on SSML, an open standard that is independent of programming language and platform, the Emotional SSML processor can be integrated easily with various speech synthesis engines and facilitates the development of multi-speaker emotional text-to-speech applications.
https://doi.org/10.7776/ASK.2021.40.5.523
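
The parser and sequencer stages listed above can be sketched as follows. The abstract does not give the Emotional SSML syntax, so the `<emotion type="...">` element used here is a hypothetical extension invented for illustration; only `<speak>` and `<voice name="...">` come from standard SSML, and the XML namespace is omitted for brevity. The function name `parse_segments` is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def parse_segments(ssml_text):
    """Turn a simplified Emotional SSML string into an ordered list of
    (voice, emotion, text) tuples that a sequencer could feed to a TTS
    engine one at a time.

    Assumption: <emotion type="..."> is a hypothetical tag; text after
    a closing </emotion> tag (the element tail) is ignored in this sketch.
    """
    root = ET.fromstring(ssml_text)
    segments = []
    for voice in root.iter("voice"):
        name = voice.get("name", "default")
        # Text directly inside <voice>, before any child element.
        if voice.text and voice.text.strip():
            segments.append((name, "neutral", voice.text.strip()))
        for child in voice:
            if child.tag == "emotion":
                text = (child.text or "").strip()
                segments.append((name, child.get("type", "neutral"), text))
    return segments
```

A document with two `<voice>` elements yields the segments in document order, which is the order a sequencer would need to preserve when calling the engine.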

Descriptive Video Service using Text to Speech (TTS를 이용한 화면해설 방송 제작 방법)

Lim, Wootaek;Yang, Seung-Jun;Ahn, ChungHyun
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2013.06a / pp.282-283 / 2013
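This paper proposes a TTS-based method for producing descriptive video service broadcasts that complements existing production methods. First, to find the intervals where a description can be inserted, silent intervals are detected using energy and spectral centroid values, and the description is then inserted into the detected intervals using TTS. The proposed method can reduce the human and time effort required by conventional descriptive video production and, by increasing the amount of described content, can improve broadcast accessibility for visually impaired people.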
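
A minimal sketch of silence detection from the two features named above, short-time energy and spectral centroid, is given here. The abstract does not state how the features are combined or thresholded, so the rule used (a frame counts as non-silent only when both features exceed their thresholds), the non-overlapping framing, and all names and parameters are assumptions. A plain DFT is used to keep the example dependency-free.

```python
import cmath
import math


def frame_features(frame, sample_rate):
    """Return (mean energy, spectral centroid in Hz) for one frame."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    mags, freqs = [], []
    for k in range(n // 2 + 1):
        # Direct DFT for bin k; fine for short frames in a sketch.
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
        freqs.append(k * sample_rate / n)
    total = sum(mags)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0
    return energy, centroid


def silent_frames(signal, sample_rate, frame_len, energy_thr, centroid_thr):
    """Flag each non-overlapping frame as silent (True) or not (False).

    Assumed rule: a frame is non-silent only if both its energy and
    its spectral centroid exceed the given thresholds.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        e, c = frame_features(signal[start:start + frame_len], sample_rate)
        flags.append(not (e > energy_thr and c > centroid_thr))
    return flags
```

Runs of consecutive `True` flags long enough to hold a synthesized description would be the candidate insertion intervals; the minimum run length is another parameter the abstract leaves open.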

Text-to-Speech System Using Logatom (Logatom을 사용한 문서음성변환 시스템)

Cho Kwansun;Lee Chulhee
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.7-10 / 1999
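This paper proposes the implementation of a logatom-based, unlimited-vocabulary Korean TTS system. To this end, a text corpus representative of Korean is selected and analyzed, and the logatoms needed for synthesis are designed on that basis. In general, concatenation-based TTS systems that extract speech segments from a speech corpus take those segments from meaningful words or eojeols (Korean word phrases). However, what matters when extracting speech segments is the pattern of phoneme combinations underlying the synthesis unit, so this paper designs logatoms, which are meaningless phoneme sequences, for segment extraction. The logatoms are designed from the position of each segment within the eojeol and from the phoneme combination patterns obtained by analyzing the text corpus. To evaluate the synthesis quality of the proposed system, arbitrary sentences were synthesized using CVC-based logatoms. Most segment concatenations occurred at consonants, and because the logatom design took position within the eojeol into account, relatively natural synthetic speech was obtained within eojeols.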

The development an E-Book and News web using TTS (TTS를 이용한 E-Book 및 News 웹 개발)

Jang, Eun-Gyeom;Kim, Ye-Eun;Seo, Dong-Jun
- Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.283-284 / 2022
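This paper uses TTS to let users both read and listen to e-books and news. Using TTS voices recorded directly by users and developers, it offers features such as choice of voice and playback speed. Existing TTS-based e-book sites suffer from poor readability because of heavy advertising and are paid services. In contrast, the website proposed in this paper simplifies its menus so that users of all ages can use it easily, and provides a range of e-book and news functions so that electronic documents can be read more intuitively and easily.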

Currency Recognition System for Blind People (시각장애인을 위한 화폐 인식 시스템)

Dong-Jun Yoo;Sung-Jun Kim;Jun-Yeong Lee;Hyeon-Su Kang;Jun-Ho Son;Se-Jin Oh
- Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.257-258 / 2024
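At present, visually impaired people have no way to check the denomination of a banknote when using cash, so they are often inconvenienced or exposed to the risk of financial fraud. The Bank of Korea issues banknotes with Braille markings to prevent such incidents, but 91% of visually impaired people cannot identify them and experience considerable inconvenience. In this paper, we developed a system that recognizes currency using deep learning and announces the value of the banknote by voice using TTS technology. For banknote recognition, we collected the data ourselves and used a weights file trained with the YOLOv5 algorithm. With this system, visually impaired people can use cash more safely and avoid financial problems.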

