• Title/Summary/Keyword: text-to-speech (TTS)

APPLICATION OF KOREAN TEXT-TO-SPEECH FOR X.400 MHS SYSTEM

  • Kim, Hee-Dong;Koo, Jun-Mo;Choi, Ho-Joon;Kim, Sang-Taek
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.885-892 / 1994
  • This paper presents a Korean text-to-speech (TTS) algorithm with speed and intonation control, and describes the development of a voice message delivery system that employs it. The system allows Interpersonal Messaging (IPM) Service users of a Message Handling System (MHS) to send their text messages to a recipient over a telephone line as synthetic speech. The X.400 MHS recommendations do not specify protocols or service elements for voice message delivery. We therefore defined an access protocol and service elements for a Voice Access Unit, based on the application program interface for message transfers between the X.400 Message Transfer Agent and the Voice Access Unit. The system architecture and operation are also described.

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.91-98 / 2010
  • Large corpus-based concatenative text-to-speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improving the naturalness, personality, speaking style, and emotion of synthetic speech requires a larger speech DB, redundant speech segments in a large segment DB need to be pruned. In this paper, we propose a new method, based on a clustering algorithm, for constructing a downsized segmental speech DB for a Korean TTS system. For the performance test, synthetic speech was generated with a Korean TTS system consisting of a language processing module, prosody processing module, segment selection module, speech concatenation module, and segmental speech DB. A MOS test was conducted on sets of synthetic speech generated with four different segmental speech DBs, constructed by combining the CM1 (or CM2) tree clustering method with the full DB (or reduced DB). Experimental results show that the proposed method reduces the size of the speech DB by 23% and achieves a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS system.

Text Transliteration System and Number Transliteration Disambiguation for TTS (음성합성을 위한 텍스트 음역 시스템과 숫자 음역 모호성 처리)

  • Park, Jeong Yeon;Shin, Hyeong Jin;Yuk, Dae Bum;Lee, Jae Sung
    • Annual Conference on Human and Language Technology / 2018.10a / pp.449-452 / 2018
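  • TTS (Text-to-Speech) is a speech synthesis technology that takes a character string as input and converts it into speech. In practice, however, input sentences contain not only Hangul but also English words and numbers. English words may be read differently depending on letter case, and when used as units they are abbreviations and must not be read letter by letter. Numbers are likewise read differently depending on the words they appear with. In this paper, we build a system that classifies and transliterates sentences mixing Hangul, numbers, units, and English words, and introduce a method for disambiguating numbers and units using word vectors.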

Development of Text-to-Speech System for PC (PC용 Text-to-Speech 시스템 개발)

  • Choi Muyeol;Hwang Cholgyu;Kim Soontae;Kim Junggon;Yi Sopae;Jang Seokbok;Pyo Kyungnan;Ahn Hyesun;Kim Hyung Soon
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.41-44 / 1999
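  • In this paper, we developed a high-quality Korean text-to-speech (TTS) synthesis system for PC applications. For the synthesis method, we adopted a sinusoidal model-based approach, which has advantages over the conventional PSOLA method in pitch control, handling of the connection between adjacent sounds, and timbre control. For natural prosody modeling, we used the Classification and Regression Tree (CART) method, a statistical technique. To reduce discontinuities at phoneme boundaries, we used initial consonant-vowel units and final consonant units as synthesis units, and we provided a timbre control function so that a variety of voice colors can be expressed. In addition, by implementing the system as a TTS engine compliant with the standard Speech Application Program Interface (SAPI), we made application development on the PC more convenient. A listening evaluation of the synthesized speech confirmed its high sound quality and the effectiveness of the timbre control function.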

Implementation of Information Access Embedded System for the Blind People (시각 장애인을 위한 정보접근 임베디드 시스템의 구현)

  • Kim, Si-Woo;Lee, Jae-Kyun;Lee, Chae-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.2C / pp.167-172 / 2008
  • Since a 2-dimensional (2D) bar code can retrieve data and information quickly, it is widely used and recognized as a useful tool for many industrial applications. However, the information capacity of the 2D bar code is still limited. Recently the analog-digital code (AD code), which has the largest storage capacity yet contained in a code, has been developed, expanding the bar code's application range by overcoming this capacity limit. In this paper, we present the AD code and implement an effective embedded system that converts text information into voice using the 2D AD code and Text To Speech (TTS). By capturing the AD code printed on paper or in books, this voice information can be delivered to blind people as well as the elderly.

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.290-297 / 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient text-to-speech (TTS) database compression. The proposed algorithm combines removal of the unnecessary common bit-stream (CBS) and parameter delta coding with speaker-dependent Huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes that maximize bit-rate reduction with negligible quality degradation. Compared with the 12.65 kbps AMR-WB mode, the proposed lossless algorithm, including CBS removal, reduces the bit-rate by 12.40% without quality degradation, and the proposed lossy algorithm reduces it by 20.00% with a PESQ degradation of 0.12.
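
The abstract above names parameter delta coding followed by Huffman coding. The sketch below only illustrates that general idea and is not the authors' algorithm or the AMR-WB bit-stream format: the function names and the sample values are hypothetical, and the Huffman table here is built from plain symbol counts rather than from speaker-dependent statistics.

```python
import heapq
from collections import Counter


def delta_encode(values):
    """Return the first value followed by successive differences."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical, slowly varying parameter track: the deltas cluster near zero,
# so a Huffman code can give the frequent small deltas short codewords.
params = [100, 102, 103, 103, 104, 106]
deltas = delta_encode(params)
lengths = huffman_code_lengths(deltas[1:])
```

Because `delta_decode(delta_encode(x)) == x`, this stage is lossless; any bit-rate saving comes from the entropy coder exploiting the skewed distribution of the deltas.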

Syllable-Level Smoothing of Model Parameters for HMM-Based Mixed-Lingual Text-to-Speech (HMM 기반 혼용 언어 음성합성을 위한 모델 파라메터의 음절 경계에서의 평활화 기법)

  • Yang, Jong-Yeol;Kim, Hong-Kook
    • Phonetics and Speech Sciences / v.2 no.1 / pp.87-95 / 2010
  • In this paper, we address issues associated with mixed-lingual text-to-speech based on context-dependent HMMs, where a separate set of HMMs corresponds to each individual language. In particular, we propose techniques for smoothing synthesis parameters at the boundaries between different languages to obtain more natural-sounding speech. Specifically, mel-frequency cepstral coefficients (MFCCs) at the language boundaries are smoothed by applying several linear and nonlinear approximation techniques. An informal listening test shows that synthesized speech smoothed by a modified version of linear least square approximation (MLLSA) or by a quadratic interpolation (QI) method is preferred over speech synthesized without any smoothing technique.

Development of Speech Recognition and Synthetic Application for the Hearing Impairment (청각장애인을 위한 음성 인식 및 합성 애플리케이션 개발)

  • Lee, Won-Ju;Kim, Woo-Lin;Ham, Hye-Won;Yun, Sang-Un
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.129-130 / 2020
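  • This paper presents the implementation of an Android application system for communication by hearing-impaired people. Using the STT (Speech to Text) API of Google Cloud Platform, the content of a conversation is output as text through speech recognition. Text is in turn output as speech through speech synthesis using TTS (Text to Speech). In addition, an accelerometer sensor is used in a foreground service so that the application can be launched by shaking the smartphone two or three times, which improves the usability of the application.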

Implementation of Korean TTS Service on Android OS (안드로이드 OS 기반 한국어 TTS 서비스의 설계 및 구현)

  • Kim, Tae-Guon;Kim, Bong-Wan;Choi, Dae-Lim;Lee, Yong-Ju
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.9-16 / 2012
  • Although Android-based smartphones are being released in Korea, no Korean TTS engine is built into them, and Google has not officially announced a service or software development kit for Korean TTS. Application developers who want to include Korean TTS capability in their applications therefore face difficulties. In this paper, we design and implement an Android OS-based Korean TTS system and service. For speed, the text preprocessing and synthesis libraries are implemented using the Android NDK. The response time of TTS is minimized by using Java's thread mechanism and the AudioTrack class. To test the implemented service, an application that reads incoming SMS messages was developed. The test shows that synthesized speech is generated in real time for random sentences. With the implemented Korean TTS service, Android application developers can easily convey information by voice. The Korean TTS service proposed and implemented in this paper overcomes the shortcomings of existing restrictive synthesis methods and benefits both application developers and users.

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

  • Miran Kim
    • Phonetics and Speech Sciences / v.15 no.2 / pp.13-20 / 2023
  • This study explores the potential of automated speech recognition (ASR) in assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech, both native-like English and Korean-accented English, and speech recordings from a native English speaker. Through this analysis, we establish baseline values for the word error rate (WER). These were then compared with those obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for Whisper, an ASR tool, and human raters, and identified distinct human rater strategies in different assessment aspects, such as proficiency, intelligibility, accuracy, and comprehensibility, that were not observed in ASR. Despite such challenges as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advancements in ASR technology, its potential as not only an assessment aid but also a self-directed learning tool for pronunciation feedback merits further exploration.
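
The word error rate (WER) used as the baseline measure in the abstract above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the example sentences are hypothetical, and the study's actual pipeline (transcription with Whisper, any text normalization) is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion of a reference word
                cur[j - 1] + 1,      # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of three reference words.
wer = word_error_rate("the cat sat", "the bat sat")
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio rather than a bounded percentage.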