• Title/Summary/Keyword: text-to-speech system

Emergency dispatching based on automatic speech recognition (음성인식 기반 응급상황관제)

  • Lee, Kyuwhan;Chung, Jio;Shin, Daejin;Chung, Minhwa;Kang, Kyunghee;Jang, Yunhee;Jang, Kyungho
    • Phonetics and Speech Sciences / v.8 no.2 / pp.31-39 / 2016
  • In emergency dispatching at the 119 Command & Dispatch Center, inconsistencies between the 'standard emergency aid system' and the 'dispatch protocol,' both of which are mandatory, make the dispatcher's work inefficient. If an emergency dispatch system uses automatic speech recognition (ASR) to process the dispatcher's protocol speech during case registration, it can instantly extract and provide the information required by the 'standard emergency aid system,' making the rescue command more efficient. For this purpose, we have developed a Korean large-vocabulary continuous speech recognition system with a 400,000-word vocabulary for use in the emergency dispatch system. The 400,000 words include vocabulary from news, SNS, blogs, and the emergency rescue domain. The acoustic model is built from 1,300 hours of telephone (8 kHz) speech, and the language model from a 13 GB text corpus. From a transcribed corpus of 6,600 real telephone calls, call logs labeled with the emergency rescue command class and the identified major symptom are extracted in connection with the rescue activity log and the National Emergency Department Information System (NEDIS). ASR is applied to the emergency dispatcher's utterances that repeat the patient information. The emergency patient information is then extracted based on the Levenshtein distance between the ASR result and the template information. Experimental results show a word error rate of 9.15% for speech recognition and an emergency response detection performance of 95.8% for the emergency dispatch system.
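
The template-matching step described in this abstract can be illustrated with a minimal sketch. The abstract does not say whether the distance is computed over characters or words, nor what the templates look like, so the character-level matching, the function names, and the English example strings below are all assumptions made for illustration. The same edit-distance routine, applied to word lists, also gives the usual word error rate.

```python
from __future__ import annotations

from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def closest_template(asr_text: str, templates: list[str]) -> tuple[str, int]:
    """Return the template nearest to the ASR text (character level) and its distance."""
    best = min(templates, key=lambda t: levenshtein(asr_text, t))
    return best, levenshtein(asr_text, best)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical templates; the paper's actual template set is not given in the abstract.
TEMPLATES = ["chest pain", "abdominal pain", "unconscious"]
print(closest_template("chest pian", TEMPLATES))
print(word_error_rate("patient has chest pain", "patient has chess pain"))
```

A real system would presumably also apply a distance threshold so that utterances far from every template are rejected rather than forced onto the nearest one.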

A Study of Speech Control Tags Based on Semantic Information of a Text (텍스트의 의미 정보에 기반을 둔 음성컨트롤 태그에 관한 연구)

  • Chang, Moon-Soo;Chung, Kyeong-Chae;Kang, Sun-Mee
    • Speech Sciences / v.13 no.4 / pp.187-200 / 2006
  • Speech synthesis technology is widely used, and its applications are expanding to automatic response services, learning systems for people with disabilities, and more. However, the sound quality of speech synthesizers has not yet reached a level that satisfies users. Existing synthesizers generate rhythm only from spacing cues such as spaces and commas, or from a few punctuation marks such as question marks and exclamation marks, so it is difficult to produce natural human rhythm even with a large speech database. One remedy is to select rhythm after processing the language using higher-level information. This paper proposes a method for generating rhythm-control tags by analyzing the meaning of a sentence together with information about the speech situation. We use Systemic Functional Grammar (SFG) [4], which analyzes sentence meaning while considering the preceding sentence, the situation of the conversation, the relationships among the participants, and so on. In this study, we generate Semantic Speech Control Tags (SSCT) from the results of SFG meaning analysis and voice waveform analysis.

Traffic Signal Recognition System Based on Color and Time for Visually Impaired

  • P. Kamakshi
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.48-54 / 2023
  • Blind pedestrians find it very difficult to cross roads and must be vigilant with every step they take. To address this problem, convolutional neural networks (CNNs) are well suited to analysing the data and automating the model without human intervention. In this work, a traffic signal recognition system for the visually impaired is designed using a CNN. To provide a safe walking environment, a voice message is given according to the light state and timer state at that instant. The developed model consists of two phases. In the first phase, the CNN model is trained to classify images captured from traffic signals. The labelled Common Objects in Context (COCO) dataset is used, which includes images of classes such as traffic lights, bicycles, and cars. The traffic light object is detected using this labelled dataset with the help of an object detection model. The CNN model then detects the color of the traffic light and the timer displayed in the traffic image. In the second phase, a text message is generated from the detected light color and timer value and sent to the text-to-speech conversion model to produce voice guidance for the blind person. The developed model recognizes the traffic light color and the countdown timer displayed on the signal for safe crossing. Existing models did not consider the countdown timer, which is very useful. The proposed model gave accurate results in different scenarios when compared with other models.
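
The second phase described above, turning a detected color and timer value into a text message for the text-to-speech model, might look roughly like the following sketch. The function name and the message wording are hypothetical; the abstract does not give the actual messages, and the TTS call itself is left out.

```python
from typing import Optional

# Hypothetical wording; the paper's actual messages are not given in the abstract.
ADVICE = {
    "red": "Please wait.",
    "yellow": "Do not start crossing.",
    "green": "You may cross.",
}


def guidance_message(color: str, seconds_left: Optional[int] = None) -> str:
    """Build the text that would be handed to a text-to-speech engine."""
    if color not in ADVICE:
        raise ValueError(f"unknown light color: {color!r}")
    message = f"The signal is {color}."
    if seconds_left is not None:
        # Include the countdown only when the detector produced a timer value.
        message += f" {seconds_left} seconds remaining."
    return f"{message} {ADVICE[color]}"


print(guidance_message("green", 12))
print(guidance_message("red"))
```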

Low-cost implementation of text to speech(TTS) system for car navigation (Car Navigation용 음성합성시스템 최저가 구현)

  • Na Ji Hoon;Sung Jung Mo;Yang Yoon Gi
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.141-144 / 2000
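  • As data services over wireless communication networks have become widely available, supplementary information services such as location information and traffic conditions are now offered for mobile stations (MS). When the mobile station is a vehicle such as a car, having the user read displayed text reduces driving safety and is impractical. This calls for a text-to-speech (TTS) converter that turns text into speech. In this paper, an unlimited-vocabulary Korean speech synthesizer for car navigation is built as a low-cost hardware system using an inexpensive DSP chip (ADSP-2185) and a small 4 Mbit ROM. The real-time Korean speech synthesizer developed in this study can be used as a low-cost communication terminal, although the joins between demisyllables were often imperfect. Intelligibility was nevertheless relatively good for syllables without a final consonant.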

The Design and Implementation of Korean Text-to-Speech Conversion System on a Rule-Based Framework (한국어(韓國語) 규칙(規則) 음성(音聲) 합성(合成) 시스템의 구현(具現))

  • Son, Yung-Taek;Kim, Yong-Kap;Matsumoto, Tatsuro
    • Annual Conference on Human and Language Technology / 1993.10a / pp.141-148 / 1993
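  • This paper introduces the overall processing flow of a formant-based speech synthesis method that converts input text mixing Hangul and Hanja into speech, that is, Korean rule-based speech synthesis (hereafter KTTS, Korean Text To Speech System). In particular, it focuses on the conversion of Hanja and various symbols in the input text into Hangul, the analysis of input sentences and the setting of syntactic boundaries needed to extract grammatical information for speech output, and the conversion to phoneme symbols together with the generation and modification of parameter values. The results of a listening test conducted upon completion of the system are also reported.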

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.89-91 / 2011
  • In this paper, phrase boundaries in a sentence are predicted, and the resulting phrase break information is applied to an HMM-based Korean text-to-speech synthesis system. Synthesis with phrase break information improves the naturalness of the synthetic speech and makes sentences easier to understand. To predict the phrase boundaries, context-dependent information is used: the forward and backward POS (part of speech) of each eojeol, the position of the eojeol in the sentence, the length of the eojeol, and the presence or absence of punctuation marks. The experimental results show that phrase break information increases the naturalness of the synthetic speech.
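
The context features listed in this abstract can be pictured with a small feature-extraction sketch. It is only an illustration: the abstract does not define the features precisely, so "forward/backward POS" is read here as the POS of the eojeol after and before a candidate boundary, and the `Eojeol` class, the POS tag strings, and the feature names are all hypothetical. The classifier that would consume these features is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

PUNCTUATION = ",.?!"


@dataclass
class Eojeol:
    text: str
    pos: str  # hypothetical coarse POS tag for the whole eojeol


def boundary_features(sentence: list[Eojeol], i: int) -> dict:
    """Features for the candidate phrase boundary right after eojeol i."""
    current = sentence[i]
    following = sentence[i + 1] if i + 1 < len(sentence) else None
    stripped = current.text.rstrip(PUNCTUATION)
    return {
        "pos_backward": current.pos,                           # eojeol before the boundary
        "pos_forward": following.pos if following else "EOS",  # eojeol after the boundary
        "position": i / max(len(sentence) - 1, 1),             # 0.0 = first, 1.0 = last
        "length": len(stripped),                               # syllables, punctuation excluded
        "has_punctuation": stripped != current.text,
    }


sentence = [Eojeol("나는", "NP"), Eojeol("학교에,", "NN"), Eojeol("갔다.", "VV")]
print(boundary_features(sentence, 1))
```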

Hand-Gesture Dialing System for Safe Driving (안전성 확보를 위한 손동작 전화 다이얼링 시스템)

  • Jang, Won-Ang;Kim, Jun-Ho;Lee, Do Hoon;Kim, Min-Jung
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.10 / pp.4801-4806 / 2012
  • Although advanced vehicles offer greater convenience, driving-safety problems remain to be solved. Most traffic accidents are caused by careless driving, and operating the interface of a complicated multimedia device is a direct cause. With growing interest in smart automobiles, various approaches to safe driving have been studied. Current in-vehicle multimedia interfaces lack safety because a momentary shift of gaze reduces the driver's perception and ability to operate the vehicle. In this paper, we propose a dialing system for safe driving that uses hand gestures to dial and to search the dictionary. The proposed system improves user convenience and safety in automobile operation by using intuitive gestures and TTS (Text to Speech).

A Study on the Generation of Multi-syllable Nonsense Wordset for the Assessment of Synthetic Speech (합성음성평가를 위한 다음절 무의미단어 생성과 이용에 관한 연구)

  • Jo, Cheol-Woo;Kim, Kyung-Tae;Lee, Yong-Ju
    • The Journal of the Acoustical Society of Korea / v.13 no.5 / pp.51-58 / 1994
  • Many kinds of man-machine interfaces based on speech signals, such as speech recognizers and speech synthesizers, have recently been proposed and put into practical use. Speech synthesis systems in particular are widely used in daily life, but methods for assessing them are still at an early stage. In this paper we propose a method for generating a multi-syllable nonsense wordset for synthetic speech assessment and apply the wordset to one commercial text-to-speech system. Experimental results are presented, and it is verified that the wordset generation method can be used to assess the intelligibility of a synthesizer at the phoneme level or at the level of phonemic environment.

A Development of Administrative Affairs Supporting System using Call Control Mode of CTI (CTI 호출 제어 방식을 이용한 행정 업무 지원 시스템의 개발)

  • 최준기;조성범;정상수;이상정
    • Journal of the Korea Society of Computer and Information / v.4 no.2 / pp.46-60 / 1999
  • Recently, CTI (Computer Telephony Integration) technology has been widely applied in areas such as video conferencing, file transfer, voice mail, automatic message transfer and automatic redial, integrated messaging, and network fax. In this paper, an administrative affairs support system using the call control mode of CTI is designed. The system was developed to relieve the inefficient handling of work caused by the heavy volume of calls from applicants during a college's entrance examination. The system's database is designed using the object modeling technique, and an automatic calling and response system using the CTI call control mode is implemented. In particular, a TTS (Text To Speech) module is developed to provide a voice interface for candidates who ask whether they passed or failed the entrance examination.

A Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis (코퍼스 기반 음성합성기를 위한 합성단위 경계 스펙트럼 평탄화 알고리즘)

  • Kim Sang-Jin;Jang Kyung Ae;Hahn Minsoo
    • MALSORI / no.56 / pp.225-235 / 2005
  • Speech unit concatenation with a large database is presently the most popular method for speech synthesis. In this approach, mismatches at the unit boundaries are unavoidable and are one of the causes of quality degradation. This paper proposes an algorithm to reduce undesired discontinuities between consecutive units. Optimal matching points are calculated in two steps: first, the Kullback-Leibler distance is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented in a corpus-based, unit-concatenating Korean text-to-speech system with an automatically labeled database. Experimental results show that our algorithm performs better than raw concatenation or the overlap smoothing method.
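
The spectral-matching step of the last abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it uses the symmetric form of the Kullback-Leibler distance on magnitude spectra normalized to sum to one, and it simply searches all pairs of candidate frames near the join. The abstract does not specify which form of the distance or which search range the paper uses, and the function names are hypothetical.

```python
import math


def normalize(spectrum, eps=1e-12):
    """Scale a positive magnitude spectrum to sum to 1, flooring bins at eps."""
    total = sum(spectrum)  # assumed > 0 for real speech frames
    return [max(v / total, eps) for v in spectrum]


def symmetric_kl(p_spec, q_spec):
    """KL(p||q) + KL(q||p), which simplifies to sum((p - q) * log(p / q))."""
    p, q = normalize(p_spec), normalize(q_spec)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def best_join(left_frames, right_frames):
    """Return (i, j, distance) for the pair of frames with the closest spectra.

    left_frames: candidate frames near the end of the first unit.
    right_frames: candidate frames near the start of the second unit.
    """
    candidates = (
        (i, j, symmetric_kl(left, right))
        for i, left in enumerate(left_frames)
        for j, right in enumerate(right_frames)
    )
    return min(candidates, key=lambda c: c[2])


# Toy three-bin spectra, invented for illustration only.
left = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]
right = [[3.0, 2.0, 1.0], [2.0, 4.0, 6.0]]
i, j, d = best_join(left, right)
print(i, j, round(d, 6))
```

The waveform-matching step (unit sliding and overlap windowing) would then operate around the chosen frame pair; it is not shown here.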