• Title/Abstract/Keywords: conversational-style

16 search results (processing time: 0.02 s)

대화체 연속음성 인식을 위한 언어모델 적응 (Language Model Adaptation for Conversational Speech Recognition)

  • 박영희;정민화
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 May 2003 Conference / pp.83-86 / 2003
  • This paper presents our style-based language model adaptation for Korean conversational speech recognition. Compared with written text corpora, Korean conversational speech exhibits distinctive characteristics of content and style, such as filled pauses, word omission, and contraction. For style-based language model adaptation, we report two approaches. Our approaches focus on improving the estimation of domain-dependent n-gram models by relevance weighting of out-of-domain text data, where style is represented by n-gram based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictors of neighboring words. The best result reduces the word error rate by 6.5% absolute and shows that n-gram based relevance weighting captures style differences well and that disfluencies are good predictors.


한국어 대화체 TTS 개발을 위한 발음 및 운율 추정 (Grapheme-to-Phoneme Conversion and Prosody Modeling for Korean Conversational Style TTS)

  • 이진식;김승원;김병창;이근배
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 Fall 2006 Conference / pp.135-138 / 2006
  • In this paper, we introduce a method for extracting grapheme-to-phoneme conversion rules from the transcription of a speech synthesis database and a prosody modeling method using the light version of ToBI for a Korean conversational-style TTS. We focus on representing the characteristics of the conversational speech style, and the experimental results show that our proposed methods are suitable for developing a Korean conversational-style TTS.


Modality-Based Sentence-Final Intonation Prediction for Korean Conversational-Style Text-to-Speech Systems

  • Oh, Seung-Shin;Kim, Sang-Hun
    • ETRI Journal / Vol. 28, No. 6 / pp.807-810 / 2006
  • This letter presents a prediction model for sentence-final intonations for Korean conversational-style text-to-speech systems in which we introduce the linguistic feature of 'modality' as a new parameter. Based on their function and meaning, we classify tonal forms in speech data into tone types meaningful for speech synthesis and use the result of this classification to build our prediction model using a tree-structured classification algorithm. To show that modality is more effective for the prediction model than features such as sentence type or speech act, an experiment is performed on a test set of 970 utterances with a training set of 3,883 utterances. The results show that modality contributes more to the determination of sentence-final intonation than sentence type or speech act, and that prediction accuracy improves by up to 25% when the modality feature is introduced.


Style-Specific Language Model Adaptation using TF*IDF Similarity for Korean Conversational Speech Recognition

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea / Vol. 23, No. 2E / pp.51-55 / 2004
  • In this paper, we propose a style-specific language model adaptation scheme using n-gram based tf*idf similarity for Korean spontaneous speech recognition. Korean spontaneous speech shows markedly different style-specific characteristics such as filled pauses, word omission, and contraction, which are related to function words and depend on preceding or following words. To reflect these style-specific characteristics and to overcome the lack of data for training the language model, we estimate an in-domain dependent n-gram model by relevance weighting of out-of-domain text data according to their n-gram based tf*idf similarity, where the in-domain language model includes a disfluency model. Recognition results show that n-gram based tf*idf similarity weighting effectively reflects style differences.
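
The relevance weighting described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula, so everything here is an assumption made for illustration: the function names (`ngrams`, `tfidf_vectors`, `cosine`, `relevance_weights`), the smoothed idf, the use of cosine similarity, and the toy sentences. Each out-of-domain document is scored by the similarity of its bigram tf*idf vector to that of the in-domain text, and the score could then serve as a weight on that document's n-gram counts.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Build one sparse n-gram tf*idf vector (a dict) per document."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    total = len(docs)
    # Smoothed idf (an assumption) so n-grams seen in every document
    # still get a nonzero weight.
    idf = {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance_weights(in_domain, out_docs, n=2):
    """Score each out-of-domain document by similarity to the in-domain text."""
    vectors = tfidf_vectors([in_domain] + out_docs, n)
    return [cosine(vectors[0], v) for v in vectors[1:]]


# Toy data (hypothetical): a conversational in-domain sample with filled
# pauses, one conversational and one written-style out-of-domain document.
in_domain = "uh i want to uh book a room".split()
out_docs = [
    "uh i want to go home".split(),
    "the committee approved the annual budget".split(),
]
weights = relevance_weights(in_domain, out_docs)
print(weights)
```

In this toy setup the conversational document shares bigrams with the in-domain sample and receives a positive weight, while the written-style document shares none and is weighted zero.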

음성 에이전트에서의 쇼핑 경험에 대한 사용자 경험 연구: 화면 유무와 제품관여도, 대화방식의 차이를 중심으로 (A Study on the UX of Shopping Experience in Conversational Agents: Focus on the Difference between the Presence of a Screen, Product Involvement, and Conversation Style)

  • 이화영;김동환
    • 한국멀티미디어학회논문지 (Journal of Korea Multimedia Society) / Vol. 25, No. 8 / pp.1156-1166 / 2022
  • In this study, we examined voice shopping interaction in which consumers can be involved in the decision-making process. Sixteen kinds of voice shopping interaction were designed, differing in the presence of a screen, product involvement, and conversation style. Their effects on trust, cognitive load, satisfaction, and continuous intention to use were evaluated through a survey experiment. The main effect of conversation style was significant, and more deeply involved users were found to have higher trust. The interaction effect between conversation style and product involvement was also significant. Low-involvement product buyers had the most positive user experience with the conversation style that included 'Ask for preference,' while high-involvement product buyers had the most positive user experience with the conversation style that included both 'Ask for preference' and 'Question and Answer.' The main effect and interaction effect of the presence of a screen were not significant. The results indicate that a positive user experience can be obtained when users are deeply involved in consumer decision-making, especially when purchasing high-involvement products.

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 13, No. 3 / pp.65-70 / 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style sentences and 15,928 conversational style sentences), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of our script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of whole sentences is 9.03. The results of the analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.
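
The abstract names its coverage criteria without defining them, so the following Python sketch rests on assumed definitions: type coverage is taken as the share of distinct words in a reference vocabulary that occur in the script, and diphone coverage as the share of a given diphone inventory observed as adjacent phone pairs. The function names, the reference vocabulary, and the toy phone sequences are all hypothetical.

```python
def type_coverage(script_tokens, reference_vocab):
    """Share of reference word types that appear in the script (assumed definition)."""
    reference = set(reference_vocab)
    if not reference:
        return 0.0
    return len(set(script_tokens) & reference) / len(reference)


def diphone_coverage(phone_sequences, inventory):
    """Share of inventory diphones seen as adjacent phone pairs (assumed definition)."""
    target = set(inventory)
    if not target:
        return 0.0
    seen = set()
    for phones in phone_sequences:
        # Adjacent pairs within one utterance form its diphones.
        seen.update(zip(phones, phones[1:]))
    return len(seen & target) / len(target)


# Toy data (hypothetical).
script = "the cat sat on the mat".split()
reference_vocab = ["the", "cat", "dog", "mat", "bird"]
tc = type_coverage(script, reference_vocab)

phone_sequences = [["k", "ae", "t"], ["m", "ae", "t"]]
inventory = {("k", "ae"), ("ae", "t"), ("m", "ae"), ("t", "ae")}
dc = diphone_coverage(phone_sequences, inventory)
print(tc, dc)
```

Triphone coverage would follow the same pattern with windows of three adjacent phones instead of two.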

Intelligibility Improvement Benefit of Clear Speech and Korean Stops

  • Kang, Kyoung-Ho
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 2, No. 1 / pp.3-11 / 2010
  • The present study confirmed the intelligibility improvement benefit of clear speech by investigating the intelligibility of Korean stops produced in different speaking styles: conversational, citation-form, and clear speech. This finding supports the Hypo- & Hyper-speech theory that speakers adjust vocal effort to accommodate hearers' speech perception difficulty. A progressive intelligibility improvement was found across the three speaking styles investigated: clear speech was more intelligible than citation-form speech, citation-form speech was more intelligible than conversational speech, and clear speech was also more intelligible than conversational speech. These findings suggest that the manipulations to elicit three distinct speaking styles in a laboratory setting were successful. Korean lenis stops showed the least intelligibility improvement among the three Korean stop types, which suggests that lenis stops are more resistant to intelligibility enhancement efforts in clear speech than aspirated and fortis stops.


한국어 모바일 대화형 에이전트 시스템 (A Korean Mobile Conversational Agent System)

  • 홍금원;이연수;김민정;이승욱;이주영;임해창
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 13, No. 6 / pp.263-271 / 2008
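  • This paper discusses a conversational agent system for mobile environments that uses Korean language processing technology. The goal of building a conversational agent system is to provide a natural-language interface between human users and the system agent, enabling more convenient interaction. Building a conversational agent for mobile environments requires a variety of language processing and language understanding components specialized for colloquial utterances. The system consists of error handling for input sentences, morphological analysis and part-of-speech tagging, modality analysis, argument recognition and semantic frame generation, and similar-utterance retrieval and response generation. To generate an appropriate response to a given user utterance, the system performs example-based response retrieval using lexical, syntactic, and semantic similarity between the user utterance and example utterances.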


온라인 대화 행위에서 XML 기반 메시지를 이용한 미디어 지원 (Supporting Media using XML-based Messages on Online Conversational Activity)

  • 김경덕
    • 정보처리학회논문지B (The KIPS Transactions: Part B) / Vol. 11B, No. 1 / pp.91-98 / 2004
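  • This paper proposes a method that uses XML (eXtensible Markup Language) to support various media in online conversational activity. The proposed method converts media information in online conversational activity into XML-based messages and processes them in much the same way as existing text-based messages. The XML-based messages and media are stored on the server; the XML-based messages are merged into a single XML document, to which an XSLT document is then applied to generate an HTML document. Each client's participant plays and presents media through hyperlinks in the HTML document. The proposed method efficiently supports the use of various media such as text, images, audio, and video in online conversational activity, and it also makes it efficient to maintain the font size, color, style, and similar properties of text-based messages as XML tags are extended or modified. As an application example, we implemented a client-server system to support media in online conversational activity. Each participant enters text and media-based messages in a web browser using Java applets and servlets, and the conversation messages are refreshed automatically whenever a participant enters a message. Participants play and present media by clicking hyperlinks in the conversation messages shown in the user interface. Application areas of the proposed method include distance education, games, and collaboration.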

Over the Rainbow: How to Fly over with ChatGPT in Tourism

  • Taekyung Kim
    • Journal of Smart Tourism / Vol. 3, No. 1 / pp.41-47 / 2023
  • Tourism and hospitality have encountered significant changes in recent years as a result of the rapid development of information technology (IT). Customers now expect more expedient services and customized travel experiences, which has intensified competition among service providers. To meet these demands, businesses have adopted sophisticated IT applications such as ChatGPT, which enables real-time interaction with consumers and provides recommendations based on their preferences. This paper focuses on the AI support-prompt middleware system, which functions as a mediator between generative AI and human users, and discusses two operational rules associated with it. The first rule is the Information Processing Rule, which requires the middleware system to determine appropriate responses based on the context of the conversation using techniques for natural language processing. The second rule is the Information Presentation Rule, which requires the middleware system to choose an appropriate language style and conversational attitude based on the gravity of the topic or the conversational context. These rules are essential for guaranteeing that the middleware system can fathom user intent and respond appropriately in various conversational contexts. This study contributes to the planning and analysis of service design by deriving design rules for middleware systems to incorporate artificial intelligence into tourism services. By comprehending the operation of AI support-prompt middleware systems, service providers can design more effective and efficient AI-driven tourism services, thereby improving the customer experience and obtaining a market advantage.