• Title/Summary/Keyword: Text-to-Speech

Multimodal Approach for Summarizing and Indexing News Video

  • Kim, Jae-Gon;Chang, Hyun-Sung;Kim, Young-Tae;Kang, Kyeong-Ok;Kim, Mun-Churl;Kim, Jin-Woong;Kim, Hyung-Myung
    • ETRI Journal / v.24 no.1 / pp.1-11 / 2002
  • A video summary abstracts the gist from an entire video and also enables efficient access to the desired content. In this paper, we propose a novel method for summarizing news video based on multimodal analysis of the content. The proposed method exploits the closed caption data to locate semantically meaningful highlights in a news video and speech signals in an audio stream to align the closed caption data with the video in a time-line. Then, the detected highlights are described using MPEG-7 Summarization Description Scheme, which allows efficient browsing of the content through such functionalities as multi-level abstracts and navigation guidance. Multimodal search and retrieval are also within the proposed framework. By indexing synchronized closed caption data, the video clips are searchable by inputting a text query. Intensive experiments with prototypical systems are presented to demonstrate the validity and reliability of the proposed method in real applications.
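
The text-query retrieval described in this abstract can be pictured as an inverted index over time-aligned captions. The following is only a minimal sketch under stated assumptions, not the authors' system: the caption tuples, the function names `build_caption_index` and `search`, and the simple word tokenizer are all hypothetical, and the captions are assumed to be already synchronized with the video timeline.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_caption_index(captions):
    """Map each lowercase word to the sorted start times (seconds)
    of the captions that contain it."""
    index = defaultdict(set)
    for start_sec, text in captions:
        for word in WORD.findall(text.lower()):
            index[word].add(start_sec)
    return {word: sorted(times) for word, times in index.items()}

def search(index, query):
    """Return start times of captions that contain every query word."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Hypothetical captions, assumed to be aligned to the video already.
captions = [
    (12.0, "The president visited the flood area"),
    (47.5, "Stock markets closed higher today"),
    (93.2, "Flood warnings remain in the south"),
]
index = build_caption_index(captions)
print(search(index, "flood"))
```

Each returned time is a position a player could seek to; a real system would map these times to the highlight segments of the summary rather than to raw caption starts.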

Development of Half-Mirror Interface System and Its Application for Ubiquitous Environment (유비쿼터스 환경을 위한 하프미러형 인터페이스 시스템 개발과 응용)

  • Kwon Young-Joon;Kim Dae-Jin;Lee Sang-Wan;Bien Zeungnam
    • Journal of Institute of Control, Robotics and Systems / v.11 no.12 / pp.1020-1026 / 2005
  • In the era of ubiquitous computing, human-friendly man-machine interfaces are receiving more attention because they can offer convenient services. In this paper, we introduce the 'Half-Mirror Interface System (HMIS)' as a novel type of human-friendly man-machine interface. HMIS consists of a half-mirror, a USB webcam, a microphone, a two-channel speaker, and a high-speed processing unit. It has two principal operation modes, selected according to whether a user is present in front of it. The first, 'mirror-mode', is activated when the user's face is detected via the USB webcam. In this mode, HMIS provides three basic functions: 1) make-up assistance, which magnifies a facial component of interest and gives a TTS (Text-To-Speech) guide for appropriate make-up; 2) daily weather information via a WWW service; and 3) a health monitoring/diagnosis service using Chinese medicine knowledge. The second, 'display-mode', shows decorative pictures, family photos, art paintings, and so on, and is activated when the user's face has not been detected for some time. In display-mode, we also added a 'healing-window' function and a 'healing-music player' function for the user's psychological comfort and relaxation. All these functions are accessible through a commercially available voice synthesis/recognition package.

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

  • Yeom, Ha-Neul;Hwang, Myunggwon;Hwang, Mi-Nyeong;Jung, Hanmin
    • Journal of Information Science Theory and Practice / v.2 no.3 / pp.29-39 / 2014
  • In recent years, several studies have proposed using the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on optimal machine-learning classifier and feature set selection to classify collected tweets. We build the classifier model using Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimum feature set, we construct a basic feature set as a standard for performance comparison, so that further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting Substantive, Predicate, Modifier, and Interjection parts of speech.
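
For readers unfamiliar with the classifier named in this abstract, the sketch below shows a generic multinomial Naive Bayes with Laplace smoothing in plain Python. It is not the authors' implementation: the toy English tokens, the labels `report` and `question`, and the function names are hypothetical, and the tokens are assumed to have been filtered to the chosen parts of speech beforehand.

```python
import math
from collections import Counter, defaultdict

def train(docs, alpha=1.0):
    """docs is a list of (tokens, label) pairs; alpha is the Laplace smoothing constant."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, alpha

def predict(model, tokens):
    """Return the label with the highest smoothed log-posterior."""
    class_docs, word_counts, vocab, alpha = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in tokens:
            if w not in vocab:
                continue  # skip tokens never seen in training
            score += math.log(
                (word_counts[label][w] + alpha)
                / (total_words + alpha * len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data standing in for POS-filtered tweet tokens.
docs = [
    (["food", "poisoning", "report"], "report"),
    (["restaurant", "poisoning", "sick"], "report"),
    (["food", "safety", "question"], "question"),
    (["is", "this", "safe", "question"], "question"),
]
model = train(docs)
print(predict(model, ["sick", "poisoning"]))
```

Changing which parts of speech survive the filtering step changes the vocabulary and the per-class word counts, which is the lever the feature set comparison in the abstract varies.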

Development of Walking Assist Smartphone Case for Blind People (시각장애인의 보행보조를 위한 스마트폰 케이스 구현)

  • Choi, Jin-Woo;Jeong, Gu-Min
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.3 / pp.239-242 / 2015
  • In this paper, we propose a walking assistance system for blind people using an Android smartphone and an Arduino board. The proposed system uses an Android smartphone case and an external ultrasonic sensor to detect obstacles ahead, so that blind users are made aware of unexpected objects through the smartphone's speaker or vibration function. In addition, the system includes a notification function that triggers the smartphone's built-in camera flash when the user walks in a dark place. Experiments with blind people demonstrate the applicability of the system: it helps users avoid not only obstacles ahead but also possible traffic collisions in dark conditions.
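
Ultrasonic obstacle detection of the kind described here usually rests on one piece of arithmetic: the distance is half the echo round-trip time multiplied by the speed of sound. The sketch below illustrates that calculation only. The threshold values, the alert names, and the function names are hypothetical, and the abstract does not say how the authors map distance to speaker or vibration alerts.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air at 20 °C

def echo_to_distance_cm(echo_us):
    """Convert an echo round-trip time in microseconds to a one-way distance in cm."""
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2

def alert_for(distance_cm, warn_cm=150.0, danger_cm=50.0):
    """Pick an alert level; both thresholds are illustrative assumptions."""
    if distance_cm <= danger_cm:
        return "vibrate"
    if distance_cm <= warn_cm:
        return "speak"
    return "none"

distance = echo_to_distance_cm(5831)  # roughly one metre
print(round(distance, 1), alert_for(distance))
```

On real hardware the echo time would come from the sensor attached to the Arduino board, and the chosen alert would be forwarded to the phone.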

User certification module development of Gallery-Auction for NFC-based 2 Factor mobile electronic payment (NFC 기반 2 Factor 모바일 전자결제를 위한 갤러리-옥션의 사용자인증 모듈 개발)

  • Jo, Won Oh;Cha, Yoon Seok;Oh, Soo Hee;Choi, Myeong Soo;Kim, Hyung Jong
    • Smart Media Journal / v.6 no.3 / pp.29-40 / 2017
  • Recently, the share of smartphones equipped with NFC has been increasing rapidly, and as a result many companies are developing NFC-related technology. We developed Gallery-Auction to enhance the security of, and add new services to, an NFC-based two-factor electronic payment system. To enhance security, we developed a user authentication module based on fingerprint recognition that applies FIDO authentication technology, and we developed an electronic contract voice service for Gallery-Auction using TTS (Text to Speech). As a result, NFC mobile electronic payment gains a convenient and simple authentication method along with improved security.

A Study on Development of Applications which Provides Step-by-step CPR Guidelines and Learning Materials for Non Health-related Person (비보건계열 일반인을 위한 단계별 CPR 가이드라인과 학습자료 제공 어플리케이션 개발 연구)

  • Kim, Jong-Min
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.649-651 / 2021
  • In Korea, there are around 30,000 cardiac arrest patients annually, and the number is gradually increasing. Against this background, CPR education and publicity programs have been expanded nationwide, but the rate of CPR performed by witnesses among the general public was 4.4%, significantly lower than the 20%~70% rate in other countries. In this paper, we therefore analyzed the factors affecting the performance of CPR by witnesses who discover cardiac arrest patients. Based on the results, we conducted an application planning and development study to provide users with correct cardiorespiratory response tips and step-by-step CPR guidelines, with the aim of increasing the rate of CPR by general eyewitnesses.

A Study on Comparison of Later Commentaries about Kyeokguk theory of Jeokcheonsu (『적천수(滴天髓)』 격국론의 후대 평주 간 비교연구)

  • Yi, Bo-young;Kim, Ki-Seung
    • Industry Promotion Research / v.7 no.1 / pp.81-87 / 2022
  • This study compares and analyzes various editions of Jeokcheonsu, aiming to determine why commentaries on a single original text came to differ according to the commentator's perspective, and which interpretation among them is more valid. The most misunderstood part of the Myeongri theory in Jeokcheonsu is the Kyeokguk theory. Jeokcheonsu is highly regarded as a Myeongri classic that does not set a high value on Kyeokguk and instead emphasizes Eokbuyongsin. However, classifying the original text by theory shows that about 5 sentences directly mention the Eokbu theory, whereas 9 sentences explain the Kyeokguk theory, or 15 if the sentences explaining Jonggyeok and Hwagyeok are included. Given that metaphoric speech is mainly used, it is also clear that the book was not written for beginners of Myeongri. It is a Myeongri text written to convey more profound logic and enlightenment to readers who already have sufficient knowledge of the principles of Myeongri. A single sentence, 'Jaegwaninsubunpyeonjeong Gyeomronsiksanggyeokgukjeong', would have been sufficient to explain the Kyeokguk theory, because the text was written with the reader's level assumed. Among the later commentaries on the Myeongri theory contained in Jeokcheonsu, four commentators' commentaries on the original passages related to the Kyeokguk theory, namely 'Palkyeok', 'Gwansal', 'Sangkwan', 'Wolryeong', 'Saengsi', and 'Cheongtak', were compared and analyzed.

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.309-328 / 2022
  • This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs - BART and GPT - at the task of abstractive text summarization was compared. To investigate how performance depends on the characteristics of the inference data, ten different document types, containing six types of informational content and creation content, were considered. It was found that BART (which can both generate and understand natural language) performed better than GPT (which can only generate). Upon more detailed examination of the effect of inference data characteristics, the performance of GPT was found to be proportional to the length of the input text. However, even for the longest documents (with optimal GPT performance), BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or the number of PLM parameters but the structural suitability of the PLM for the applied downstream task. The performance of the PLMs was also compared by analyzing part-of-speech (POS) shares. BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs but positively related to that of nouns. This result emphasizes the importance of taking the inference data's characteristics into account when fine-tuning a PLM for its intended downstream task.

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines a number of natural language processing methods, such as NER (named entity recognition), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web using a domain-specific data extraction agent. A framework for extracting information from unstructured data was developed using these natural language processing methods. We simulated the performance of our work in the extraction and analysis of texts for the detection of organizational structures. The simulation shows that our approach outperformed other NER classifiers such as MUC and CoNLL on information extraction.

Preliminary Analysis of Language Styles between South and North Korean Broadcastings (남북한 방송언어의 차이에 대한 기초 분석)

  • Lee, Chang-H.;Kim, Kyung-Il;Park, Jong-Min
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.9 / pp.3311-3317 / 2010
  • This study compared the broadcasting language of South and North Korea to measure the language differences that have arisen from the long division, and provides a fundamental database on language use in the two countries. Text selected from news clips of South and North Korean broadcasting agencies was analyzed with KLIWC. The results showed that North Korean broadcasting language differed significantly from South Korean in terms of affective, cognitive, and social words. In addition, North Korean broadcasting used more personal pronouns and a part of speech than South Korean broadcasting. Psychological interpretations were provided based on the language differences.