• Title/Summary/Keyword: 시청각음성인식 (audio-visual speech recognition)

Search Result 8, Processing Time 0.025 seconds

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

  • Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
    • Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
  • Because speech recognition degrades in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in AVSR systems. This study aims to improve the word recognition rate using only lip shape detection, toward an efficient AVSR system. After preprocessing for lip region detection, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. Utterance period detection achieves a 91% success rate, which is higher than general threshold methods. In lip reading recognition, the user-dependent experiment records a recognition rate of 88.5% and the user-independent experiment 80.2%, both improvements over previous studies.
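The abstract above compares its CNN detector against "general threshold methods" without describing them. As a rough illustration only, the following is a minimal sketch of what such a threshold baseline might look like, assuming a per-frame mouth-activity score is already available. The function name `detect_utterance_periods`, the `threshold` and `min_frames` parameters, and the sample scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def detect_utterance_periods(
    scores: List[float],
    threshold: float = 0.5,   # hypothetical cut-off, not from the paper
    min_frames: int = 3,      # hypothetical minimum run length
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges whose scores stay at or above
    `threshold` for at least `min_frames` consecutive frames."""
    periods: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended; keep it only if it is long enough.
            if i - start >= min_frames:
                periods.append((start, i))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and len(scores) - start >= min_frames:
        periods.append((start, len(scores)))
    return periods


# Made-up per-frame scores: two sustained runs and one isolated spike.
example_scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.1, 0.9, 0.9, 0.9]
example_periods = detect_utterance_periods(example_scores)
```

With these made-up scores, the isolated spike at frame 6 is discarded because it is shorter than `min_frames`. A fixed cut-off like this is sensitive to how the scores are scaled, which is one plausible reason a learned detector could do better.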

Audio-Based Human-Robot Interaction Technology (오디오 기반 인간로봇 상호작용 기술)

  • Kwak, K.C.;Kim, H.J.;Bae, K.S.;Yoon, H.S.
    • Electronics and Telecommunications Trends / v.22 no.2 s.104 / pp.31-37 / 2007
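  • Human-robot interaction (HRI) technology is a core technology of intelligent service robots: it designs, implements, and evaluates robot systems and interaction environments so that robots can interact cognitively and emotionally with people through various communication channels such as robot cameras, microphones, and other sensors. Among audio-based human-robot interaction technologies, this article surveys domestic and international trends in sound localization and speaker recognition, and covers the audio-visual sound localization and text-independent speaker recognition technologies that the ETRI Intelligent Robot Research Division has recently been working to commercialize. It also examines scenarios that combine these technologies with speech recognition, face detection, and face recognition so that they can be used effectively in home environments.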

A Full Body Gumdo Game with an Intelligent Cyber Fencer using Multi-modal (3D Vision and Speech) Interface (멀티모달 인터페이스(3차원 시각과 음성)를 이용한 지능적 가상검객과의 전신 검도게임)

  • 윤정원;김세환;류제하;우운택
    • Journal of KIISE:Computing Practices and Letters / v.9 no.4 / pp.420-430 / 2003
  • This paper presents an immersive multimodal Gumdo simulation game that lets a user experience whole-body interaction with an intelligent cyber fencer. The proposed system consists of three modules: (i) a non-distracting multimodal interface with 3D vision and speech, (ii) an intelligent cyber fencer, and (iii) immersive feedback through a big screen and sound. First, the multimodal interface with 3D vision and speech allows the user to move around and shout without being distracted. Second, the intelligent cyber fencer provides intelligent interactions through perception and reaction modules that are created by analyzing real Gumdo games. Finally, immersive audio-visual feedback through a big screen and sound effects helps the user experience an immersive interaction. The proposed system thus provides the user with an immersive Gumdo experience involving whole-body movement. The system can be applied to areas such as education, exercise, and art performance.

Effects of Situation Awareness and Decision Making on Safety, Workload and Trust in Autonomous Vehicle Take-over Situations (자율주행 자동차의 제어권 전환상황에서 상황인식 및 의사결정 정보 제공이 운전자에게 미치는 영향)

  • Kim, Jihyun;Lee, Kahyun;Byun, Youngsi
    • Journal of the HCI Society of Korea / v.14 no.2 / pp.21-29 / 2019
  • Take-over requests in semi-autonomous cars must be handled properly in the case of road obstacles or curved roads in order to avoid accidents. In these situations, situation awareness and appropriate decision making are essential for distracted drivers. This study used a driving simulator to investigate the components of auditory-visual information systems that affect safety, workload, and trust. Auditory information consisted of either voice guidance providing situation awareness for the take-over or a beep sound that only alerted the driver. Visual information consisted of either a screen showing how to maneuver the vehicle or only an icon indicating a take-over situation. By providing auditory information that increased situation awareness and visual information that aided decision making, trust and safety increased, while workload decreased. These results suggest that the levels of situation awareness and decision making ability affect trust, safety, and workload for drivers.

An Implementation of Ontology, Vision and Voice Using Self-Care System in Mobile Environment (모바일 환경에서 온톨로지 및 시청각 정보를 이용한 셀프케어 시스템 구현)

  • Kwon, Hyeong-Oh;Piao, Shi-Quan;Son, Jong-Wuk;Cho, Hui-Sup
    • Proceedings of the Korean Information Science Society Conference / 2012.06d / pp.55-57 / 2012
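  • Recent advances in information and communication technology have shifted the focus of medical care from diagnosis and treatment to prevention. Accordingly, self-care systems are being developed that help not only patients but also health-conscious members of the general public carry out disease prevention on their own outside the hospital. However, most self-care systems currently under development require member registration, and during registration users must manually enter basic information such as gender for self-diagnosis, which is inconvenient. This paper therefore recognizes the basic information of unregistered users automatically through gender recognition from the user's voice and image, and applies it to an ontology-based self-diagnosis service, for the convenience of users who do not wish to register. In addition, as the result of the ontology-based self-diagnosis service, the system explains the definition, symptoms, and treatment of the condition and, depending on symptom severity, provides information on which medical department to visit. This helps users diagnose their own health, carry out simple treatment according to the result, and prevent symptoms from worsening.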

Classification standard of Communication Tool (플랫폼 분류 기준 고찰 : 감각의 입·출력)

  • Kim, Hyo-Yeun
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.189-190 / 2018
  • Digital content requires concepts and structures that give us insight into the languages between computers and humans and into how human experience is manifested in the flow of characters, images, and voice. Communicology, Vilém Flusser's original study, allows us to reconsider and reconstruct the boundary of human awareness. This paper begins to examine digital content, which consists of numerical codes, by reviewing communicology. Communicology helps to break up pre-existing categories and to think about new standards. With the help of information technology, content planning can be actualized by classifying and reconstructing content as the input/output of senses. The standards of classification are 'boundary' and 'direction,' communication elements that cannot be broken down any further. There is no need to communicate if there is no boundary. The operation of communication is comprised of 'direction.' Taking humankind as the standard, the boundary that takes in stimulation from outside can be seen as the senses. Direction can be expressed as input/output. Output assumes that technical pictures receive information. The coordinates of various pre-existing platforms and content, and of platforms not yet uncovered, can be set with a consistent standard. This allows us to escape from the standard of flat content, activated by sight and rationality under the ideology of characters, to seek a three-dimensional standard that can be vitalized by various senses and irrationality, and to reconstruct the input/output of senses to show the possibility of planning a new platform.

Infants Manless Management System (영유아 무인 관리 시스템)

  • Min, YG;Gwon, GM;I, Eon Jo;Park, SJ;Chung, HC
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.413-416 / 2017
  • The development of the internet, driven by the advent of the fourth industrial revolution, has been steadily affecting our lives. Following this trend, various products have recently emerged, but products for identifying dangers to infants have yet to be developed. Increasingly, guardians have trouble noticing and responding to what their children are doing because of everyday household noise and housekeeping activities. This project develops a system based on a Raspberry Pi and an audiovisual sensor module to reduce the risks that arise from a child's sudden, unexpected behavior in everyday life. It is also designed for the guardian's convenience, with a smartphone app that connects over Wi-Fi.