• Title/Summary/Keyword: Audio-visual integration

Audio-Visual Integration based Multi-modal Speech Recognition System (오디오-비디오 정보 융합을 통한 멀티 모달 음성 인식 시스템)

  • Lee, Sahng-Woon;Lee, Yeon-Chul;Hong, Hun-Sop;Yun, Bo-Hyun;Han, Mun-Sung
    • Annual Conference of KIPS / 2002.11a / pp.707-710 / 2002
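  • This paper proposes a multimodal speech recognition system based on the integration of audio and video information. By fusing speech features with visual features, the system recognizes human speech efficiently in noisy environments. The speech features are Mel-Frequency Cepstral Coefficients (MFCC), and the visual features are feature vectors obtained by principal component analysis. To improve the recognition rate of the visual information itself, the face region is first located using a skin-color model and facial shape information, and the lip region is then detected with a robust lip-region extraction method. Audio-visual fusion is performed as early fusion using a modified time-delay neural network. Experiments show that fusing audio and visual information improves performance by roughly 5% to 20% over using audio information alone.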
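
Early (feature-level) fusion means the audio and visual features are combined before classification, so a single network sees both. The sketch below assumes the MFCC frames and PCA lip features are already computed, aligns the slower video stream to the audio frame rate by repeating frames, concatenates them, and passes the result through one plain time-delay layer. The frame rates, feature sizes, and nearest-frame alignment are illustrative assumptions; the abstract does not describe the paper's alignment or how its time-delay network was modified.

```python
import numpy as np


def upsample_visual(visual: np.ndarray, n_audio_frames: int) -> np.ndarray:
    """Repeat video frames so visual features line up with audio frames.

    visual: (n_video_frames, d_v), e.g. PCA coefficients of the lip region.
    Nearest-preceding-frame alignment is an assumption for illustration.
    """
    n_video = visual.shape[0]
    idx = np.floor(np.arange(n_audio_frames) * n_video / n_audio_frames).astype(int)
    return visual[idx]


def early_fusion(mfcc: np.ndarray, visual: np.ndarray) -> np.ndarray:
    """Concatenate per-frame audio and aligned visual features."""
    aligned = upsample_visual(visual, mfcc.shape[0])
    return np.concatenate([mfcc, aligned], axis=1)


def time_delay_layer(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One time-delay layer: each output frame sees `context` consecutive input frames.

    x: (T, d_in); weights: (context, d_in, d_out); bias: (d_out,).
    Returns a (T - context + 1, d_out) array after a tanh nonlinearity.
    """
    context, _, d_out = weights.shape
    n_out = x.shape[0] - context + 1
    out = np.empty((n_out, d_out))
    for t in range(n_out):
        window = x[t:t + context]                       # (context, d_in)
        out[t] = np.einsum("cd,cdo->o", window, weights) + bias
    return np.tanh(out)


rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))   # assumed: 1 s of audio at a 10 ms hop
lips = rng.standard_normal((30, 10))    # assumed: 30 video frames, 10 PCA coefficients
fused = early_fusion(mfcc, lips)        # (100, 23): audio and visual side by side
w = rng.standard_normal((3, 23, 8)) * 0.1
h = time_delay_layer(fused, w, np.zeros(8))
```

Because the time-delay layer shares its weights across time, it behaves like a 1-D convolution over the fused feature sequence, which is what lets it tolerate small timing offsets between lip movement and sound.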

Crossmodal Perception of Mismatched Emotional Expressions by Embodied Agents (에이전트의 표정과 목소리 정서의 교차양상지각)

  • Cho, Yu-Suk;Suk, Ji-He;Han, Kwang-Hee
    • Science of Emotion and Sensibility / v.12 no.3 / pp.267-278 / 2009
  • Embodied agents currently attract considerable interest because of their vital role in human-human and human-computer interaction in virtual worlds. A number of researchers have found that people can recognize and distinguish various emotions expressed by an embodied agent, and many studies have found that people respond to simulated emotions much as they respond to human emotions. This study investigates how people interpret mismatched emotions expressed by an embodied agent (e.g., a happy face with a sad voice): whether audio-visual channel integration occurs or one channel dominates when participants judge the emotion. The study employed a 4 (visual: happy, sad, warm, cold) × 4 (audio: happy, sad, warm, cold) within-subjects repeated-measures design. The results suggest that people perceive emotion from both channels rather than from just one. Additionally, facial expression (happy vs. sad face) changes the relative influence of the two channels: the audio channel has more influence on the interpretation of emotion when the facial expression is happy. From mismatched expressions, participants could also perceive emotions that neither the face nor the voice expressed, which suggests that an embodied agent might express a variety of subtle emotions using only a few kinds of emotional expression.

Audio Stress Effect on Visual ERP Stimulated by 3-dimensional Environment (청각 스트레스가 3차원 시자극 유발전위에 미치는 영향 분석)

  • 박찬희;홍철운;김남균
    • Journal of Biomedical Engineering Research / v.23 no.4 / pp.301-308 / 2002
  • This research quantitatively analyzes how mental stress affects event-related potentials (ERPs) evoked by visual stimuli in an integrated visual and auditory environment set up in three-dimensional space. ERPs were measured separately in a normal-state session and a mental-stress session. The subjects were 10 healthy men and women, and signals were recorded from the scalp sites Fp1, Fz, Cz, Pz, O1, and O2. The experiment was conducted in an isolated room free of electromagnetic interference. The results showed that P300 amplitude was slightly higher in the stress session and that its latency was longer. By recording voltage variations, we captured the activity of the brain responsible for human perception, cognition, and action processing, and evaluated the effect of mental stress. We expect that these results can be used to evaluate brain dysfunction.
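
P300 amplitude and latency are conventionally read off the trial-averaged ERP as the largest positive deflection inside a post-stimulus window. The sketch below shows that measurement for a single channel. The 250 Hz sampling rate, the -100 ms epoch start, the 250-500 ms search window, and the noise-free synthetic epochs are all assumptions for illustration; the abstract does not give the paper's recording or analysis parameters.

```python
import numpy as np


def p300_peak(epochs: np.ndarray, fs: float, t0: float = -0.1,
              window: tuple = (0.25, 0.5)) -> tuple:
    """Average epochs and return (peak amplitude, peak latency in seconds).

    epochs: (n_trials, n_samples) single-channel data, e.g. from Pz.
    t0: time of the first sample relative to stimulus onset (assumed).
    window: P300 search window in seconds (assumed).
    """
    erp = epochs.mean(axis=0)                         # trial-averaged ERP
    times = t0 + np.arange(erp.size) / fs             # sample times in seconds
    mask = (times >= window[0]) & (times <= window[1])
    i = np.argmax(erp[mask])                          # P300 is a positive deflection
    return erp[mask][i], times[mask][i]


fs = 250.0
times = -0.1 + np.arange(200) / fs                    # 0.8 s epochs


def synthetic_epochs(peak_t: float, amp: float, n_trials: int = 40) -> np.ndarray:
    """Noise-free Gaussian 'P300' repeated over trials (illustration only)."""
    bump = amp * np.exp(-((times - peak_t) ** 2) / (2 * 0.03 ** 2))
    return np.tile(bump, (n_trials, 1))


amp_n, lat_n = p300_peak(synthetic_epochs(0.30, 5.0), fs)   # "normal" session
amp_s, lat_s = p300_peak(synthetic_epochs(0.34, 6.0), fs)   # "stress" session
```

With real recordings, averaging over trials is what suppresses background EEG enough for the peak to be measurable, so the comparison between sessions is made on the averaged waveforms.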

Implementation of SMIL Editor for Multimedia Broadcasting (멀티미디어 방송을 위한 SMIL 편집 시스템 구현)

  • 장대영;김창수;정회경
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.3 / pp.622-629 / 2004
  • As digital broadcasting and the Internet have spread throughout the world, we can easily use information with fewer restrictions of time and space. Following this trend, interest in ways of representing multimedia data has increased rapidly, and users demand services built on integrated documents that contain not only simple text and images but also time-varying audio-visual data. To solve multimedia object representation and synchronization problems, the W3C presented SMIL as an international standard in 1998. With SMIL, various multimedia elements can be integrated into a multimedia document with an appropriate presentation in space and time. Such SMIL documents make it possible to create a new Internet radio broadcasting service that delivers not only audio but also text, images, and video. In this paper, we describe a SMIL document editor that lets ordinary users represent time-varying multimedia data with spatial layout and temporal synchronization.
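
A SMIL document separates spatial layout (regions declared in the head) from temporal synchronization (`par` plays children together, `seq` plays them one after another). The sketch below builds a small SMIL 1.0-style document with Python's standard `xml.etree.ElementTree`: a narration track runs in parallel with a sequence of image/caption slides. The file names, region sizes, and durations are hypothetical, and this is not the paper's editor, only an illustration of the kind of document such an editor produces.

```python
import xml.etree.ElementTree as ET


def build_smil(slides, audio_src, width=320, height=240):
    """Build a minimal SMIL document as a string.

    slides: list of (image_src, caption_src, seconds) tuples (hypothetical files).
    """
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width=str(width), height=str(height))
    # Spatial layout: picture region on top, caption strip below.
    ET.SubElement(layout, "region", id="pic", left="0", top="0",
                  width=str(width), height=str(height - 40))
    ET.SubElement(layout, "region", id="cap", left="0", top=str(height - 40),
                  width=str(width), height="40")

    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")              # children play at the same time
    ET.SubElement(par, "audio", src=audio_src)    # narration runs underneath
    seq = ET.SubElement(par, "seq")               # slides play one after another
    for img, caption, secs in slides:
        slide = ET.SubElement(seq, "par")         # image and caption shown together
        ET.SubElement(slide, "img", src=img, region="pic", dur=f"{secs}s")
        ET.SubElement(slide, "text", src=caption, region="cap", dur=f"{secs}s")
    return ET.tostring(smil, encoding="unicode")


doc = build_smil([("intro.jpg", "intro.txt", 5), ("news.jpg", "news.txt", 8)],
                 "narration.mp3")
print(doc)
```

Nesting a `par` inside the `seq` is what ties each image to its caption in time while the two still occupy different regions on screen.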

Efficiency and Productivity on ICT Industry (ICT 제조업과 서비스업의 효율성과 생산성)

  • Jeong, Boon-Do
    • International Commerce and Information Review / v.16 no.4 / pp.55-75 / 2014
  • Non-parametric methods, namely technical efficiency, the DEA/Window model, and the Malmquist Productivity Index (MPI), are used to measure the efficiency and productivity of the ICT (Information and Communication Technology) manufacturing and service industries over the period 2007-2011. The results are as follows. (1) Over the sample period, the technical efficiency of the ICT manufacturing industry ranged from 0.34 to 0.39, and that of the ICT service industry ranged from 0.16 to 0.20. (2) The geometric average of the Malmquist TFP indexes indicates an average productivity improvement of 8.3 percent for the ICT manufacturing industry and 1.6 percent for the ICT service industry. (3) TIER analysis of the ICT manufacturing industry shows the optimal benchmarking path storage devices → wireless communication equipment → broadcasting equipment → radio, recording and playback devices → computers, printers, video and audio-visual equipment. For the ICT service industry, the optimal benchmarking path is computers and packaged software → wired communication → communication, information, detection equipment → consulting and construction for computer systems integration → industrial machinery and equipment rental → telecommunications reseller → system software development and delivery → hosting.
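
The Malmquist index compares a unit's input/output bundle in two periods against the frontier technology of each period, and yearly indexes are averaged geometrically, so an average improvement of 8.3 percent corresponds to a geometric mean of about 1.083. The sketch below uses the common output-oriented definition and its split into efficiency change and technical change; it takes the four distance-function values as given (in practice each comes from a DEA linear program). The abstract does not state which orientation or decomposition the paper uses, and all numbers here are hypothetical.

```python
import math


def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Output-oriented Malmquist productivity index between periods t and t+1.

    d_a_b is the distance function of period-a technology evaluated at the
    period-b input/output bundle, e.g. d_t_t1 = D^t(x^{t+1}, y^{t+1}).
    Returns (MPI, efficiency change, technical change); MPI > 1 means growth.
    """
    effch = d_t1_t1 / d_t_t                                   # catching up to the frontier
    techch = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))  # shift of the frontier
    return effch * techch, effch, techch


def geometric_mean(values):
    """Geometric mean, the usual way to average index numbers over periods."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical distance-function values for one industry and one pair of years.
mpi, effch, techch = malmquist(0.8, 1.0, 0.7, 0.9)

# Hypothetical yearly indexes; the average change is reported as a percentage.
yearly = [1.10, 1.05, 1.12, 1.06]
avg = geometric_mean(yearly)
print(f"average productivity change: {(avg - 1) * 100:.1f}%")
```

The geometric rather than arithmetic mean is used because the indexes are multiplicative ratios: chaining the yearly changes and taking the appropriate root gives the average per-period growth factor.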

Global Advertising Creative Trend Based on 2019 Cannes Lion Winners (2019 칸 국제광고제(Cannes Lions)를 통해서 본 글로벌 광고 크리에이티브 트렌드)

  • Um, Namhyun
    • The Journal of the Korea Contents Association / v.19 no.12 / pp.121-128 / 2019
  • After analyzing the 2019 Cannes Lions winners, this study identifies five global creative trends: multi-sensory experience, integration of data and creativity, real humanism, brand activism, and consideration for the socially disadvantaged. Beyond visual impact or stimulus, audio and scent can also be used to stimulate consumers and promote brands. At the 2019 Cannes Lions, campaigns that sought consumer insight through data and applied it creatively were applauded by the judges. Campaigns based on real humanism suggest that attitudes are best changed through positive experiences; Nike's 'Dream Crazy' campaign is a good example of a campaign that expresses a company's philosophy and brand purpose. Brands are becoming activists and are being encouraged to take real action, and for good reason. Lastly, projects such as Ikea's 'ThisAbles' and Microsoft's 'Xbox Adaptive Controller' are designed for the socially disadvantaged.

Establishment and Application of an Integrated Platform for Navigation Safety Information (항행안전정보 통합 플랫폼 구축 및 활용방안에 관한 연구)

  • Kim, Do-Hoon
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.129-138 / 2020
  • This study identifies problems in the current system for providing navigation safety information (NSI), which is centered on merchant ships, with the aim of preventing maritime accidents involving fishing boats and small vessels in coastal areas. The system proposed in this study is an effective way to provide NSI that supports the safe navigation of small vessels such as fishing boats. First, the status and characteristics of recent maritime accidents were examined, and NSI service targets were identified. Second, the limitations of the current NSI system were determined, and measures were proposed to establish an NSI Integrated Platform (NSIP) that ensures the integration, accessibility, and usability of NSI for a substantial portion of the public. Third, to utilize the NSIP, various types of NSI are applied as additional information in the electronic chart system used by the e-navigation ship terminals being developed under the Korean e-navigation project, and a function that automatically triggers an audiovisual alarm when a ship enters a navigation risk zone is proposed. These functions were specified technically on the basis of a review of opinions from experts at related organizations and professional producers. The results suggest that, through the Korean e-navigation project, NSI can be provided to small vessels such as fishing boats to prevent maritime accidents caused by navigators' human error.
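
An alarm that fires when a ship enters a risk zone is essentially a geofence: on each position update, test whether the ship has moved from outside a zone polygon to inside it. The sketch below uses the standard ray-casting point-in-polygon test. The zone name and coordinates are hypothetical, and treating longitude/latitude as planar coordinates is an assumption that is only reasonable for small zones; the abstract does not describe how the proposed terminal implements the check.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates


def inside(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle on each edge a rightward ray from `point` crosses."""
    x, y = point
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def check_alarm(prev: Point, curr: Point, zones: Dict[str, List[Point]]) -> List[str]:
    """Names of risk zones the ship has just entered (outside -> inside)."""
    return [name for name, poly in zones.items()
            if inside(curr, poly) and not inside(prev, poly)]


# Hypothetical risk zone: a small square shoal area.
zones = {"shoal": [(126.0, 34.0), (126.1, 34.0), (126.1, 34.1), (126.0, 34.1)]}

for zone in check_alarm((125.95, 34.05), (126.05, 34.05), zones):
    print(f"ALARM: entering {zone}")   # stand-in for the audiovisual alarm
```

Comparing the previous and current positions makes the alarm fire once on entry rather than on every update while the ship remains inside the zone.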

Collaborative Authoring System using 3D Spatio-Temporal Space (삼차원 시.공간을 이용하는 프레젠테이션 공동저작 시스템)

  • 이도형;성미영
    • Journal of KIISE:Computing Practices and Letters / v.9 no.6 / pp.623-634 / 2003
  • In this paper, we propose a collaborative multimedia authoring system. Our authoring system represents a multimedia presentation in a 3D coordinate system: one axis represents the traditional timeline (T-zone), and the other two axes represent spatial coordinates (XY-zone). The system represents visual media objects as 3D parallelepipeds and audio media objects as cylinders. This interface allows both the temporal and the spatial aspects of a presentation to be authored and manipulated simultaneously. Using our system, users can design multimedia presentations collaboratively in a unified spatio-temporal space, moving freely between the spatial and temporal domains without changing the authoring context. In addition, we propose an efficient concurrency control mechanism for the shared objects created in our collaborative authoring system. The mechanism is based mainly on user awareness, multiple versions, and access permissions for shared objects. It is designed to maintain data consistency by minimizing collisions caused by network delays or failures, and to maximize responsiveness for users through optimistic concurrency control. The mechanism further improves responsiveness by refining the locking granularity and applying different concurrency control mechanisms to each level of granularity.
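
Optimistic concurrency control lets each user edit without waiting for locks and detects conflicts only at commit time, typically by checking that the object has not changed since the user read it. The sketch below illustrates that idea with version numbers, keeping every committed version as a simple history. The class and field names are hypothetical, and it covers only version-based validation; the paper's mechanism also involves user awareness, access permissions, and granularity-dependent locking, which are not modeled here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MediaObject:
    """A shared object in the spatio-temporal space (hypothetical fields).

    kind "visual" would be drawn as a parallelepiped, "audio" as a cylinder.
    """
    obj_id: str
    kind: str
    x: float
    y: float
    t_start: float
    t_end: float
    version: int = 0


class SharedStore:
    """Clients edit without taking locks; a commit is validated against the
    version the edit was based on."""

    def __init__(self):
        self.objects = {}   # obj_id -> current MediaObject
        self.history = {}   # obj_id -> every committed version, oldest first

    def add(self, obj):
        self.objects[obj.obj_id] = obj
        self.history[obj.obj_id] = [obj]

    def commit(self, obj_id, base_version, **changes):
        """Apply `changes` only if nobody committed since `base_version`."""
        current = self.objects[obj_id]
        if current.version != base_version:
            return False    # conflict: caller re-reads the object and retries
        updated = replace(current, version=current.version + 1, **changes)
        self.objects[obj_id] = updated
        self.history[obj_id].append(updated)
        return True


store = SharedStore()
store.add(MediaObject("clip1", "visual", x=10, y=20, t_start=0.0, t_end=5.0))

base = store.objects["clip1"].version               # both users start from version 0
ok_a = store.commit("clip1", base, t_start=2.0)     # user A moves the clip in time
ok_b = store.commit("clip1", base, x=50)            # user B's stale edit is rejected
ok_b2 = store.commit("clip1", store.objects["clip1"].version, x=50)  # B retries
```

Because conflicts are detected rather than prevented, a user never blocks while another user holds a lock, which matches the responsiveness goal stated in the abstract; the cost is that a rejected edit must be reapplied on top of the newer version.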