• Title/Summary/Keyword: Audio and Video


Robot Vision to Audio Description Based on Deep Learning for Effective Human-Robot Interaction (효과적인 인간-로봇 상호작용을 위한 딥러닝 기반 로봇 비전 자연어 설명문 생성 및 발화 기술)

  • Park, Dongkeon;Kang, Kyeong-Min;Bae, Jin-Woo;Han, Ji-Hyeong
    • The Journal of Korea Robotics Society / v.14 no.1 / pp.22-30 / 2019
  • For effective human-robot interaction, a robot must not only understand the context of the current situation but also convey that understanding to the human participant efficiently. The most convenient way to do so is for the robot to express its understanding in spoken natural language. Artificial intelligence for video understanding and natural language processing has recently advanced very rapidly, especially with deep learning. This paper therefore proposes a deep-learning-based method for turning robot vision into audio description. The method is a pipeline of two deep learning models: one generates a natural language sentence from robot vision, and the other generates speech from that sentence. We also conduct an experiment on a real robot to show the effectiveness of the method in human-robot interaction.

Similar Movie Contents Retrieval Using Peak Features from Audio (오디오의 Peak 특징을 이용한 동일 영화 콘텐츠 검색)

  • Chung, Myoung-Bum;Sung, Bo-Kyung;Ko, Il-Ju
    • Journal of Korea Multimedia Society / v.12 no.11 / pp.1572-1580 / 2009
  • Combing through entire video files to recognize and retrieve matching movies requires much time and memory. Most current similar-movie matching methods therefore analyze only part of each movie's video-image information. Yet these methods share a critical problem: they wrongly treat as different those matching videos that have merely been changed in resolution or converted with a different codec. This paper proposes an audio-based search algorithm for identifying similar movies. The proposed method builds and searches a database of each movie's spectral peak information, which remains relatively stable under changes in bit rate, codec, or sample rate. The method achieved a 92.1% search success rate on a set of 1,000 video files whose audio bit rate had been altered or that had been deliberately re-encoded with a different codec.
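
The abstract does not say how the spectral peaks are extracted, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a naive discrete Fourier transform over one short audio frame and keeps the strongest local maxima of the magnitude spectrum. The function names, the frame length, and the `num_peaks` parameter are hypothetical choices made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..n/2 of a real-valued frame (naive O(n^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff))
    return spectrum


def spectral_peaks(frame, num_peaks=2):
    """Return the bin indices of the strongest local maxima, in ascending order."""
    mags = magnitude_spectrum(frame)
    # A bin is a candidate peak if it rises above its left neighbour
    # and is not lower than its right neighbour.
    candidates = [
        k for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    strongest = sorted(candidates, key=lambda k: mags[k], reverse=True)[:num_peaks]
    return sorted(strongest)


# Illustrative frame: two sinusoids that fall exactly on bins 5 and 12.
frame = [
    math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
    for t in range(64)
]
# The same frame at a lower level, a crude stand-in for a re-encoded copy.
quieter = [0.3 * sample for sample in frame]

print(spectral_peaks(frame), spectral_peaks(quieter))
```

In this toy case the peak positions do not change when the signal level changes, which is the kind of stability the abstract attributes to spectral peak information. A practical system would use an FFT library and many frames per file.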


Design and Implementation of a Multimedia Retrieval System (멀티미디어 검색 시스템의 설계 및 구현)

  • 노승민;황인준
    • Journal of KIISE:Databases / v.30 no.5 / pp.494-506 / 2003
  • The recent explosive popularity of multimedia information has created a need to retrieve multimedia content, including audio, video, and images, efficiently from databases. In this paper, we propose an XML-based retrieval scheme and a data model that compensate for the weaknesses of annotation-based and content-based retrieval methods. The properties and hierarchical structure of image and video data are represented and manipulated based on the Multimedia Description Schemes (MDS) of the MPEG-7 standard. For audio content, pitch contours extracted from acoustic features are converted into UDR strings. In particular, to improve retrieval performance, users' access patterns and frequencies are used in constructing the index. We have implemented a prototype system and evaluated its performance through various experiments.
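
The abstract does not expand "UDR". The sketch below assumes it stands for Up, Down, and Repeat, a common way to encode a melodic contour as a string, and shows one possible conversion. The function name, the use of MIDI-style note numbers, and the `tolerance` parameter are hypothetical and not taken from the paper.

```python
def pitch_contour_to_udr(pitches, tolerance=0.5):
    """Encode a pitch sequence as a string of U (up), D (down), and R (repeat).

    Each symbol compares one pitch with the previous one. Differences within
    +/- tolerance count as a repeat. (Assumed encoding, for illustration only.)
    """
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        difference = current - previous
        if difference > tolerance:
            symbols.append("U")
        elif difference < -tolerance:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


# Five notes produce four contour symbols.
print(pitch_contour_to_udr([60, 62, 62, 59, 64]))  # prints "URDU"
```

Because the string records only the direction of pitch movement, a query hummed in a different key would map to the same string, which makes ordinary string matching usable for retrieval.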

An Experimental Delay Analysis Based on M/G/1-Vacation Queues for Local Audio/Video Streams

  • Kim, Doo-Hyun;Lee, Kyung-Hee;Kung, Sang-Hwan;Kim, Jin-Hyung
    • ETRI Journal / v.19 no.4 / pp.344-362 / 1997
  • Delay, one of the quality-of-service parameters, is considered a crucial factor in the effective use of real-time audio and video streams in interactive multimedia collaboration. Among the various causes of delay, this paper focuses on the local delay associated with the schemes that handle the continuous inflow of encoded data from constant or variable bit-rate audio and video encoders. We introduce two implementation approaches, a pull model and a push model. While the pull model periodically pumps incoming data out of the system buffer, the push model receives events from the device drivers. Our experiments on Windows NT 3.51 show that the push model outperforms the pull model in local delay for both constant and variable bit-rate streams when the system is under reasonable load. We interpret the experimental data using M/G/1 multiple-vacation queueing theory and show that the data are consistent with the queueing-theoretic interpretation.
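
For readers unfamiliar with the queueing model named in the abstract, the textbook mean waiting time of an M/G/1 queue with multiple vacations can be sketched as follows. This is the standard decomposition result, not the paper's own analysis: the ordinary M/G/1 waiting time plus the mean residual vacation time. The function and parameter names are hypothetical, and how the paper maps its pull and push models onto vacations is not stated in the abstract.

```python
def mg1_vacation_mean_wait(arrival_rate, service_mean, service_second_moment,
                           vacation_mean, vacation_second_moment):
    """Mean waiting time in queue for an M/G/1 queue with multiple vacations.

    W = lambda * E[S^2] / (2 * (1 - rho)) + E[V^2] / (2 * E[V]),
    where rho = lambda * E[S] must be below 1 for the queue to be stable.
    """
    rho = arrival_rate * service_mean
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    if vacation_mean <= 0:
        raise ValueError("vacation_mean must be positive")
    # Pollaczek-Khinchine term for the plain M/G/1 queue.
    queueing_term = arrival_rate * service_second_moment / (2 * (1 - rho))
    # Mean residual vacation seen by an arriving item.
    vacation_term = vacation_second_moment / (2 * vacation_mean)
    return queueing_term + vacation_term


# Illustrative numbers: deterministic service time 1 and deterministic vacation 2,
# so E[S^2] = 1 and E[V^2] = 4, with arrival rate 0.5.
print(mg1_vacation_mean_wait(0.5, 1.0, 1.0, 2.0, 4.0))  # prints 1.5
```

The second term grows with the length and variability of the vacations and does not depend on the load, so longer server absences add delay even when the queue is lightly used.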


Development of AVN Software Using Vehicle Information for Hand Gesture (차량정보 분석과 제스처 인식을 위한 AVN 소프트웨어 구현)

  • Oh, Gyu-tae;Park, Inhye;Lee, Sang-yub;Ko, Jae-jin
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.4 / pp.892-898 / 2017
  • This paper describes the development of AVN (Audio Video Navigation) software for vehicle information analysis and gesture recognition. In the designed software, a module that examines the vehicle's CAN (Controller Area Network) data analyzes the driving state. Using the classified information, the AVN software combines vehicle information with hand gesture information. The resulting data is used to select the matching service step and to perform the service. The designed AVN software was implemented on a hardware platform commonly used in vehicles, and we confirmed the operation of the vehicle analysis module and gesture recognition in a simulated environment similar to the real world.

Development of the central control system using IP PBX convergence with broadcasting function (방송기능이 있는 IP PBX 융합 중앙 관제 시스템 개발)

  • Kim, Sam-Taek
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.1-6 / 2021
  • Viral infections such as COVID-19 have become commonplace, and interest in unmanned systems is growing in the field of non-face-to-face ICT services. In this paper, we verified through testing the function and performance of remotely controlling a store through video and audio using an IP PBX with a broadcasting function. Fully unmanned systems have not yet gained credibility because of various technical problems, whereas the central control system is very efficient and reliable because the controller can direct customers while monitoring the entrance and interior of the store through video and audio. In the future, we plan to study a completely unmanned remote control system using AI technology.

Audio/Video Bridging Technology for Real-Time AV Transmission (실시간 AV 전송을 위한 Audio/Video Bridging 기술)

  • Wi, Jeong-Uk;Park, Yong-Seok;Park, Gyeong-Won;Song, Byeong-Cheol;Jeon, Won-Gi
    • Information and Communications Magazine / v.30 no.6 / pp.69-76 / 2013
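  • The rapid advance of home networking and information appliance technology, together with the spread of high-quality multimedia content, has increased demand for network-based multimedia transmission systems. In response, the IEEE standardized Audio Video Bridging (AVB), a technology that can transmit high-quality audio and video data in real time over Ethernet. AVB can carry not only audio and video data over the network but also, at the same time, the data needed to control and manage each device. Early development was led by networking and audio specialist companies and centered on Audio over Ethernet (AoE) technologies specialized for audio transmission, but products incorporating AVB have been developed since 2011, when AVB standardization was completed. This article reviews the key enabling technologies and development trends for transmitting multimedia data over a network, and then examines AVB, the IEEE standard.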

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ across modalities, and each modality influences the emotional state differently. This paper therefore studies the fusion of the two most important modalities in emotion recognition, voice and visual expression, and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function addresses the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to those of several comparison algorithms.
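
The reported metric, the concordance correlation coefficient (CCC), measures how closely predictions agree with labels in both correlation and scale. As a minimal sketch of the standard definition (not the paper's evaluation code; the function name and sample values are hypothetical), it can be computed as follows.

```python
def concordance_correlation_coefficient(predictions, labels):
    """CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).

    Uses population (divide-by-n) variance and covariance.
    """
    if len(predictions) != len(labels) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_l = sum(labels) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_l = sum((l - mean_l) ** 2 for l in labels) / n
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(predictions, labels)) / n
    denominator = var_p + var_l + (mean_p - mean_l) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denominator


# Perfectly correlated predictions with a constant offset are still penalized.
print(concordance_correlation_coefficient([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Unlike the Pearson correlation, which would be 1 for the shifted example, the CCC drops to 4/7 because the means differ. That sensitivity to offset is why it is a common choice for continuous arousal and valence prediction.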

A Design of real sound recommendation service based-on User's preference, emotion and circumstance (사용자 취향, 감성 및 상황인지 기반 음원 추천 서비스 구현)

  • Jung, Jong-Jin;Lim, Tae-Beom;Lee, Seok-Pil
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.689-691 / 2011
  • With the rapid development of information and communication technology, multimedia presentation technology is evolving from passive services into services that users can enjoy actively and realistically according to their own preferences and tastes. In particular, the industry around realistic multimedia services that target human emotion by exploiting the properties of human hearing is expected to form a high value-added premium market. Audio technology is affected more than video technology by human emotion and the surrounding viewing environment. Compared with video technology, audio technology is also a research area that appeals to human emotion and emphasizes psychological aspects. From this viewpoint, developing intelligent and realistic audio technology requires a high degree of specialization. This study introduces an "intelligent real-sound presentation technology" that supports high-quality, realistic audio, along with the "core technologies" that compose it.

Implementation of the Broadcasting System for Digital Media Contents (디지털 미디어 콘텐츠 방송 시스템 구현)

  • Shin, Jae-Heung;Kim, Hong-Ryul;Lee, Sang-Cheal
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.10 / pp.1883-1887 / 2008
  • Most digital media content is composed of video, audio, pictures, and animation. The quality with which video and audio information is recognized sometimes varies with the receiver's characteristics or understanding, whereas visual information presented as text offers people the clearest and most accurate means of recognizing information. In this paper, we propose a new broadcasting system (BSDMC) that conveys the meaning of digital media content clearly and accurately. We implement general-purpose components that display video, pictures, text, and symbols simultaneously. A multimedia content broadcasting system can be developed easily by plugging these components into an application development tool and calling them with the proper parameters. The components are built on an object-oriented framework with a modular structure, which increases reusability and allows other applications to be developed quickly and reliably.