• Title/Summary/Keyword: Audio and Video


A Precise Audio/Video Synchronization Scheme Based on RTP Packet for Multimedia Communication (멀티미디어 통신을 위한 RTP 패킷 기반의 정밀한 오디오/비디오 동기화 기법)

  • Seo, Kwang-Deok;Chi, Won-Sup;Jung, Soon-Heung
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.5
    • /
    • pp.653-663
    • /
    • 2009
  • Synchronization between media is an important aspect in the design of multimedia communication systems. This paper proposes a precise media synchronization mechanism for video and audio transport over IP networks. To support synchronization between video and audio bitstreams transported over IP networks, the RTP/RTCP protocol suite is usually employed. To provide precise synchronization between video and audio, we suggest an efficient media synchronization algorithm based on NPT (Normal Play Time), which can be derived from the timestamp information in the header of the RTP packets generated for the transport of video and audio. The proposed method does not need to send or process any RTCP SR (sender report) packets, which conventional media synchronization schemes require, and can accordingly reduce the number of required UDP ports and the amount of control traffic injected into the network.
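The abstract does not give the exact mapping from RTP timestamps to NPT, so the following is only a minimal sketch under assumptions: each stream's NPT is taken as the elapsed RTP ticks since a known first timestamp, divided by the stream's clock rate. The function names and the example clock rates (90000 Hz video, 8000 Hz audio) are illustrative choices, not taken from the paper.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


def rtp_to_npt(rtp_ts, first_rtp_ts, clock_rate, npt_start=0.0):
    """Map an RTP timestamp to NPT seconds (assumed mapping, see lead-in)."""
    # Modular subtraction keeps the result correct across one wraparound.
    elapsed_ticks = (rtp_ts - first_rtp_ts) % RTP_TS_MODULUS
    return npt_start + elapsed_ticks / clock_rate


def av_skew(video_ts, video_first, audio_ts, audio_first,
            video_rate=90000, audio_rate=8000):
    """Positive result: the video unit is later on the NPT axis than the audio unit."""
    return (rtp_to_npt(video_ts, video_first, video_rate)
            - rtp_to_npt(audio_ts, audio_first, audio_rate))
```

A receiver could compare the NPT of the next video and audio units and delay whichever is ahead. Because both values come from RTP headers alone, this sketch involves no RTCP SR processing.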


Caption Data Transmission Method for HDTV Picture Quality Improvement (DTV 화질향상을 위한 자막데이터 전송방법)

  • Han, Chan-Ho
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.10
    • /
    • pp.1628-1636
    • /
    • 2017
  • The growing amount of data added for service convenience, such as closed captions, ancillary data, electronic program guides (EPG), and data broadcasting, degrades the video quality of high-definition content. This article proposes a method to transfer the closed caption data of video content without video quality degradation. Inserting the caption data as block images in the essential hidden area of the DTV picture causes no video quality degradation in video compression. Additionally, the proposed method has the advantage of synchronizing video, audio, and captions from a pre-inserted script without time delay.

Dimension-Reduced Audio Spectrum Projection Features for Classifying Video Sound Clips

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.3E
    • /
    • pp.89-94
    • /
    • 2006
  • For audio indexing and targeted search of specific audio or corresponding visual content, the MPEG-7 standard has adopted a sound classification framework in which dimension-reduced Audio Spectrum Projection (ASP) features are used to train continuous hidden Markov models (HMMs) for classification of various sounds. MPEG-7 employs Principal Component Analysis (PCA) or Independent Component Analysis (ICA) for the dimension reduction. Other well-established techniques include Non-negative Matrix Factorization (NMF), Linear Discriminant Analysis (LDA), and the Discrete Cosine Transform (DCT). In this paper we compare the performance of different dimension reduction methods with Gaussian mixture models (GMMs) and HMMs in classifying video sound clips.

Automatic Indexing Algorithm of Golf Video Using Audio Information (오디오 정보를 이용한 골프 동영상 자동 색인 알고리즘)

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.441-446
    • /
    • 2009
  • This paper proposes an automatic indexing algorithm for golf video using audio information. In the proposed algorithm, the input stream is demultiplexed into video and audio streams. Using an AdaBoost-cascade classifier, the continuous audio stream is classified into five segment types: announcer's speech recorded in the studio, music accompanying players' names on the TV screen, audience reaction to the play, reporter's speech with field background, and field noise such as wind or waves. Golf swing sounds, including drive, iron, and putting shots, are detected by impulse onset detection and modulation spectrum verification. The detected swings and applause are used to index action or highlight units. Compared with video-based semantic analysis, the main advantage of the proposed system is its small computational requirement, which makes it easier to apply the technology to embedded consumer electronic devices for fast browsing.
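The abstract names impulse onset detection but does not describe the detector, so the following is only a rough sketch of one common approach, assumed here for illustration: a frame is flagged as an onset when its short-time energy jumps by a given ratio over the previous frame. The frame length, the ratio threshold, and the function names are hypothetical, and the modulation spectrum verification step is not covered.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    last_start = len(samples) - frame_len
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, last_start + 1, frame_len)]


def impulse_onsets(samples, frame_len=4, ratio=8.0, floor=1e-9):
    """Indices of frames whose energy exceeds `ratio` times the previous frame's."""
    energies = frame_energies(samples, frame_len)
    return [i for i in range(1, len(energies))
            # `floor` avoids flagging every frame that follows pure silence.
            if energies[i] > ratio * max(energies[i - 1], floor)]
```

In a full system, each candidate onset would still need a second check, such as the modulation spectrum verification mentioned above, before being accepted as a swing sound.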

A study on Metadata Modeling using Structure Information of Video Document (비디오 문서의 구조 정보를 이용한 메타데이터 모델링에 관한 연구)

  • 권재길
    • Journal of the Korea Society of Computer and Information
    • /
    • v.3 no.4
    • /
    • pp.10-18
    • /
    • 1998
  • Video information is an important component of multimedia systems such as digital libraries, the World-Wide Web (WWW), and Video-On-Demand (VOD) service systems. It can support various types of information because it includes audio-visual, spatial-temporal, and semantic information. In addition, it requires the ability to retrieve a specific scene of a video rather than the entire video document. Therefore, to support a variety of retrieval types, this paper models metadata using the hierarchical structure information of video documents and designs a database schema that can manipulate video documents.


Implementation of Video Mirroring System based on IP

  • Lee, Seungwon;Kwon, Soonchul;Lee, Seunghyun
    • International journal of advanced smart convergence
    • /
    • v.11 no.2
    • /
    • pp.108-117
    • /
    • 2022
  • The recent development of information and communication technology has had a great impact on the audio/video industry. In particular, IP-based AoIP transmission technology and AVB technology are changing the audio/video market. Technology for transmitting video signals over a network has been introduced to the market, but it has not replaced the video switcher function. Video signals in conference rooms and classrooms are still controlled by a switching device. In order to switch input/output video devices, a cable that is not limited by distance must be connected to the switcher. In addition, the switching device must be controlled by a person who has received professional training. This paper presents a technology that even non-experts can operate: it replaces complex video cables (RGB, DVI, HDMI, DP) with LAN cables and enables IP-based video switching and transmission (Video Mirroring over IP: VMoIP) in place of video switcher equipment. In this study, input/output videos were controlled in matrix form and high-definition videos were transmitted without distortion. VMoIP is expected to become the standard for video switching systems in the future.

Development of Emotion Recognition Model Using Audio-video Feature Extraction Multimodal Model (음성-영상 특징 추출 멀티모달 모델을 이용한 감정 인식 모델 개발)

  • Jong-Gu Kim;Jang-Woo Kwon
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.24 no.4
    • /
    • pp.221-228
    • /
    • 2023
  • Physical and mental changes caused by emotions can affect various behaviors, such as driving or learning. Recognizing these emotions is therefore an important task with applications in various industries, such as recognizing and controlling dangerous emotions while driving. In this paper, we address the emotion recognition task by implementing a multimodal model that recognizes emotions using both audio and video data, which belong to different domains. After the voice is extracted from the video data of the RAVDESS dataset, features of the voice data are extracted by a model based on a 2D-CNN. The video data features are extracted using a SlowFast feature extractor. The information contained in the audio and video data is then combined into a single feature that contains all of it, and emotion recognition is performed on the combined features. Lastly, we evaluate conventional methods that combine or vote on the results of the two models, as well as a method that unifies the domains through feature extraction, combines the features, and performs classification with a classifier.
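The two families of methods compared in the abstract can be contrasted with a small sketch. This is an illustration under assumptions, not the authors' implementation: the result-level method is shown as simple averaging of class probabilities (soft voting), and the feature-level method as plain concatenation of the two feature vectors before a single classifier. All names are hypothetical.

```python
def soft_vote(audio_probs, video_probs):
    """Result-level fusion: average per-class probabilities, return the best class index."""
    averaged = [(a + v) / 2 for a, v in zip(audio_probs, video_probs)]
    return max(range(len(averaged)), key=averaged.__getitem__)


def fuse_features(audio_features, video_features):
    """Feature-level fusion: concatenate both vectors into one input for a classifier."""
    return list(audio_features) + list(video_features)
```

With soft voting each modality is classified on its own and only the outputs meet, while with feature-level fusion a single classifier sees both modalities at once and can learn interactions between them.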

Multi-modal Detection of Anchor Shot in News Video (다중모드 특징을 사용한 뉴스 동영상의 앵커 장면 검출 기법)

  • Yoo, Sung-Yul;Kang, Dong-Wook;Kim, Ki-Doo;Jung, Kyeong-Hoon
    • Journal of Broadcast Engineering
    • /
    • v.12 no.4
    • /
    • pp.311-320
    • /
    • 2007
  • In this paper, an efficient algorithm for detecting anchor shots in news video is presented. We observed the audio-visual characteristics of news video and proposed several low-level features that are appropriate for detecting an anchor shot. The proposed algorithm is composed of three stages: pause detection, audio cluster classification, and matching with motion activity. We used audio features as well as the motion feature in order to improve the indexing accuracy, and the simulation results show that the performance of the proposed algorithm is quite satisfactory.

Bandwidth enhancement scheme for VoIP application based on H.323 (H.323 기반 VoIP 어플리케이션에서의 대역폭 향상을 위한 방법)

  • 김기훈;박동선;이승상;박종빈
    • Proceedings of the IEEK Conference
    • /
    • 2003.11c
    • /
    • pp.149-152
    • /
    • 2003
  • In this paper, we propose a scheme that enhances the bandwidth efficiency of VoIP applications based on the H.323 protocol. We multiplex the audio and video streams: in this scheme, each audio frame is carried with the video stream. We apply not only multiplexing but also RTP header compression to the real audio/video stream to increase bandwidth efficiency. In a network environment with finite bandwidth, the bandwidth saved can be assigned to users of other services and to other VoIP users. If real-time network conditions can be taken into account in our VoIP application, even more efficient performance can be obtained.
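The source of the savings can be shown with simple arithmetic. The sketch below assumes uncompressed IPv4, UDP, and RTP headers of 20, 8, and 12 bytes, and, purely for illustration, that audio and video packets are sent at the same rate so that one audio frame rides in each video packet. The payload sizes, the packet rate, and the 4-byte compressed header are assumed values, not figures from the paper.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, uncompressed


def bitrate_bps(payload_bytes, header_bytes, packets_per_sec):
    """Bits per second on the wire for one packet stream."""
    return (payload_bytes + header_bytes) * 8 * packets_per_sec


def separate_streams(audio_payload, video_payload, packets_per_sec):
    """Audio and video each travel in their own packets with full headers."""
    return (bitrate_bps(audio_payload, HEADER_BYTES, packets_per_sec)
            + bitrate_bps(video_payload, HEADER_BYTES, packets_per_sec))


def multiplexed(audio_payload, video_payload, packets_per_sec,
                header_bytes=HEADER_BYTES):
    """One packet carries both payloads, so only one header is paid per packet."""
    return bitrate_bps(audio_payload + video_payload, header_bytes, packets_per_sec)
```

For example, with a 20-byte audio frame, a 500-byte video payload, and 50 packets per second, separate streams need 240000 bit/s, multiplexing needs 224000 bit/s, and multiplexing with an assumed 4-byte compressed header needs 209600 bit/s.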


Classification of Phornographic Videos Based on the Audio Information (오디오 신호에 기반한 음란 동영상 판별)

  • Kim, Bong-Wan;Choi, Dae-Lim;Lee, Yong-Ju
    • MALSORI
    • /
    • no.63
    • /
    • pp.139-151
    • /
    • 2007
  • As the Internet becomes prevalent in our lives, harmful content such as pornographic videos has been increasing on the Internet and has become a very serious problem. To address this, many filtering systems exist, mainly based on keyword- or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on audio information. As the feature vector, we use the mel-frequency cepstral coefficients (MFCC) together with the mel-cepstrum modulation energy (MCME), a modulation energy calculated on the time trajectory of the MFCC. For the classifier, we use the well-known Gaussian mixture model (GMM). The experimental results showed that the proposed system correctly classified 98.3% of pornographic data and 99.8% of non-pornographic data. We expect that the proposed method can be applied to a more accurate classification system that uses both video and audio information.
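The abstract defines MCME only as a modulation energy computed on the time trajectory of the MFCC, so the exact formula is not known here. As a hedged illustration of the general idea, the sketch below takes the trajectory of one cepstral coefficient across frames, removes its mean, and sums the squared DFT magnitudes over a chosen range of modulation-frequency bins. The function name and the bin range are assumptions.

```python
import cmath
import math


def modulation_energy(trajectory, lo_bin, hi_bin):
    """Energy of one coefficient's time trajectory in DFT bins lo_bin..hi_bin."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [x - mean for x in trajectory]  # drop the constant (DC) part
    energy = 0.0
    for k in range(lo_bin, hi_bin + 1):
        # Plain DFT coefficient at modulation bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        energy += abs(coeff) ** 2
    return energy
```

Applying this to each MFCC dimension over a sliding window of frames would give one modulation energy per coefficient, which could then be appended to the MFCC vector before GMM training.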
