• Title/Summary/Keyword: audio visual

Search Results: 426

Design and Implementation of Interworking Gateway with QoS Adaptation (QoS 적응 기능을 갖는 연동 게이트웨이의 설계 및 구현)

  • Song, Byeong-Hun; Choe, Sang-Gi; Jeong, Gwang-Su
    • Journal of KIISE: Computing Practices and Letters / v.5 no.5 / pp.619-627 / 1999
  • To support multimedia services between network domains with different environments, their functionalities must be mapped in many respects. In this paper, we implement an interworking gateway that provides protocol conversion and QoS (Quality of Service) adaptation to interwork DAVIC services based on ATM (Asynchronous Transfer Mode) networks with Internet AV services. The gateway converts RTSP (Real-Time Streaming Protocol) messages into DSM-CC (Digital Storage Media Command & Control) messages to control streams served in the ATM network, and transmits the data stream using RTP (Real-Time Transport Protocol). The gateway also provides QoS adaptation through QoS monitoring and MPEG filtering to cope with variations in network bandwidth.
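The QoS adaptation described here pairs bandwidth monitoring with MPEG filtering. The paper's code is not available; the sketch below is a minimal Python illustration of that general idea, dropping B-frames and then P-frames as the measured bandwidth falls below thresholds. The frame-type priorities and threshold values are illustrative assumptions, not figures from the paper.

```python
# Hypothetical sketch of threshold-based MPEG frame filtering for QoS adaptation.
# Frame-type priorities and bandwidth thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Frame:
    kind: str      # "I", "P", or "B"
    payload: bytes

def select_frames(frames, measured_kbps):
    """Drop low-priority MPEG frames when the monitored bandwidth degrades."""
    if measured_kbps >= 1500:          # enough bandwidth: forward everything
        allowed = {"I", "P", "B"}
    elif measured_kbps >= 800:         # moderate congestion: drop B-frames
        allowed = {"I", "P"}
    else:                              # severe congestion: keep only I-frames
        allowed = {"I"}
    return [f for f in frames if f.kind in allowed]

# Example: a GOP-like sequence filtered under heavy congestion.
gop = [Frame("I", b""), Frame("B", b""), Frame("B", b""), Frame("P", b"")]
print([f.kind for f in select_frames(gop, measured_kbps=500)])   # ['I']
```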

A Real-time Pigsty Monitoring System Based on Audio/Visual Sensors (A/V 센서 기반의 실시간 돈사 모니터링 시스템)

  • Oh, Seunggeun; In, Kyeongjun; Chung, Yongwha; Chang, Hong-Hee; Park, Daihee
    • Annual Conference of KIPS / 2012.11a / pp.1162-1165 / 2012
  • Piglets weaned from the sow at 21 (or 28) days of age have weak immune systems, and their mortality rate commonly surges to 30-40%, so piglet management is recognized as one of the biggest problems facing domestic pig farms. In this paper, to address this problem, we propose a system that monitors piglets in a weaning-piglet pigsty using video and sound information acquired from an installed camera and microphone. The proposed system extracts motion vectors and average pitch values from the incoming video and audio stream data in real time, and judges that an unspecified abnormal situation has begun at the moment either value exceeds a threshold set for the normal situation. We built an A/V sensor-based experimental environment at a pig farm in Hamyang-gun, Gyeongsangnam-do, and acquired a monitoring data set from a weaning-piglet pigsty over one month in June 2012; a prototype of the pigsty monitoring system was designed and implemented using the data set from the first 15 days, and verification experiments were performed with the A/V stream data from the last 15 days.
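The detection rule in this abstract is a simple threshold test on a motion value and an average pitch value. As a hedged Python sketch of that idea (not the authors' implementation), the following computes a frame-difference motion score and a crude autocorrelation pitch estimate and flags an abnormal situation when either exceeds an assumed normal-situation threshold.

```python
# Minimal sketch of the threshold-based detection described in the abstract.
# Thresholds and parameter values here are illustrative assumptions.

import numpy as np

def motion_score(prev_frame, frame):
    """Mean absolute intensity change between consecutive grayscale frames."""
    return float(np.mean(np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))))

def average_pitch(audio, sr=16000, fmin=80, fmax=1000):
    """Crude autocorrelation-based pitch estimate (Hz) for one audio window."""
    audio = audio - np.mean(audio)
    ac = np.correlate(audio, audio, mode="full")[len(audio) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def is_abnormal(motion, pitch, motion_thr=12.0, pitch_thr=600.0):
    """Flag an unspecified abnormal situation once either value exceeds its threshold."""
    return motion > motion_thr or pitch > pitch_thr
```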

The Influence of Topic Exploration and Topic Relevance On Amplitudes of Endogenous ERP Components in Real-Time Video Watching (실시간 동영상 시청시 주제탐색조건과 주제관련성이 내재적 유발전위 활성에 미치는 영향)

  • Kim, Yong Ho; Kim, Hyun Hee
    • Journal of Korea Multimedia Society / v.22 no.8 / pp.874-886 / 2019
  • To address the semantic gap problem in automatic video summarization, we focused on endogenous ERP responses at around 400 ms and 600 ms after the onset of an audio-visual stimulus. Our experiment included two factors: topic exploration (Topic Given vs. Topic Exploring) as a between-subject factor and the topic relevance of the shots (Topic-Relevant vs. Topic-Irrelevant) as a within-subject factor. In the Topic Given condition (22 subjects), six short historical documentaries were shown with their titles and written summaries, while in the Topic Exploring condition (25 subjects) participants were instead asked to explore the topics of the same videos with no given information. EEG data were gathered while they watched the videos in real time. We hypothesized that the cognitive activity of exploring the topic of a video while watching individual shots increases the amplitude of the endogenous ERP at around 600 ms after the onset of topic-relevant shots. We also hypothesized that the amplitude of the endogenous ERP at around 400 ms after the onset of topic-irrelevant shots would be lower in the Topic Given condition than in the Topic Exploring condition. A repeated-measures MANOVA supported both hypotheses.
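The dependent measure is the mean endogenous ERP amplitude in windows around 400 ms and 600 ms after shot onset. Purely as an illustration of that measurement step (not the authors' analysis pipeline), the sketch below averages epoched EEG over assumed time windows; the sampling rate, window bounds, and data are stand-ins.

```python
# Illustrative computation of mean ERP amplitude in post-onset windows, assuming
# `epochs` has shape (n_trials, n_channels, n_samples), is baseline-corrected,
# and is time-locked to shot onset. All parameter values are assumptions.

import numpy as np

def window_amplitude(epochs, sr, t_start, t_end):
    """Mean amplitude over a post-onset time window, averaged across trials."""
    i0, i1 = int(t_start * sr), int(t_end * sr)
    return epochs[:, :, i0:i1].mean(axis=(0, 2))   # one value per channel

sr = 250                                            # assumed sampling rate (Hz)
epochs = np.random.randn(40, 32, sr)                # stand-in for real epoched EEG
amp_400ms = window_amplitude(epochs, sr, 0.35, 0.45)
amp_600ms = window_amplitude(epochs, sr, 0.55, 0.65)
```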

Incomplete Cholesky Decomposition based Kernel Cross Modal Factor Analysis for Audiovisual Continuous Dimensional Emotion Recognition

  • Li, Xia; Lu, Guanming; Yan, Jingjie; Li, Haibo; Zhang, Zhengyan; Sun, Ning; Xie, Shipeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.810-831 / 2019
  • Recently, continuous dimensional emotion recognition from audiovisual cues has attracted increasing attention in both theory and practice. The large amount of data involved in the recognition process decreases the efficiency of most bimodal information fusion algorithms. In this paper, a novel algorithm, incomplete Cholesky decomposition based kernel cross factor analysis (ICDKCFA), is presented and employed for continuous dimensional audiovisual emotion recognition. After the ICDKCFA feature transformation, two basic fusion strategies, feature-level fusion and decision-level fusion, are explored to combine the transformed visual and audio features for emotion recognition. Finally, extensive experiments are conducted to evaluate the ICDKCFA approach on the AVEC 2016 Multimodal Affect Recognition Sub-Challenge dataset. The experimental results show that the ICDKCFA method is faster than the original kernel cross factor analysis while achieving comparable performance. Moreover, the ICDKCFA method outperforms other common information fusion methods, such as canonical correlation analysis, kernel canonical correlation analysis, and cross-modal factor analysis based fusion methods.
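The claimed speed-up comes from approximating the kernel (Gram) matrices with low-rank factors via incomplete Cholesky decomposition before the cross factor analysis step. The following is a minimal sketch of that building block only, using a pivoted incomplete Cholesky routine on an assumed RBF kernel; the KCFA transformation and the fusion strategies are not shown.

```python
# Pivoted incomplete Cholesky decomposition: K (n x n, PSD) ~= G @ G.T with G (n x rank).
# The RBF kernel, rank, and tolerance below are illustrative assumptions.

import numpy as np

def incomplete_cholesky(K, max_rank, tol=1e-6):
    n = K.shape[0]
    G = np.zeros((n, max_rank))
    d = np.diag(K).astype(float).copy()            # residual diagonal
    for j in range(max_rank):
        i = int(np.argmax(d))                      # pivot: largest remaining diagonal
        if d[i] <= tol:                            # residual small enough: stop early
            return G[:, :j]
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
    return G

# Example: low-rank approximation of an RBF Gram matrix.
X = np.random.randn(200, 8)
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)
G = incomplete_cholesky(K, max_rank=30)
print(np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))   # relative approximation error
```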

CNN-based Visual/Auditory Feature Fusion Method with Frame Selection for Classifying Video Events

  • Choe, Giseok; Lee, Seungbin; Nang, Jongho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1689-1701 / 2019
  • In recent years, personal videos have been widely shared online due to the popularity of portable devices such as smartphones and action cameras. A recent report predicted that 80% of Internet traffic would be video content by 2021. Several studies have been conducted on detecting the main events in videos in order to manage large collections of them, and these studies show fairly good performance in certain genres. However, the methods used in previous studies have difficulty detecting events in personal videos, because the characteristics and genres of personal videos vary widely. In our research, we found that adding a dataset with the right perspective improved performance, and that performance also depends on how keyframes are extracted from the video. Considering the characteristics of personal videos, we selected frame segments that can represent a video. From each frame segment, object, location, food, and audio features were extracted, and representative vectors were generated through a CNN-based recurrent model and a fusion module. In experiments on the LSVC dataset, the proposed method achieved an mAP of 78.4%.
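The pipeline described above extracts per-segment features and combines them through a CNN-based recurrent model and a fusion module. The PyTorch-style sketch below shows one plausible late-fusion arrangement; the GRU encoders, feature dimensions, and concatenation fusion are assumptions rather than the authors' exact architecture.

```python
# Hedged sketch of a recurrent-plus-fusion classifier over per-segment features.
# Feature dimensions, the GRU encoders, and concatenation fusion are assumptions.

import torch
import torch.nn as nn

class SegmentFusionClassifier(nn.Module):
    def __init__(self, visual_dim=2048, audio_dim=128, hidden=512, n_classes=500):
        super().__init__()
        self.visual_rnn = nn.GRU(visual_dim, hidden, batch_first=True)
        self.audio_rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )

    def forward(self, visual_seq, audio_seq):
        # visual_seq: (batch, n_segments, visual_dim); audio_seq: (batch, n_segments, audio_dim)
        _, v_last = self.visual_rnn(visual_seq)        # final hidden state per modality
        _, a_last = self.audio_rnn(audio_seq)
        fused = torch.cat([v_last[-1], a_last[-1]], dim=-1)
        return self.fusion(fused)                      # event-class logits

model = SegmentFusionClassifier()
logits = model(torch.randn(4, 10, 2048), torch.randn(4, 10, 128))
```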

Design and Implementation of Smart Pen based User Interface System for U-learning (U-Learning 을 위한 스마트펜 인터페이스 시스템 디자인 및 개발)

  • Shim, Jae-Youen; Kim, Seong-Whan
    • Annual Conference of KIPS / 2010.11a / pp.1388-1391 / 2010
  • In this paper, we present the design and implementation of a U-learning system using a pen-based augmented reality approach. Each student is given a smart pen and a smart study book, which is similar to the printed material already in service. However, we print the study book using CMY inks and embed perceptually invisible dot patterns using K ink. The smart pen includes (1) an IR LED for illumination, (2) an IR pass filter for extracting the dot patterns, and (3) a camera for image capture. From the image sequences, we perform a topology analysis that determines the topological distance between dot pixels, and perform error-correction decoding using four position symbols and five CRC symbols. When a student touches the smart study book with the smart pen, we show him/her multimedia (visual/audio) information directly related to the selected region. Our scheme can embed 16 bits of information, more than 200% of the capacity of previous schemes, which support 7 or 8 bits.
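Decoding relies on error-correction symbols, including five CRC symbols, to validate the 16 embedded bits. As a hedged illustration of that validation step only, the sketch below runs a generic bit-serial CRC over a decoded payload; the 5-bit polynomial and the bit layout are assumptions, and the paper's position symbols and topology analysis are not reproduced.

```python
# Illustrative CRC-style check for a decoded dot-pattern payload.
# The 5-bit polynomial and bit layout are assumptions, not the paper's code design.

def crc_bits(data_bits, poly=0b10101, width=5):
    """Compute a width-bit CRC over a list of data bits (MSB first)."""
    reg = 0
    for bit in data_bits + [0] * width:       # append zero bits for the remainder
        reg = (reg << 1) | bit
        if reg & (1 << width):
            reg ^= (1 << width) | poly
    return reg & ((1 << width) - 1)

def decode_payload(data_bits, received_crc):
    """Accept a decoded 16-bit payload only if its CRC matches."""
    if crc_bits(data_bits) != received_crc:
        raise ValueError("CRC mismatch: dot pattern misread")
    return int("".join(map(str, data_bits)), 2)

payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # 16 embedded bits
value = decode_payload(payload, crc_bits(payload))
```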

The Use of Graphic Novels for Developing Multiliteracies (그래픽노블을 통한 다중문식성의 발달)

  • Yun, Eunja
    • Journal of English Language & Literature / v.56 no.4 / pp.575-596 / 2010
  • The modes of narrative and communication have expanded due to social and cultural changes and technological development. Thus texts have become multimodal, and media hybridity and media crossover have been increasing as well. Multimodality requires new literacies, beyond existing traditional literacy approaches, to understand and interpret these multimodal texts. The New London Group (2000) argues that multiliteracies are needed to serve today's changing multimodal texts. Kress (2003) also argues that visual texts have become prevalent, mingled with other modes such as linguistic, audio, gestural, and spatial modes. Literary texts are no exception to this trend toward multimodality. The recent renaissance of comics, and in particular the new attention to graphic novels, can be interpreted in this historical vein. Compared to comics, no consensus has been reached on the definition of graphic novels; however, many recent studies have examined the potential of graphic novels for building multiliteracies. In this paper, the graphic novel is explored as a literary genre from a historical perspective, and a definition of graphic novels is attempted. In light of multiliteracies, this paper presents cases showing how graphic novels can be utilized to build multiliteracies. Lastly, the use of graphic novels for English as a foreign language is also introduced. The author hopes that, in the age of multimodality, the potential of graphic novels in language and literacy education will be taken into account by language teachers and students as they expand their territory of literacy.

Audio-Visual Scene Aware Dialogue System Utilizing Action From Vision and Language Features (이미지-텍스트 자질을 이용한 행동 포착 비디오 기반 대화시스템)

  • Jungwoo Lim; Yoonna Jang; Junyoung Son; Seungyoon Lee; Kinam Park; Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.253-257 / 2023
  • Recently, a variety of dialogue systems have been applied to real-world human-machine interfaces such as smartphone assistants, in-car navigation, voice-controlled speakers, and human-centered robots. However, most dialogue systems operate only on text and cannot process multimodal input. Solving this problem requires a dialogue system that incorporates multimodal scene understanding, such as video. Existing video-based dialogue systems mostly focus either on combining various features such as visual, image, and audio cues or on aligning images and text through pre-training, and consequently miss important action cues and sound cues. This paper improves a video-based dialogue system by exploiting pre-trained image-text alignment embeddings together with action cues and sound cues. The proposed model encodes text, image, and audio embeddings, extracts relevant frames and action cues based on them, and then generates an utterance. Experimental results on the AVSD dataset show that the proposed model outperforms existing models, and representative image-text features are comparatively analyzed within the video-based dialogue system.
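The model scores video frames for relevance against the dialogue context before generating a response. The sketch below illustrates just that frame-selection step with cosine similarity in a shared image-text embedding space; the embedding dimensions and random vectors are stand-ins for real model output, and the generation stage is omitted.

```python
# Hedged sketch of relevance-based frame selection for a video-grounded dialogue system.
# Embeddings here are random stand-ins, not output of any particular pre-trained model.

import numpy as np

def select_relevant_frames(frame_embs, context_emb, top_k=4):
    """Rank frame embeddings by cosine similarity to the dialogue-context embedding."""
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    c = context_emb / np.linalg.norm(context_emb)
    scores = f @ c
    top = np.argsort(scores)[::-1][:top_k]
    return top, scores[top]

frame_embs = np.random.randn(32, 512)     # per-frame image embeddings (assumed dimension)
context_emb = np.random.randn(512)        # pooled embedding of the dialogue history (assumed)
indices, sims = select_relevant_frames(frame_embs, context_emb)
```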


A Research of User Experience on Multi-Modal Interactive Digital Art

  • Qianqian Jiang; Jeanhun Chung
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.80-85 / 2024
  • The concept of single-modal digital art originated in the 20th century and has evolved through three key stages. Over time, digital art has transformed into multi-modal interaction, representing a new era in art forms. Based on multi-modal theory, this paper explores the characteristics of interactive digital art as an innovative art form and its impact on user experience. Through an analysis of practical applications of multi-modal interactive digital art, this study summarises the impact of creative models of digital art on the physical and mental aspects of user experience. In creating audio-visual-based art, multi-modal digital art should seamlessly incorporate sensory elements and leverage computer image processing technology. Focusing on user perception, emotional expression, and cultural communication, it strives to establish an immersive environment with user experience at its core. Future research, particularly with emerging technologies like Artificial Intelligence (AI) and Virtual Reality (VR), should not merely prioritize technology but aim for meaningful interaction. Through multi-modal interaction, digital art is poised to continually innovate, offering new possibilities and expanding the realm of interactive digital art.

Design and Implementation of a Real-Time Lipreading System Using PCA & HMM (PCA와 HMM을 이용한 실시간 립리딩 시스템의 설계 및 구현)

  • Lee chi-geun; Lee eun-suk; Jung sung-tae; Lee sang-seol
    • Journal of Korea Multimedia Society / v.7 no.11 / pp.1597-1609 / 2004
  • Many lipreading systems have been proposed to compensate for the drop in speech recognition rates in noisy environments. Previous lipreading systems work only under specific conditions, such as artificial lighting and a predefined background color. In this paper, we propose a real-time lipreading system that allows speaker motion and relaxes the restrictions on color and lighting conditions. The proposed system extracts the face and lip regions, along with the essential visual information, in real time from an input video sequence captured with a common PC camera, and recognizes uttered words from this visual information in real time. A hue histogram model is used to extract the face and lip regions, the mean shift algorithm is used to track the face of a moving speaker, PCA (Principal Component Analysis) is used to extract visual features for training and testing, and an HMM (Hidden Markov Model) is used as the recognition algorithm. Experimental results show that our system achieves a recognition rate of 90% for speaker-dependent lipreading and, when combined with audio speech recognition, raises the speech recognition rate to 40~85% depending on the noise level.
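The recognition pipeline combines PCA features of the lip region with per-word HMMs. The sketch below is a minimal reconstruction of that PCA-plus-HMM stage using scikit-learn and hmmlearn; the component counts, state counts, and data layout are assumptions, and the face/lip extraction and mean shift tracking steps are omitted.

```python
# Minimal sketch of a PCA + per-word Gaussian-HMM lipreading classifier.
# Dimensions, state counts, and the data layout are assumptions, not the paper's settings.

import numpy as np
from sklearn.decomposition import PCA
from hmmlearn import hmm

def train_word_models(sequences_per_word, n_components=20, n_states=5):
    """sequences_per_word: {word: list of (n_frames, h*w) flattened lip-image sequences}."""
    all_frames = np.vstack([seq for seqs in sequences_per_word.values() for seq in seqs])
    pca = PCA(n_components=n_components).fit(all_frames)
    models = {}
    for word, seqs in sequences_per_word.items():
        feats = [pca.transform(seq) for seq in seqs]
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
        models[word] = model
    return pca, models

def recognize(pca, models, lip_sequence):
    """Pick the word whose HMM gives the highest log-likelihood for the sequence."""
    feats = pca.transform(lip_sequence)
    return max(models, key=lambda w: models[w].score(feats))
```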
