• Title/Summary/Keyword: Visual Scene

Scene change detection using visual rhythm by direction (Visual Rhythm의 방향성을 이용한 장면변환 검출)

  • 윤상호;유지상
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8C
    • /
    • pp.1193-1202
    • /
    • 2004
  • As the management of digital content becomes increasingly important, many researchers have studied scene change detection algorithms to reduce similar scenes in video content and to summarize video data efficiently. Algorithms using histogram and pixel information are known to be sensitive to illumination changes and motion. Visual rhythm has therefore been adopted in recent work to address this problem: it captures characteristic patterns of scenes while requiring less computation. In this paper, a new scene change detection algorithm using the directionality of visual rhythm is proposed. The proposed algorithm requires less computation and maintains good performance even in scenes with motion. Experimental results show a performance improvement of about 30% compared with conventional histogram-based methods, and they show that the proposed algorithm maintains the same performance even for music video content with a great deal of motion.
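
A visual rhythm image stacks one sampled pixel line (for example, the main diagonal) from every frame into a single 2D image, so an abrupt scene change appears as a strong vertical discontinuity. The sketch below is a minimal illustration of this idea, assuming diagonal sampling and a fixed column-difference threshold; it is not the directional algorithm proposed in the paper.

```python
# A minimal sketch, assuming diagonal sampling and a fixed column-difference
# threshold; not the directional visual-rhythm algorithm proposed in the paper.
import cv2
import numpy as np

def visual_rhythm(video_path, max_frames=None):
    """Stack one diagonal pixel line per frame into a 2D visual rhythm image."""
    cap = cv2.VideoCapture(video_path)
    columns = []
    while True:
        ok, frame = cap.read()
        if not ok or (max_frames is not None and len(columns) >= max_frames):
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        d = min(gray.shape)                      # length of the main diagonal
        idx = np.arange(d)
        columns.append(gray[idx, idx])           # one diagonal sample per frame
    cap.release()
    return np.stack(columns, axis=1)             # shape: (diagonal_length, n_frames)

def detect_cuts(vr, threshold=30.0):
    """Flag frames whose diagonal sample differs strongly from the previous one."""
    diff = np.abs(vr[:, 1:].astype(float) - vr[:, :-1].astype(float)).mean(axis=0)
    return [i + 1 for i, d in enumerate(diff) if d > threshold]
```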

Three-Dimensional Photon Counting Imaging with Enhanced Visual Quality

  • Lee, Jaehoon;Lee, Min-Chul;Cho, Myungjin
    • Journal of information and communication convergence engineering
    • /
    • v.19 no.3
    • /
    • pp.180-187
    • /
    • 2021
  • In this paper, we present a computational volumetric reconstruction method for three-dimensional (3D) photon counting imaging with enhanced visual quality when low-resolution elemental images are used under photon-starved conditions. In conventional photon counting imaging with low-resolution elemental images, it may be difficult to estimate the 3D scene correctly because of a lack of scene information. In addition, the reconstructed 3D images may be blurred because volumetric computational reconstruction has an averaging effect. In contrast, our method uses pixel rearrangement of the elemental images as the reconstruction method and a Bayesian approach as the estimation method. Therefore, our method can enhance the visual quality and estimation accuracy of the reconstructed 3D images because it does not have an averaging effect and uses prior information about the 3D scene. To validate our technique, we performed optical experiments and demonstrated the reconstruction results.
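
Photon-starved capture is commonly modeled as a Poisson process, and a conjugate Gamma prior then yields a closed-form Bayesian (MAP) estimate of the underlying irradiance at each rearranged pixel position. The following sketch illustrates that estimation step only, under assumed Gamma hyperparameters; the rearrangement geometry and the authors' exact estimator are not reproduced.

```python
# A minimal sketch of Bayesian (MAP) photon-count estimation under an assumed
# Gamma prior; the elemental-image rearrangement geometry is not modeled here.
import numpy as np

def simulate_photon_counts(irradiance, expected_photons=50, rng=None):
    """Simulate photon-starved capture: Poisson counts proportional to irradiance."""
    rng = rng or np.random.default_rng(0)
    rates = expected_photons * irradiance / irradiance.sum()
    return rng.poisson(rates)

def map_irradiance(counts, alpha=2.0, beta=1.0):
    """MAP estimate of the per-pixel Poisson rate under a Gamma(alpha, beta) prior.
    counts: array of shape (n_elemental_images, H, W) holding the photon counts
    gathered at corresponding (rearranged) pixel positions."""
    n = counts.shape[0]
    total = counts.sum(axis=0)
    # Gamma-Poisson posterior is Gamma(alpha + total, beta + n); its mode is
    # (alpha + total - 1) / (beta + n) when the shape parameter is >= 1.
    return np.maximum(alpha + total - 1.0, 0.0) / (beta + n)
```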

Research on Scene Features of Mixed Reality Game Based on Spatial Perception-Focused on "The Fragment" Case Study

  • Li, Wei;Cho, Dong-Min
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.4
    • /
    • pp.601-609
    • /
    • 2021
  • This article combines a literature review with empirical research, based on space perception theory and a case study of the mixed reality game "The Fragment." It concludes that a mixed reality scene under space perception has a three-level visual definition. This definition is used to analyze the corresponding levels of the scenes in "The Fragment" and to derive the constituent factors of mixed reality game scene characteristics. A questionnaire survey and data analysis then verify that the three factors of virtual reality coexistence, human-computer interaction, and local serviceability can well explain the characteristics of mixed reality game scenes. The study concludes that the three-level visual hierarchy and the constituent factors of mixed reality game scenes can serve as a reference for other mixed reality game designs, and it briefly describes future research plans.

A Novel Two-Stage Training Method for Unbiased Scene Graph Generation via Distribution Alignment

  • Dongdong Jia;Meili Zhou;Wei WEI;Dong Wang;Zongwen Bai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.12
    • /
    • pp.3383-3397
    • /
    • 2023
  • Scene graphs serve as semantic abstractions of images and play a crucial role in enhancing visual comprehension and reasoning. However, the performance of Scene Graph Generation is often compromised when working with biased data in real-world situations. While many existing systems focus on a single stage of learning for both feature extraction and classification, some employ Class-Balancing strategies, such as Re-weighting, Data Resampling, and Transfer Learning from head to tail. In this paper, we propose a novel approach that decouples the feature extraction and classification phases of the scene graph generation process. For feature extraction, we leverage a transformer-based architecture and design an adaptive calibration function specifically for predicate classification. This function enables us to dynamically adjust the classification scores for each predicate category. Additionally, we introduce a Distribution Alignment technique that effectively balances the class distribution after the feature extraction phase reaches a stable state, thereby facilitating the retraining of the classification head. Importantly, our Distribution Alignment strategy is model-independent and does not require additional supervision, making it applicable to a wide range of SGG models. Using the scene graph diagnostic toolkit on Visual Genome and several popular models, we achieved significant improvements over the previous state-of-the-art methods with our model. Compared to the TDE model, our model improved mR@100 by 70.5% for PredCls, by 84.0% for SGCls, and by 97.6% for SGDet tasks.
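
A simple, model-agnostic way to rebalance a long-tailed predicate classifier once the features are frozen is post-hoc logit adjustment with class-frequency priors. The sketch below illustrates that general idea as a stand-in; the temperature parameter and the use of log priors are assumptions, and the paper's adaptive calibration function and Distribution Alignment strategy are not reproduced here.

```python
# A minimal sketch of frequency-based logit adjustment; the temperature tau
# and log-prior form are assumptions, not the paper's calibration function.
import numpy as np

def adjusted_predicate_scores(logits, class_counts, tau=1.0):
    """Subtract tau * log(prior) from each predicate logit so frequent (head)
    predicates no longer dominate rare (tail) ones.
    logits: (N, C) raw scores from the classification head
    class_counts: (C,) training-set frequency of each predicate class"""
    prior = class_counts / class_counts.sum()
    return logits - tau * np.log(prior + 1e-12)

def predict_predicates(logits, class_counts, tau=1.0):
    """Pick the highest adjusted score per relation; usable when only the
    classification head is recalibrated or retrained on frozen features."""
    return adjusted_predicate_scores(logits, class_counts, tau).argmax(axis=1)
```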

A Study on Visual Mise-en-Scene of VR Animation (VR 애니메이션 의 시각적 미장센 연구)

  • Lee, Lang-Goo;Chung, Jean-Hun
    • Journal of Digital Convergence
    • /
    • v.15 no.9
    • /
    • pp.407-413
    • /
    • 2017
  • Mise-en-scene is a directing method of image aesthetics for constructing the screen and the space. It is an important factor not only in plays and films but also in animation, and it is a strong means of drawing the audience into a work and sustaining that immersion. Based on theories of mise-en-scene in film, this study examines how mise-en-scene is directed and expressed in virtual space, and which factors and characteristics induce and sustain audience immersion, through an analysis of the visual mise-en-scene factors of a specific VR animation case. It was found that the character and props, the background, the unique quality and friendliness of the character, natural movement and acting, symbolism and its utilization, and the variety and consistency of the background induce and sustain immersion as visual mise-en-scene factors. The findings suggest that differentiated measures and methods are needed to catch the audience's eye and sustain immersion by utilizing the characteristics of visual mise-en-scene factors in future VR animation, and the study is expected to be helpful for related areas.

Intermediate Data Structure for MPEG-4 Scene Description

  • Cha, Kyung-Ae;Kim, Hee-Sun;Kim, Sang-Wook
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2001.06a
    • /
    • pp.192-195
    • /
    • 2001
  • MPEG-4 content is streaming media composed of different types of media objects organized in a hierarchical fashion. This paper proposes a scene composition model for authoring MPEG-4 content that supports object-based interaction, and we have developed an MPEG-4 content authoring tool that applies the proposed scene composition model as an intermediate data structure. In particular, to support interoperability between multimedia contents, the scene composition model should be usable independently of file format. A visual scene composed of media objects in the form of a scene composition tree can thus be transformed into various data formats, including BIFS, the scene description format proposed by the MPEG-4 standard, and the model also supports extension of its capability.
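
As one way to picture such a format-neutral intermediate structure, the sketch below models a scene composition tree of media-object nodes with a single example exporter to an indented, BIFS-like textual description. The node fields and the output syntax are illustrative assumptions, not the MPEG-4 BIFS encoding or the authors' actual data structure.

```python
# A minimal sketch, assuming simple node fields and an indented, BIFS-like
# textual output; not the MPEG-4 BIFS encoding or the authors' data structure.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SceneNode:
    """One media object (video, audio, image, text) or grouping node."""
    name: str
    node_type: str                                   # e.g. "Group", "VideoObject"
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["SceneNode"] = field(default_factory=list)

    def add(self, child: "SceneNode") -> "SceneNode":
        """Attach a child node and return it, so trees can be built fluently."""
        self.children.append(child)
        return child

    def to_text(self, indent: int = 0) -> str:
        """Serialize the tree into an indented textual scene description."""
        pad = "  " * indent
        attrs = " ".join(f"{k}={v}" for k, v in self.attributes.items())
        lines = [f"{pad}{self.node_type} {self.name} {attrs}".rstrip()]
        lines += [child.to_text(indent + 1) for child in self.children]
        return "\n".join(lines)

# usage: a visual scene holding one grouped video object
scene = SceneNode("root", "Group")
scene.add(SceneNode("clip1", "VideoObject", {"url": "movie.mp4", "startTime": "0"}))
print(scene.to_text())
```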

Detection of Abnormal Behavior by Scene Analysis in Surveillance Video (감시 영상에서의 장면 분석을 통한 이상행위 검출)

  • Bae, Gun-Tae;Uh, Young-Jung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.12C
    • /
    • pp.744-752
    • /
    • 2011
  • In intelligent surveillance systems, various methods for detecting abnormal behavior have been proposed recently. However, most of them are not robust enough for real environments, which often contain occlusions, because they assume that individual objects can be tracked. This paper presents a novel method for detecting abnormal behavior by analyzing the major motion of the scene in complex environments where object tracking does not work. First, we generate visual words and visual documents from the motion information extracted from the input video and process them with LDA (Latent Dirichlet Allocation), a document analysis technique, to obtain the major motion information of the scene (location, magnitude, direction, and distribution). Using this information, we compare the motion appearing in the input video with the analyzed major motion and detect motions that do not match the major motions as abnormal behavior.
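
The sketch below illustrates the general visual-word/visual-document idea: motion vectors are quantized by location and direction, each clip becomes a bag of such words, LDA learns major-motion topics from normal clips, and clips whose topic mixture is far from the learned major motion are flagged. The grid size, direction bins, and the cosine-similarity anomaly score are assumptions, not the paper's exact procedure.

```python
# A minimal sketch, assuming an 8x8 spatial grid, 8 direction bins, and a
# cosine-similarity anomaly score; not the paper's exact procedure.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

GRID, DIR_BINS = 8, 8                                 # spatial cells, direction bins

def motion_to_word(x, y, dx, dy, width, height):
    """Quantize one motion vector into a visual-word id (location + direction)."""
    cell = int(GRID * y / height) * GRID + int(GRID * x / width)
    direction = int((np.arctan2(dy, dx) + np.pi) / (2 * np.pi) * DIR_BINS) % DIR_BINS
    return cell * DIR_BINS + direction

def to_counts(docs, vocab_size=GRID * GRID * DIR_BINS):
    """Turn lists of word ids (one list per clip) into a document-term matrix."""
    counts = np.zeros((len(docs), vocab_size), dtype=int)
    for i, words in enumerate(docs):
        for w in words:
            counts[i, w] += 1
    return counts

def score_clips(normal_docs, test_docs, n_topics=10):
    """Fit major-motion topics on normal clips, then score new clips by how
    close their topic mixture is to the average 'major motion' mixture."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    major = lda.fit_transform(to_counts(normal_docs)).mean(axis=0)
    test = lda.transform(to_counts(test_docs))
    sim = test @ major / (np.linalg.norm(test, axis=1) * np.linalg.norm(major) + 1e-12)
    return sim                                        # low similarity suggests abnormal motion
```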

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.47-60
    • /
    • 2012
  • Video data is unstructured and has a complex structure. As efficient management and retrieval of video data become more important, studies on video parsing based on the visual features contained in video content have been conducted to reconstruct video data into a meaningful structure. Early studies on video parsing focused on splitting video data into shots, but detecting shot boundaries defined as physical boundaries does not consider the semantic associations within video data. Recently, studies that use clustering methods to organize semantically associated shots into video scenes, defined by semantic boundaries, have been actively pursued. Previous studies on video scene detection try to detect scenes by applying clustering algorithms based on similarity measures between shots that depend mainly on color features. However, correctly identifying a shot or scene and detecting gradual transitions such as dissolves, fades, and wipes is difficult, because the color features of video data are noisy and change abruptly when an unexpected object intervenes. In this paper, to solve these problems, we propose the Scene Detector using Color histogram, corner Edge and Object color histogram (SDCEO), which detects video scenes by clustering similar shots belonging to the same event based on visual features including the color histogram, the corner edge, and the object color histogram. SDCEO is notable in that it uses the edge feature together with the color feature and, as a result, effectively detects gradual as well as abrupt transitions. SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier comprises the Color Histogram Analysis step and the Corner Edge Analysis step. In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organize shot boundaries. The color histogram, which records the percentage of each quantized color among all pixels in a frame, is chosen for its good performance, as also reported in other work on content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histograms between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries using the corner edge feature: it detects associated shot boundaries by comparing the corner edge feature between the last frame of the previous shot boundary and the first frame of the next shot boundary. In the Key-frame Extraction step, SDCEO compares each frame with all frames in the same shot boundary, measures similarity using the histogram Euclidean distance, and selects the frame most similar to all the others as the key-frame. The Video Scene Detector clusters associated shots belonging to the same event using hierarchical agglomerative clustering based on visual features including the color histogram and the object color histogram; the final video scenes are organized by repeated clustering until the similarity distance between shot boundaries falls below a threshold h. We construct a prototype of SDCEO and carry out experiments on manually constructed baseline data; the results, a precision of 93.3% for shot boundary detection and 83.3% for video scene detection, are satisfactory.
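
Below is a minimal sketch of the two-step boundary check, assuming a 16-bin-per-channel color histogram, Harris corner responses as the corner-edge feature, and hand-picked thresholds; it shows the combination of color and edge cues rather than SDCEO's exact steps.

```python
# A minimal sketch, assuming a 16-bin-per-channel histogram, Harris corner
# responses as the corner-edge feature, and hand-picked thresholds; it shows
# the combination of color and edge cues rather than SDCEO's exact steps.
import cv2
import numpy as np

def color_histogram(frame, bins=16):
    """Quantized, normalized color histogram of one frame."""
    hist = cv2.calcHist([frame], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-12)

def corner_edge_map(frame):
    """Binary map of strong Harris corner responses."""
    gray = np.float32(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
    return response > 0.01 * response.max()

def is_shot_boundary(prev_frame, next_frame, hist_thresh=0.25, corner_thresh=0.2):
    """Declare a boundary only if both the color histogram distance and the
    change in corner-edge structure between the two frames are large."""
    hist_dist = np.linalg.norm(color_histogram(prev_frame) - color_histogram(next_frame))
    corner_change = np.logical_xor(corner_edge_map(prev_frame),
                                   corner_edge_map(next_frame)).mean()
    return hist_dist > hist_thresh and corner_change > corner_thresh
```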

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

  • Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
    • Proceedings of the Korea Information Convergence Society Conference
    • /
    • 2008.06a
    • /
    • pp.53-56
    • /
    • 2008
  • This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1) perceptual experiments (e.g., perception of expressivity and 3D movements in both the audio and visual channels); 2) design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is RITEL, a real-time speech-based question answering system developed at LIMSI. The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components of the system are: RITEL, a question answering system that searches raw text and produces a text answer together with attitudinal information; this attitudinal information is then processed to deliver expressive tags, and the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG-4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech are played in a 3D audio-visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering: a new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move through the virtual scene with realistic 3D visual and audio rendering.
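
As a rough picture of how such a message-passing pipeline could be organized, the sketch below defines plain message types flowing from the question-answering module to the audio and visual renderers. The field names and the JSON-over-socket framing are illustrative assumptions and do not describe the actual RITEL or VirChor protocol.

```python
# A rough, assumed sketch of message types for a distributed QA-to-renderer
# pipeline; field names and JSON framing are illustrative, not the actual
# RITEL/VirChor protocol.
import json
import socket
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class AnswerMessage:
    text: str                  # the answer produced by the QA module
    attitude: str              # attitudinal information, e.g. "confident"

@dataclass
class SpeechMessage:
    phonemes: List[str]        # drives the audio synthesis
    visemes: List[str]         # drives the MPEG-4 keypoint animation
    expressive_tags: List[str]

def send_message(host: str, port: int, message) -> None:
    """Serialize a message as one JSON line and push it to a module's socket."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((json.dumps(asdict(message)) + "\n").encode("utf-8"))
```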
