• Title/Summary/Keyword: Visual Scene


Deep Neural Network-Based Scene Graph Generation for 3D Simulated Indoor Environments (3차원 가상 실내 환경을 위한 심층 신경망 기반의 장면 그래프 생성)

  • Shin, Donghyeop;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.5
    • /
    • pp.205-212
    • /
    • 2019
  • A scene graph is a kind of knowledge graph that represents the objects found in an image and the relationships between them. This paper proposes a 3D scene graph generation model for three-dimensional indoor environments. A 3D scene graph includes not only object types, positions, and attributes, but also the three-dimensional spatial relationships between objects. A 3D scene graph can therefore be viewed as a prior knowledge base describing the environment in which an agent will later be deployed, and can serve many useful applications such as visual question answering (VQA) and service robots. The proposed 3D scene graph generation model consists of four sub-networks: an object detection network (ObjNet), an attribute prediction network (AttNet), a transfer network (TransNet), and a relationship prediction network (RelNet). In experiments with the 3D simulated indoor environments provided by AI2-THOR, we confirmed that the proposed model achieves high performance.
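The relationship stage of such a pipeline can be illustrated with a minimal rule-based sketch: given 3D bounding boxes for detected objects, coarse spatial predicates are derived for each object pair. The predicates, box convention, and thresholds below are illustrative assumptions, not the paper's learned RelNet.

```python
# Hedged sketch: coarse 3D spatial relations between detected objects,
# in the spirit of a scene graph's relation edges. Rule-based stand-in
# for a learned relationship network; all conventions are assumptions.

def spatial_relation(a, b):
    """Relation label for two axis-aligned 3D boxes.

    Each box is (cx, cy, cz, sx, sy, sz): center and size,
    with y pointing up (assumed convention).
    """
    ax, ay, az, asx, asy, asz = a
    bx, by, bz, bsx, bsy, bsz = b
    # Horizontal overlap on the ground plane (x-z).
    overlap_x = abs(ax - bx) < (asx + bsx) / 2
    overlap_z = abs(az - bz) < (asz + bsz) / 2
    if overlap_x and overlap_z:
        # "on" if a's bottom face touches b's top face (5 cm tolerance).
        if abs((ay - asy / 2) - (by + bsy / 2)) < 0.05:
            return "on"
        return "above" if ay > by else "below"
    # Otherwise report the dominant horizontal axis.
    if abs(ax - bx) >= abs(az - bz):
        return "right_of" if ax > bx else "left_of"
    return "in_front_of" if az > bz else "behind"


def scene_graph(objects):
    """Relation triples for every ordered pair of named boxes."""
    return [(na, spatial_relation(ba, bb), nb)
            for na, ba in objects.items()
            for nb, bb in objects.items() if na != nb]
```

A cup resting on a table, for instance, yields the triple `("cup", "on", "table")`; a learned model would replace the hand-written rules with predicted relation scores.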

Video Retrieval System supporting Content-based Retrieval and Scene-Query-By-Example Retrieval (비디오의 의미검색과 예제기반 장면검색을 위한 비디오 검색시스템)

  • Yoon, Mi-Hee;Cho, Dong-Uk
    • The KIPS Transactions:PartB
    • /
    • v.9B no.1
    • /
    • pp.105-112
    • /
    • 2002
  • To process video data effectively, its content must be stored in a database, and a content-based retrieval method that handles the various queries of all users is required. In this paper, we present VRS (Video Retrieval System), which provides similarity queries, SQBE (Scene Query By Example) queries, and content-based retrieval by combining feature-based and annotation-based retrieval. The SQBE query makes it possible for a user to retrieve scenes more precisely by inserting and deleting objects based on a retrieved scene. We propose a query language and a query processing algorithm for the SQBE query, and carry out a performance evaluation on similarity retrieval. The proposed system is implemented with Visual C++ and Oracle.

Scene Change Detection and Key Frame Selection Using Fast Feature Extraction in the MPEG-Compressed Domain (MPEG 압축 영상에서의 고속 특징 요소 추출을 이용한 장면 전환 검출과 키 프레임 선택)

  • 송병철;김명준;나종범
    • Journal of Broadcast Engineering
    • /
    • v.4 no.2
    • /
    • pp.155-163
    • /
    • 1999
  • In this paper, we propose novel scene change detection and key frame selection techniques that use two feature images, i.e., DC and edge images, extracted directly from MPEG-compressed video. For fast edge image extraction, we suggest utilizing the 5 lowest AC coefficients of each DCT block. Based on this scheme, we present another edge image extraction technique using AC prediction. Although the former is superior to the latter in terms of visual quality, both methods can extract important edge features well. Simulation results indicate that scene changes such as cuts, fades, and dissolves can be correctly detected by using the edge energy diagram obtained from edge images and histograms obtained from DC images. In addition, we find that our edge images are comparable to those obtained in the spatial domain, at much lower computational cost. Based on the human visual system (HVS), a key frame of each scene can also be selected. In comparison with an existing method using optical flow, our scheme can select semantically meaningful key frames because we use only the above edge and DC images.
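The DC-image part of this idea is cheap to sketch: for an 8x8 orthonormal DCT, each block's DC coefficient is 8 times the block mean, so a thumbnail "DC image" falls out of the compressed data with no inverse transform, and cut detection can then run on DC-histogram differences. The bin count and threshold below are illustrative assumptions, not the paper's tuned values.

```python
# Hedged sketch: DC image extraction from an 8x8 DCT grid plus a
# simple histogram-difference cut detector. Approximates only the
# DC-image half of the paper's scheme; edge images are omitted.

import numpy as np

def dc_image(dct_blocks):
    """dct_blocks: array (H/8, W/8, 8, 8) of DCT coefficients.
    Returns the (H/8, W/8) DC image of block means (DC = 8 * mean)."""
    return dct_blocks[:, :, 0, 0] / 8.0

def histogram_diff(dc_a, dc_b, bins=32):
    """Normalized absolute histogram difference of two DC images."""
    ha, _ = np.histogram(dc_a, bins=bins, range=(0, 255))
    hb, _ = np.histogram(dc_b, bins=bins, range=(0, 255))
    return np.abs(ha - hb).sum() / dc_a.size

def detect_cuts(dc_frames, threshold=0.5):
    """Indices i where the transition frame i-1 -> i looks like a cut."""
    return [i for i in range(1, len(dc_frames))
            if histogram_diff(dc_frames[i - 1], dc_frames[i]) > threshold]
```

Fades and dissolves need the gradual-transition logic (edge energy over a window) that this single-threshold detector deliberately leaves out.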

Geometric and Semantic Improvement for Unbiased Scene Graph Generation

  • Ruhui Zhang;Pengcheng Xu;Kang Kang;You Yang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.10
    • /
    • pp.2643-2657
    • /
    • 2023
  • Scene graphs are structured representations that can clearly convey objects and the relationships between them, but they are often heavily biased due to the highly skewed, long-tailed relation labeling in the dataset. Indeed, the visual world itself and its descriptions are biased. Therefore, Unbiased Scene Graph Generation (USGG) prefers to train models to eliminate long-tail effects as much as possible, rather than altering the dataset directly. To this end, we propose Geometric and Semantic Improvement (GSI) for USGG to mitigate this issue. First, to fully exploit the feature information in the images, geometric and semantic enhancement modules are designed. The geometric module is built on the observation that the positions of neighboring object pairs affect each other, which improves the recall rate of relations across the dataset. The semantic module further processes the embedded word vectors, enhancing the acquisition of semantic information. Then, to improve the recall rate on tail data, the Class Balanced Seesaw Loss (CBSLoss) is designed; it penalizes body or tail relations that are judged incorrectly, improving the recall rate of predictions. The experimental findings demonstrate that the GSI method outperforms mainstream models in terms of the mean Recall@K (mR@K) metric on three tasks. The long-tailed imbalance in the Visual Genome 150 (VG150) dataset is addressed better by the GSI method than by most existing methods.
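The "class-balanced" ingredient of such a loss can be illustrated with effective-number reweighting (Cui et al.), which down-weights head relations so tail relations contribute more to the loss. This is only one component in the spirit of the abstract; the seesaw-style per-pair mitigation of CBSLoss itself is not reproduced here, and `beta` is an illustrative hyperparameter.

```python
# Hedged sketch: effective-number class weights as one way to rebalance
# a long-tailed relation distribution. Not the paper's CBSLoss; a
# generic rebalancing scheme shown for intuition.

def class_balanced_weights(counts, beta=0.999):
    """counts: samples per relation class.
    Returns loss weights normalized so the mean weight is 1."""
    # Effective number of samples per class: (1 - beta^n) / (1 - beta).
    raw = [(1 - beta) / (1 - beta ** n) for n in counts]
    k = len(raw)
    s = sum(raw)
    return [w * k / s for w in raw]
```

With counts like `[1000, 10]`, the rare class receives a substantially larger weight, which is the direction a tail-recall metric such as mR@K rewards.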

Semantic Visual Place Recognition in Dynamic Urban Environment (동적 도시 환경에서 의미론적 시각적 장소 인식)

  • Arshad, Saba;Kim, Gon-Woo
    • The Journal of Korea Robotics Society
    • /
    • v.17 no.3
    • /
    • pp.334-338
    • /
    • 2022
  • In visual simultaneous localization and mapping (vSLAM), correct recognition of a place aids relocalization and improves map accuracy. However, performance is significantly affected by environmental conditions such as variations in light, viewpoint, and season, and the presence of dynamic objects. This research addresses the problem of feature occlusion caused by interference from dynamic objects, which degrades the performance of visual place recognition algorithms. To overcome this problem, this research analyzes the role of scene semantics in correct detection of a place in challenging environments and presents a semantics-aided visual place recognition method. Semantics, being invariant to viewpoint changes and dynamic environments, can improve the overall performance of place matching. The proposed method is evaluated on two benchmark datasets with dynamic environments and seasonal changes. Experimental results show improved performance of the visual place recognition method for vSLAM.
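One simple way to realize "semantics-aided" matching, sketched under stated assumptions below, is to drop features that fall on dynamic classes before comparing places. The class list and the toy matching score are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: semantic filtering of features before place matching.
# The dynamic-class set and the exact-match score are toy assumptions;
# a real system would use learned descriptors and a robust matcher.

DYNAMIC = {"person", "car", "bus", "bicycle"}

def filter_static(keypoints, labels):
    """Keep only keypoints whose per-pixel semantic label is static."""
    return [kp for kp, lab in zip(keypoints, labels)
            if lab not in DYNAMIC]

def place_score(desc_a, desc_b):
    """Fraction of descriptors in a that have an exact match in b
    (toy similarity; stands in for descriptor-distance matching)."""
    b = set(desc_b)
    return sum(1 for d in desc_a if d in b) / max(len(desc_a), 1)
```

Because features on pedestrians and vehicles are discarded up front, two visits to the same place with different traffic still compare only the stable scene structure.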

Video Content Manipulation Using 3D Analysis for MPEG-4

  • Sull, Sanghoon
    • Journal of Broadcast Engineering
    • /
    • v.2 no.2
    • /
    • pp.125-135
    • /
    • 1997
  • This paper is concerned with realistic manipulation of content in video sequences, one of the content-based functionalities of the MPEG-4 Visual standard. We present an approach to synthesizing video sequences by using the intermediate outputs of three-dimensional (3D) motion and depth analysis. For concreteness, we focus on video showing 3D motion of an observer relative to a scene containing planar runways (or roads). We first present a simple runway (or road) model. Then, we describe a method of identifying the runway (or road) boundary in the image using the Point of Heading Direction (PHD), defined as the image of the ray along which the camera moves. The 3D motion of the camera is obtained from one of the existing 3D analysis methods. A video sequence containing a runway is then manipulated by (i) coloring the scene part above a vanishing line, say blue, to show sky, (ii) filling in the occluded scene parts, and (iii) overlaying the identified runway edges and placing yellow disks on them, simulating lights. Experimental results for a real video sequence are presented.


The Implementing a Color, Edge, Optical Flow based on Mixed Algorithm for Shot Boundary Improvement (샷 경계검출 개선을 위한 칼라, 엣지, 옵티컬플로우 기반의 혼합형 알고리즘 구현)

  • Park, Seo Rin;Lim, Yang Mi
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.8
    • /
    • pp.829-836
    • /
    • 2018
  • This study attempts to detect shot boundaries in films (or dramas) based on the length of a sequence. Because films and dramas use scene change effects heavily, the issues those effects raise are more diverse than in surveillance cameras, sports videos, medical care, or security. Visual techniques used in films are focused on the human sense of aesthetics; therefore, shot boundary detection errors are difficult to solve with the methods employed for surveillance cameras. To define the errors arising from scene change effects between images and to resolve them, a mixed algorithm based on a color histogram, an edge histogram, and optical flow was implemented. The shot boundary data from this study will be used to analyze the configuration of meaningful shots in sequences in future work.
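The fusion idea behind such a mixed algorithm can be sketched as a weighted combination of the three per-transition difference signals, normalized and then thresholded. The weights and threshold below are illustrative assumptions, not the paper's tuned values, and the signals themselves (histogram and flow differences) are assumed to be computed upstream.

```python
# Hedged sketch: late fusion of color-histogram, edge-histogram, and
# optical-flow difference signals for shot boundary detection. The
# weighting scheme is a generic assumption, not the paper's algorithm.

def normalize(signal):
    """Min-max normalize a difference signal to [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)
    return [(v - lo) / (hi - lo) for v in signal]

def fused_boundaries(color_d, edge_d, flow_d,
                     weights=(0.4, 0.3, 0.3), threshold=0.6):
    """Each *_d is a per-transition difference signal of equal length.
    Returns indices whose weighted, normalized score exceeds threshold."""
    wc, we, wf = weights
    c, e, f = normalize(color_d), normalize(edge_d), normalize(flow_d)
    scores = [wc * ci + we * ei + wf * fi
              for ci, ei, fi in zip(c, e, f)]
    return [i for i, s in enumerate(scores) if s > threshold]
```

Requiring all three cues to agree is what makes a fused detector more robust to aesthetic transitions (dissolves, flashes) than any single cue alone.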

An Automatic Scene Background Classification Scheme for Sitcom Videos Using MPEG-7 Visual Descriptors (시트콤 동영상에서 MPEG-7 시각 기술자를 이용한 Scene 배경의 자동 분류 방법)

  • 전재욱;손대온;낭종호
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.505-507
    • /
    • 2004
  • Because a sitcom video consists of zoom-ins that follow zoom-outs over fixed backgrounds, and the number of backgrounds used in filming is limited, the visual characteristics of these backgrounds can be used to learn and automatically classify them. This paper proposes an automatic background classification method for this kind of video using LVQ [1], a type of neural network. First, the visual characteristics of scene backgrounds are extracted using MPEG-7 visual descriptors, and the LVQ is trained on these features with background information given in advance by the producer. As training proceeds, the visual characteristics of a specific background are represented as LVQ weights and are used to classify other backgrounds automatically. Experimental results on two kinds of sitcom video show that the proposed LVQ-based method automatically classifies sitcom backgrounds with 80-90% accuracy, without any hard-coding for classification.
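The LVQ training loop the abstract describes can be sketched with the classic LVQ1 update: each labeled feature vector pulls its nearest prototype toward itself when the labels match and pushes it away otherwise. MPEG-7 descriptor extraction is out of scope here; features are plain lists of floats, and the learning rate and epoch count are illustrative assumptions.

```python
# Hedged sketch: LVQ1 training of per-background prototype vectors.
# Stands in for the paper's LVQ on MPEG-7 descriptors; hyperparameters
# and the feature representation are assumptions.

def nearest(prototypes, x):
    """Index of the prototype closest (squared Euclidean) to x."""
    def dist(p):
        return sum((wi - xi) ** 2 for wi, xi in zip(p["w"], x))
    return min(range(len(prototypes)), key=lambda i: dist(prototypes[i]))

def lvq1_train(prototypes, samples, lr=0.1, epochs=20):
    """prototypes: [{"w": [...], "label": c}]; samples: [([...], c)].
    Pull the winning prototype toward x on a label match, push otherwise."""
    for _ in range(epochs):
        for x, label in samples:
            p = prototypes[nearest(prototypes, x)]
            sign = 1.0 if p["label"] == label else -1.0
            p["w"] = [wi + sign * lr * (xi - wi)
                      for wi, xi in zip(p["w"], x)]
    return prototypes

def classify(prototypes, x):
    """Label of the nearest prototype (the learned background class)."""
    return prototypes[nearest(prototypes, x)]["label"]
```

After training, each prototype's weight vector summarizes one background's visual characteristics, which is exactly the role the abstract assigns to the LVQ weights.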


The Influence of Sensory Interference Arising from View-Height Differences on Visual Short-Term Memory Performance (조망 높이의 차이가 초래한 감각적 간섭이 시각단기기억 수행에 미치는 영향)

  • Ka, Yaguem;Hyun, Joo-Seok
    • Science of Emotion and Sensibility
    • /
    • v.23 no.1
    • /
    • pp.17-28
    • /
    • 2020
  • Lowering an observer's view-height may increase the amount of occlusion across objects in a visual scene and prevent accurate identification of the objects in the scene. Based on this possibility, this study displayed memory stimuli rendered according to their expected views from different heights, and then measured visual short-term memory (VSTM) performance for the stimuli. In Experiment 1, the memory stimuli were presented on a grid background drawn with linear perspective that varied across three view-heights (high, middle, and low), allowing participants to remember both the color and position of each memory stimulus. Testing participants' VSTM performance under two memory set-sizes (3 vs. 6) revealed an evident drop of performance in the lowest view-height condition. In Experiment 2, performance for six stimuli with or without the grid background was tested, and a similar drop in the lowest condition was found. These results indicate that an observer's view-height can change the amount of occlusion across objects in the visual field, and the sensory interference driven by that occlusion may in turn influence VSTM performance for those objects.

A Study on the code and design elements as a way of transition (애니메이션 화면 전환 수단으로서의 조형 요소 변화에 대한 연구)

  • Kim, Jean-Young
    • Cartoon and Animation Studies
    • /
    • s.14
    • /
    • pp.83-99
    • /
    • 2008
  • In film, the change of scene is generally represented by a cut, a dissolve, or another collective transition. An animated film, through the special production process of generating frame images one by one, can endow various parts of a scene with intended sensibility and narrative factors and transfer them to a different symbolic, dimensional expression. As image-handling techniques such as morphing and metamorphosis have become more diverse and elaborate, sequential scene composition is no longer unique to 2D animation. Yet 2D hand-drawn animation can continuously and strongly absorb the spectator into different visual dimensions beyond character and background, that is, beyond object and space; this is its strong attraction. These characteristics enable a literary function: delicate metaphor through the composition of the full scene, communicating an implicative meaning system. Scene analysis has broken the boundary between the symbolic perspective world and the flat formative world, and has become more diverse and complicated. Accordingly, analyzing the compositional basis of formative elements in animated film scenes and their effects in use is helpful for the analysis and application of modern image scenes that employ these new methods of absorption.
