Title/Summary/Keyword: Scene Graph Generation

Multimodal Context Embedding for Scene Graph Generation

  • Jung, Gayoung; Kim, Incheol
    • Journal of Information Processing Systems / v.16 no.6 / pp.1250-1260 / 2020
  • This study proposes a novel deep neural network model that accurately detects objects and their relationships in an image and represents them as a scene graph. The proposed model utilizes several multimodal features, including linguistic features and visual context features, to detect objects and relationships. In addition, the model embeds the context features using graph neural networks, so that the dependencies between two related objects are reflected in the context feature vector. The effectiveness of the proposed model is demonstrated through comparative experiments on the Visual Genome benchmark dataset.
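
The core idea above, embedding per-object context features with a graph neural network so that related objects inform one another, can be illustrated with a minimal message-passing sketch. This is a generic formulation under assumed tensor shapes, not the authors' code; all names are illustrative.

```python
import torch
import torch.nn as nn

class ContextGNN(nn.Module):
    """One round of message passing over object nodes (illustrative sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(2 * dim, dim)  # combines a node with one neighbor
        self.update = nn.GRUCell(dim, dim)      # folds the aggregated message into the node

    def forward(self, nodes, adj):
        # nodes: (N, dim) multimodal feature per detected object
        # adj:   (N, N) 1.0 where two objects are related, 0.0 elsewhere
        n = nodes.size(0)
        pairs = torch.cat([nodes.unsqueeze(1).expand(n, n, -1),
                           nodes.unsqueeze(0).expand(n, n, -1)], dim=-1)
        msgs = torch.relu(self.message(pairs)) * adj.unsqueeze(-1)
        agg = msgs.sum(dim=1) / adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return self.update(agg, nodes)          # context-embedded node features

# Usage: five detected objects, all pairwise related.
feats = ContextGNN(dim=256)(torch.randn(5, 256), torch.ones(5, 5))
```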

Deep Neural Network-Based Scene Graph Generation for 3D Simulated Indoor Environments (3차원 가상 실내 환경을 위한 심층 신경망 기반의 장면 그래프 생성)

  • Shin, Donghyeop; Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.8 no.5 / pp.205-212 / 2019
  • A scene graph is a kind of knowledge graph that represents both the objects found in an image and the relationships between them. This paper proposes a 3D scene graph generation model for three-dimensional indoor environments. A 3D scene graph includes not only object types, positions, and attributes, but also the three-dimensional spatial relationships between objects. It can therefore be viewed as a prior knowledge base describing the environment in which an agent will later be deployed, which makes it useful for applications such as visual question answering (VQA) and service robots. The proposed model consists of four sub-networks: an object detection network (ObjNet), an attribute prediction network (AttNet), a transfer network (TransNet), and a relationship prediction network (RelNet). Through several experiments in the 3D simulated indoor environments provided by AI2-THOR, we confirmed that the proposed model achieves high performance.
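
The four-stage decomposition named in the abstract (ObjNet, AttNet, TransNet, RelNet) amounts to a pipeline in which each sub-network consumes the previous one's output. The sketch below shows one way such stages could be composed; the call signatures and field names (det.box, frame.depth, and so on) are assumptions for illustration, not the paper's interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    objects: list = field(default_factory=list)    # (type, 3D position, attributes)
    relations: list = field(default_factory=list)  # (subject_idx, predicate, object_idx)

def generate_3d_scene_graph(frame, obj_net, att_net, trans_net, rel_net):
    """Schematic composition of the four sub-networks; each *_net is assumed
    to be a trained model with the call signature shown below."""
    graph = SceneGraph()
    for det in obj_net(frame):                    # 2D boxes + object classes
        attrs = att_net(frame, det.box)           # per-object attributes, e.g. color
        pos3d = trans_net(det.box, frame.depth)   # lift the 2D detection into 3D
        graph.objects.append((det.label, pos3d, attrs))
    for i, subj in enumerate(graph.objects):
        for j, obj in enumerate(graph.objects):
            if i != j:
                # 3D spatial relation between the pair, e.g. "left_of"
                graph.relations.append((i, rel_net(subj, obj), j))
    return graph
```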

A Novel Two-Stage Training Method for Unbiased Scene Graph Generation via Distribution Alignment

  • Dongdong Jia; Meili Zhou; Wei WEI; Dong Wang; Zongwen Bai
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3383-3397 / 2023
  • Scene graphs serve as semantic abstractions of images and play a crucial role in enhancing visual comprehension and reasoning. However, the performance of scene graph generation is often compromised by biased data in real-world situations. Many existing systems handle feature extraction and classification in a single learning stage, some with class-balancing strategies such as re-weighting, data resampling, or transfer learning from head to tail classes. In this paper, we propose a novel approach that decouples the feature extraction and classification phases of the scene graph generation process. For feature extraction, we leverage a transformer-based architecture and design an adaptive calibration function specifically for predicate classification, which dynamically adjusts the classification score for each predicate category. Additionally, we introduce a distribution alignment technique that balances the class distribution once the feature extraction phase has stabilized, thereby facilitating the retraining of the classification head. Importantly, our distribution alignment strategy is model-independent and requires no additional supervision, making it applicable to a wide range of SGG models. Using the scene graph diagnostic toolkit on Visual Genome with several popular models, we achieve significant improvements over previous state-of-the-art methods. Compared to the TDE model, our model improves mR@100 by 70.5% for PredCls, 84.0% for SGCls, and 97.6% for SGDet.
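
The distribution alignment idea, recalibrating a trained classifier's scores against the skewed predicate frequencies without retraining the feature extractor, is in the same family as frequency-based logit adjustment. The sketch below illustrates that general technique only; it is not the paper's adaptive calibration function.

```python
import numpy as np

def align_logits(logits, class_counts, tau=1.0):
    """Shift raw scores by the log prior so rare predicates are no longer
    dominated by head classes (generic logit-adjustment sketch)."""
    prior = class_counts / class_counts.sum()   # empirical class distribution
    return logits - tau * np.log(prior)         # subtract the log-prior bias

# Toy example: a head-biased score vector gets pulled toward the tail class.
counts = np.array([9000.0, 900.0, 100.0])       # long-tailed predicate counts
scores = np.array([2.0, 1.8, 1.5])              # raw logits favoring the head
print(align_logits(scores, counts).argmax())    # prints 2: the rarest class wins
```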

Geometric and Semantic Improvement for Unbiased Scene Graph Generation

  • Ruhui Zhang; Pengcheng Xu; Kang Kang; You Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2643-2657 / 2023
  • Scene graphs are structured representations that clearly convey objects and the relationships between them, but they are often heavily biased because relationship labels in the underlying datasets are highly skewed and long-tailed. Indeed, the visual world itself and its descriptions are biased. Unbiased Scene Graph Generation (USGG) therefore aims to train models that eliminate long-tail effects as far as possible, rather than altering the dataset directly. To this end, we propose Geometric and Semantic Improvement (GSI) for USGG. First, to fully exploit the feature information in the images, we design enhancement modules along the geometric and semantic dimensions. The geometric module builds on the observation that the positions of neighboring object pairs influence one another, which improves the overall relationship recall on the dataset. The semantic module further processes the embedded word vectors to strengthen the acquisition of semantic information. Then, to improve recall on tail data, we design the Class Balanced Seesaw Loss (CBSLoss), which improves prediction recall by penalizing misclassified body and tail relations. Experiments demonstrate that GSI outperforms mainstream models on the mean Recall@K (mR@K) metric across three tasks and handles the long-tailed imbalance of the Visual Genome 150 (VG150) dataset better than most existing methods.
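
The CBSLoss described above builds on the seesaw principle: damping the negative-sample pressure a frequent class exerts on a rarer one in proportion to their frequency ratio. The simplified sketch below shows that mechanism in generic form; it is not the paper's exact CBSLoss.

```python
import torch
import torch.nn.functional as F

def seesaw_style_loss(logits, targets, class_counts, p=0.8):
    """Cross-entropy whose negative-class logits are damped whenever the
    negative class is rarer than the true class (simplified seesaw idea)."""
    counts = class_counts.float()
    ratio = counts.unsqueeze(0) / counts.unsqueeze(1)   # ratio[i, j] = N_j / N_i
    damp = torch.where(ratio < 1, ratio.pow(p), torch.ones_like(ratio))
    weights = damp[targets]                             # (batch, num_classes)
    weights[torch.arange(len(targets)), targets] = 1.0  # true class stays intact
    return F.cross_entropy(logits + weights.log(), targets)

# Toy batch: 3 predicate classes with long-tailed counts.
loss = seesaw_style_loss(torch.randn(4, 3), torch.tensor([0, 2, 1, 2]),
                         torch.tensor([9000, 900, 100]))
```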

Scene Graph Generation with Graph Neural Network and Multimodal Context (그래프 신경망과 멀티 모달 맥락 정보를 이용한 장면 그래프 생성)

  • Jung, Ga-Young; Kim, In-cheol
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.555-558 / 2020
  • This paper proposes a new deep neural network model that effectively detects the various objects in an input image and the relationships between them, and represents the result as a single scene graph. To detect objects and relationships effectively, the proposed model exploits diverse multimodal context information, including not only convolutional neural network-based visual context features but also linguistic context features. In addition, the proposed model embeds the context information with a graph neural network so that the interdependence between two related objects is sufficiently reflected in the graph node feature values. This paper demonstrates the effectiveness and performance of the proposed model through comparative experiments on the Visual Genome benchmark dataset.
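
This is the conference version of the journal paper listed first above. One aspect not sketched there is the multimodal fusion itself: combining a region's CNN feature with a linguistic feature such as the word embedding of its predicted label. The formulation below is an assumption for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Fuse a region's visual CNN feature with the word embedding of its
    predicted label into one context feature (illustrative formulation)."""
    def __init__(self, visual_dim, vocab_size, word_dim, out_dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.fuse = nn.Sequential(
            nn.Linear(visual_dim + word_dim, out_dim), nn.ReLU())

    def forward(self, roi_feat, label_id):
        # roi_feat: (N, visual_dim) pooled CNN features per detected object
        # label_id: (N,) predicted class indices used as linguistic context
        return self.fuse(torch.cat([roi_feat, self.word_emb(label_id)], dim=-1))

fused = MultimodalFusion(2048, vocab_size=150, word_dim=200, out_dim=512)(
    torch.randn(5, 2048), torch.randint(0, 150, (5,)))
```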

Motion Visualization of a Vehicle Driver Based on Virtual Reality (가상현실 기반에서 차량 운전자 거동의 가시화)

  • Jeong, Yun-Seok; Son, Kwon; Choi, Kyung-Hyun
    • Transactions of the Korean Society of Automotive Engineers / v.11 no.5 / pp.201-209 / 2003
  • Virtual human models are widely used to save time and expense in vehicle safety studies. A human model is an essential tool for visualizing and simulating a vehicle driver in virtual environments. This research focuses on the creation and application of a human model for virtual reality. Published Korean anthropometric data were selected to determine the basic dimensions of the human model. These data were applied to GEBOD, a human body data generation program, which computes body segment geometry, mass properties, joint locations, and mechanical properties. The human model was then constructed in MADYMO from the GEBOD data. Frontal crash and bump-passing tests were simulated, and the computed driver motion data were transmitted into the virtual environment. The human model was organized into scene graphs, and its motion was visualized with virtual reality techniques including OpenGL Performer. The human model can also be controlled with an arm master to test the driver's behavior in the virtual environment.
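
The abstract notes that the driver model was organized into scene graphs for visualization. The general pattern, body segments as transform nodes in a parent-child hierarchy so that joint motion propagates down the limb, can be sketched as below; this is a generic illustration, not the OpenGL Performer code used in the paper.

```python
import numpy as np

class SegmentNode:
    """A body segment as a scene graph node: a local transform plus children."""
    def __init__(self, name, local=None):
        self.name = name
        self.local = np.eye(4) if local is None else local  # 4x4 homogeneous transform
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def world_transforms(self, parent=None):
        """Propagate transforms root-to-leaf, as a render traversal would."""
        world = (np.eye(4) if parent is None else parent) @ self.local
        yield self.name, world
        for child in self.children:
            yield from child.world_transforms(world)

# A fragment of a driver hierarchy: moving the torso carries the arm with it.
torso = SegmentNode("torso")
forearm = torso.add(SegmentNode("upper_arm")).add(SegmentNode("forearm"))
for name, xform in torso.world_transforms():
    print(name, xform[:3, 3])  # world-space position of each segment
```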

Scene Graph Generation by Exploration of Agent in Three-Dimensional Space (3차원 공간에서 에이전트의 탐색을 통한 장면 그래프 생성)

  • Shin, Donghyeop; Kim, Incheol
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.742-745 / 2018
  • A scene graph is a knowledge graph that represents information about the objects in an image. This paper proposes a model that generates a scene graph through an agent's exploration of three-dimensional space. A scene graph for a 3D space contains not only the positions, types, and attributes of objects but also information on the relationships between them, so it can serve as foundational data for solving a variety of problems. This paper defines the functions required to generate such a scene graph and proposes four sub-networks corresponding to those functions. To train and evaluate each sub-network, data were collected in AI2-THOR, a 3D simulated indoor environment, and the performance of each sub-network was verified through various experiments.
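
This is the conference version of the AI2-THOR paper above. The data collection it describes, stepping an agent through the simulated environment while recording observations, can be sketched with AI2-THOR's Python controller. The action names and event fields below are from the public AI2-THOR API, while the sampling and storage details are assumptions.

```python
from ai2thor.controller import Controller

# Walk an agent through a simulated room, collecting frames and object
# metadata as raw material for scene graph training data (sketch only).
controller = Controller(scene="FloorPlan1")
samples = []
for action in ["MoveAhead", "RotateRight", "MoveAhead", "LookDown"]:
    event = controller.step(action=action)
    if event.metadata["lastActionSuccess"]:
        samples.append({
            "frame": event.frame,                  # RGB observation
            "objects": event.metadata["objects"],  # types, 3D positions, states
        })
controller.stop()
```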

A Study of 3D Sound Modeling based on Geometric Acoustics Techniques for Virtual Reality (가상현실 환경에서 기하학적 음향 기술 기반의 3차원 사운드 모델링 기술에 관한 연구)

  • Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications / v.11 no.4 / pp.102-106 / 2016
  • With the popularity of smartphones and the help of high-speed wireless communication technology, high-quality multimedia content has become common on mobile devices. In particular, the release of the Oculus Rift opened a new era of virtual reality technology in the consumer market. At the same time, 3D audio technology, currently used to make computer games more realistic, will soon be applied to the next generation of mobile phones and is expected to offer an even more expansive experience than its visual counterpart. This paper surveys concepts, algorithms, and systems for modeling 3D sound in virtual environment applications. We first introduce key design principles for audio rendering based on physics-based geometric algorithms and multichannel technologies, and then present an audio rendering pipeline for a scene graph-based virtual reality system together with a hardware architecture for modeling sound propagation.
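
Among the physics-based geometric algorithms such a survey typically covers is the image-source method: mirroring the sound source across each reflecting surface and treating the mirror image as a virtual source whose distance to the listener yields the reflection's delay and attenuation. A minimal first-order sketch follows (illustrative only, not taken from the paper).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air

def first_order_reflection(src, listener, plane_point, plane_normal):
    """Image-source method, first order: mirror the source across a wall
    plane and derive the reflected path's delay and 1/r attenuation."""
    n = plane_normal / np.linalg.norm(plane_normal)
    image = src - 2.0 * np.dot(src - plane_point, n) * n  # mirrored source
    dist = np.linalg.norm(listener - image)               # reflected path length
    return dist / SPEED_OF_SOUND, 1.0 / max(dist, 1e-6)   # delay (s), amplitude

# Source and listener 1 m above a floor plane through the origin (+z normal).
delay, gain = first_order_reflection(
    np.array([0.0, 0.0, 1.0]), np.array([3.0, 0.0, 1.0]),
    np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
print(f"delay = {delay * 1000:.2f} ms, gain = {gain:.3f}")
```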