• Title/Summary/Keyword: Visual Scene Understanding


Trends in Video Visual Relationship Understanding (비디오 시각적 관계 이해 기술 동향)

  • Y.J. Kwon;D.H. Kim;J.H. Kim;S.C. Oh;J.S. Ham;J.Y. Moon
    • Electronics and Telecommunications Trends
    • /
    • v.38 no.6
    • /
    • pp.12-21
    • /
    • 2023
  • Visual relationship understanding in computer vision allows systems to recognize meaningful relationships between objects in a scene. This technology enables the extraction of representative information from visual content. We discuss visual relationship understanding technology, focusing specifically on videos. We first introduce visual relationship understanding concepts in videos and then explore the latest existing techniques. Next, we present benchmark datasets commonly used in video visual relationship understanding. Finally, we discuss future research directions in video visual relationship understanding.

Investigating the Effects of Training Image Dataset's Size and Specificity on Visual Scene Understanding AI in Construction (건설현장 컴퓨터비전 AI 성능에 대한 학습 이미지 데이터셋 크기 및 특화성의 영향 분석)

  • Jinwoo Kim;Seokho Chi
    • Land and Housing Review
    • /
    • v.15 no.4
    • /
    • pp.1-9
    • /
    • 2024
  • Visual scene understanding AI, a pivotal factor for digital transformation and robotic automation in construction, has primarily been researched under the hypothesis that the more training images, the higher the model performance. Alternatively, one can hypothesize that prioritizing activity-specific training images tailored to each construction phase would be more critical than merely enlarging the size of the dataset. This approach is particularly vital in dynamic construction environments where visual characteristics undergo significant changes across the construction phases, from earthmoving, foundation, and superstructure to finishing activities. Against this background, we investigate the effects of a training image dataset's size and specificity on visual scene understanding AI in construction. We build an all-in-one, universal training image dataset as well as an activity-specific dataset, varying the number of training images. We then train vision-based worker detection models using each dataset and assess their performance in activity-specific, dynamic test environments. We analyze the optimal performance achieved in each test environment and how the model's performance varies depending on the dataset's size over the entire test phase. Our findings will help scientifically validate the dual hypotheses and lay a solid foundation for building and updating a training image dataset when developing a visual scene understanding AI model for dynamic construction sites.

Improving visual relationship detection using linguistic and spatial cues

  • Jung, Jaewon;Park, Jongyoul
    • ETRI Journal
    • /
    • v.42 no.3
    • /
    • pp.399-410
    • /
    • 2020
  • Detecting visual relationships in an image is important for image understanding. It enables higher-level image understanding tasks, such as predicting the next scene and understanding what occurs in an image. A visual relationship comprises a subject, a predicate, and an object, and is related to visual, language, and spatial cues. The predicate explains the relationship between the subject and object and can fall into different categories, such as prepositions and verbs. A large visual gap can exist even among relationships that share the same predicate. This study improves upon a previous approach, which uses language cues through two losses and a spatial cue containing only individual object information, by adding relative information on the subject and object. The architectural limitation of the previous approach is demonstrated and overcome so that all zero-shot visual relationships can be detected. A new problem is discovered, and an explanation of how it decreases performance is provided. Experiments conducted on the VRD and VG datasets show a significant improvement over previous results.
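As an aside, the (subject, predicate, object) triplet and the relative spatial cue described in this abstract can be sketched as follows. This is an illustrative sketch only: the class names and the particular box encoding are assumptions, not the paper's implementation.

```python
# Illustrative sketch of a visual relationship triplet plus a *relative*
# spatial cue (offsets/scales between subject and object boxes), as opposed
# to a cue containing only individual box information.
from dataclasses import dataclass

@dataclass
class Box:
    x: float  # top-left x
    y: float  # top-left y
    w: float  # width
    h: float  # height

def relative_spatial_cue(subj: Box, obj: Box):
    """Relative offsets and scales of the object box with respect to the
    subject box, normalized by the subject's size (a common encoding)."""
    return (
        (obj.x - subj.x) / subj.w,  # relative horizontal offset
        (obj.y - subj.y) / subj.h,  # relative vertical offset
        obj.w / subj.w,             # relative width
        obj.h / subj.h,             # relative height
    )

# A triplet such as ("person", "rides", "horse") with hypothetical boxes:
triplet = ("person", "rides", "horse")
cue = relative_spatial_cue(Box(10, 20, 50, 100), Box(0, 60, 120, 80))
```

The relative encoding captures where the object sits with respect to the subject, which is exactly the information an individual-box cue discards.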

An Analysis on the Range of Singular Fusion of Augmented Reality Devices

  • Lee, Hanul;Park, Minyoung;Lee, Hyeontaek;Choi, Hee-Jin
    • Current Optics and Photonics
    • /
    • v.4 no.6
    • /
    • pp.540-544
    • /
    • 2020
  • Current two-dimensional (2D) augmented reality (AR) devices present virtual images and information at a fixed focal plane, regardless of the various locations of ambient objects of interest around the observer. This limitation can lead to visual discomfort caused by misalignment between the view of the ambient object of interest and the visual representation on the AR device due to a failure of singular fusion. Since the misalignment becomes more severe as the depth difference grows, it can hamper visual understanding of the scene and interfere with the viewer's task performance. Thus, we analyzed the range of singular fusion (RSF) of AR images, within which viewers can perceive the shape of an object presented on two different depth planes without difficulty caused by a failure of singular fusion. We expect our analysis to inspire the development of advanced AR systems with low visual discomfort.

A Study on the Correlation of the Theory of Montage in Film Arts with Animation (영상예술 몽타주이론과 애니메이션의 상관관계 연구)

  • Lee, Lee-Nam
    • Cartoon and Animation Studies
    • /
    • s.9
    • /
    • pp.199-219
    • /
    • 2005
  • This paper studies how montage theory and the mise-en-scène effect appear in screen media, examines concrete project cases, and analyzes how those theories have supported the effects and development of animation. It also describes how montage theory and the mise-en-scène effect have been imported into and expressed in animation, based on studies of representative genres in visual media. The purpose of this thesis is to aid the creation of animation scenes through a broad understanding and acceptance of montage theory and the mise-en-scène effect, with a view to the progressive development of animation. These suggestions could support the creative and special effects of animation as a part of the screen arts and contribute to progress in the animation field.


Researching Visual Immersion Elements in VR Game <Half-Life: Alyx>

  • Chenghao Wang;Jeanhun Chung
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.15 no.2
    • /
    • pp.181-186
    • /
    • 2023
  • With the development of VR technology, the visual immersion of VR games has been greatly enhanced. One issue, however, has long troubled players of earlier VR games: motion sickness. As a result, VR games have been limited in terms of game mechanics, duration, and scale, greatly reducing the immersive experience. However, <Half-Life: Alyx> differs from previous VR games in that players can actually move through the game scene, rather than being fixed in one place for 360-degree observation and interaction. At the same time, compared to traditional games, VR games no longer need to rely on screens, and complete visual immersion enhances the fun and playability of the game. This research focuses on the VR game <Half-Life: Alyx> to explore its immersive factors in terms of visual perception. Through in-depth analysis of elements such as color, texture mapping, and lighting, we found that the game creates a strong sense of visual immersion in these aspects. This analysis provides a deeper understanding of the factors that contribute to visual immersion in VR games and offers reference value for game developers and related professionals.

Stereo Correspondence Using Graphs Cuts Kernel (그래프 컷 커널을 이용한 스테레오 대응)

  • Lee, Yong-Hwan;Kim, Youngseop
    • Journal of the Semiconductor & Display Technology
    • /
    • v.16 no.2
    • /
    • pp.70-74
    • /
    • 2017
  • Given two stereo images of a scene, it is possible to recover a 3D understanding of the scene; this is the primary way the human visual system estimates depth. The process is useful in applications such as robotics, where depth sensors may be expensive but a pair of cameras is relatively cheap. In this work, we implemented a graph cut algorithm for stereo correspondence and evaluated it against a baseline algorithm using normalized cross-correlation across a variety of metrics. Experimental trials revealed that the proposed method achieved a significant improvement over the existing methods.
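For context, the normalized cross-correlation (NCC) baseline this abstract compares against can be sketched as follows: for a pixel in the left image, slide a window along the same row of the right image and keep the disparity with the highest NCC score. The function names, window size, and disparity range here are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of window-based stereo matching with normalized
# cross-correlation (NCC), the baseline mentioned in the abstract.
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def disparity_at(left, right, row, col, win=3, max_disp=8):
    """Best disparity for one left-image pixel, searched along the same row
    (rectified epipolar line) of the right image."""
    half = win // 2
    patch = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_score = 0, -1.0
    for d in range(0, max_disp + 1):
        c = col - d
        if c - half < 0:  # candidate window would leave the image
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(patch, cand)
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

Unlike this purely local, per-pixel search, a graph-cut formulation optimizes all disparities jointly with a smoothness term, which is what motivates the comparison in the paper.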


Characterization of Rabbit Retinal Ganglion Cells with Multichannel Recording (다채널기록법을 이용한 토끼 망막 신경절세포의 특성 분석)

  • Cho Hyun Sook;Jin Gye-Hwan;Goo Yong Sook
    • Progress in Medical Physics
    • /
    • v.15 no.4
    • /
    • pp.228-236
    • /
    • 2004
  • Retinal ganglion cells transmit the visual scene as action potentials to the visual cortex through the optic nerve. Conventional recording using a single intra- or extracellular electrode lets us observe the response of a specific neuron at a specific time; it therefore cannot reveal how the nerve impulses of a population of retinal ganglion cells collectively encode the visual stimulus. This requires recording the simultaneous electrical signals of many neurons. Recent advances in multi-electrode recording have brought us closer to understanding how visual information is encoded by populations of retinal ganglion cells. We examined how ganglion cells act together to encode a visual scene with a multi-electrode array (MEA). With light stimulation (on duration: 2 s, off duration: 5 s) generated on a color monitor driven by custom-made software, we isolated three functional types of ganglion cell activity: ON (35.0±4.4%), OFF (31.4±1.9%), and ON/OFF cells (34.6±5.3%) (total number of retinal pieces = 8). We observed that nearby neurons often fire action potentials in near synchrony (< 1 ms), and this narrow correlation is seen among cells within a cluster of 6 to 8 cells. As there are many more synchronized firing patterns than ganglion cells, such a distributed code might allow the retina to compress a large number of distinct visual messages into a small number of ganglion cells.
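The near-synchronous firing (< 1 ms) described above is typically quantified by counting coincident spike pairs between two cells. The following is a hedged sketch of such a coincidence count, not the authors' analysis code; the function name and input format are assumptions.

```python
# Sketch: count spikes in one cell's train that have a partner spike in a
# second cell's train within a coincidence window (1 ms in the abstract).
def count_synchronous(spikes_a, spikes_b, window=0.001):
    """Count spikes in `spikes_a` with a partner in `spikes_b` closer than
    `window` seconds. Both inputs are sorted lists of spike times (s)."""
    count, j = 0, 0
    for t in spikes_a:
        # Advance past partner spikes that are already too early to match.
        while j < len(spikes_b) and spikes_b[j] < t - window:
            j += 1
        if j < len(spikes_b) and abs(spikes_b[j] - t) < window:
            count += 1
    return count
```

Because both trains are sorted, the two-pointer scan runs in linear time, which matters when an MEA yields long recordings from many channels.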


A Model on the Determinants of Visual Preference at Golf courses (경관의 선호도 결정인자 모형 -골프장을 배경으로-)

  • 서주환;이철민;맹상빈
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.29 no.1
    • /
    • pp.1-10
    • /
    • 2001
  • The purpose of this thesis is to classify the landscape types of golf courses, to provide a better understanding of the landscapes of existing golf courses, and to seek improved methods of landscape design. To classify the landscape types of golf courses and analyze the preference determinants, we selected four golf courses in Yongin, Gyeonggi Province. The analysis shows that the variable 'familiarity' is the most potent influence on visual preference (sig. 0.01), and that golf course landscapes can be divided into five types. More specifically, we analyzed the image of views and visual preference to identify the major factors that decide visual preference at golf courses. The results between visual preference and physical variables are as follows. 1. Four factors (factor 1 to factor 4) were extracted for the image of views at golf courses, together explaining 51.742% of the total variance; they correspond to familiarity, changeableness, spaciousness, and naturalness. Since familiarity (C.V.: 26.783%) and changeableness (C.V.: 112.200%) ranked high, familiarity and changeableness strongly affect the formation of the image. 2. Depending on the degree of image ability, golf courses could be classified into five types: Type I, Type II, Type III, Type IV, and Type V. 3. Calculating factor scores by type, Type I had the lowest ranking in naturalness and was rather low in organization and spaciousness. Type II was top-ranked in familiarity and naturalness, while lowest in spaciousness. Type III had the highest ranking in organization and preference. Type IV was the lowest-ranked in familiarity and preference. Type V had the highest ranking in spaciousness, but the lowest in organization. 4. In preference, the order was Type III, Type II, Type V, Type I, and Type IV. That the type with visible water ranked highest shows the importance of changeable materials. 5. These factors, familiarity, organization, spaciousness, and naturalness, are the major materials of the scene of view at golf courses. How to use them in designing and improving golf courses should be reinvestigated through these factors. In particular, the introduction of changeableness, which is not mentioned in studies of the informational approach, is stimulating for design use. Further research on this theme should be conducted in the future, not limited to the golf courses in Yongin.


Extended Support Vector Machines for Object Detection and Localization

  • Feyereisl, Jan;Han, Bo-Hyung
    • The Magazine of the IEIE
    • /
    • v.39 no.2
    • /
    • pp.45-54
    • /
    • 2012
  • Object detection is a fundamental task for many high-level computer vision applications such as image retrieval, scene understanding, activity recognition, and visual surveillance. Although object detection is one of the most popular problems in computer vision and various algorithms have been proposed thus far, it remains notoriously difficult, mainly due to the lack of proper object representation models that handle large variations of object structure and appearance. In this article, we review a branch of object detection algorithms based on Support Vector Machines (SVMs), a well-known max-margin technique for minimizing classification error. We introduce a few variations of SVMs, namely Structural SVMs and Latent SVMs, and discuss their applications to object detection and localization.
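For reference, the max-margin objective underlying all the SVM variants mentioned above can be written as a regularized hinge loss; examples inside the margin or on the wrong side are penalized linearly. The sketch below shows the plain linear-SVM case only (the structural and latent variants generalize the loss), and the names are illustrative.

```python
# Regularized hinge loss of a linear SVM: 0.5*||w||^2 + C * sum(max(0, 1 - y*f(x)))
# where f(x) = w.x + b and labels y are in {-1, +1}.
import numpy as np

def hinge_loss(w, b, X, y, C=1.0):
    """Primal linear-SVM objective for weight vector w, bias b,
    data matrix X (n_samples x n_features), and labels y in {-1, +1}."""
    margins = y * (X @ w + b)                       # signed margins y*f(x)
    slack = np.maximum(0.0, 1.0 - margins).sum()    # hinge penalties
    return 0.5 * float(w @ w) + C * float(slack)
```

Minimizing this objective pushes every training example to a margin of at least 1, which is the max-margin property the article refers to.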
