• Title/Summary/Keyword: 3D Visual Attention Model

Modeling of Visual Attention Probability for Stereoscopic Videos and 3D Effect Estimation Based on Visual Attention (3차원 동영상의 시각 주의 확률 모델 도출 및 시각 주의 기반 입체감 추정)

  • Kim, Boeun;Song, Wonseok;Kim, Taejeong
    • Journal of KIISE / v.42 no.5 / pp.609-620 / 2015
  • Viewers of videos are likely to absorb more information from the part of the screen that attracts visual attention. This fact has led to visual attention models that are used in producing and evaluating videos. In this paper, we investigate the factors that significantly affect visual attention and the mathematical form of a visual attention model, and we estimate the visual attention probability using the statistical design of experiments. Analysis of variance (ANOVA) verifies that motion velocity, distance from the screen, and amount of defocus blur significantly affect human visual attention. Using response surface modeling (RSM), we create a visual attention score model that incorporates the three factors, from which we calculate the visual attention probabilities (VAPs) of image pixels. The VAPs are applied directly to an existing gradient-based 3D effect perception measurement: by weighting the measurement according to our VAPs, our algorithm achieves more accurate results than the existing method. The performance of the proposed measurement is assessed by comparing it with subjective evaluation as well as with existing methods, and the comparison verifies that the proposed measurement outperforms the existing ones.
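
As a rough illustration of the pipeline this abstract describes, the sketch below evaluates a second-order response surface in the three significant factors and normalizes the scores into a pixel-wise probability map that weights a gradient-based 3D effect score. The coefficients, factor ranges, and normalization are placeholders, not the values fitted in the paper.

```python
# Minimal sketch: RSM attention score -> VAP map -> weighted 3D effect pooling.
# All coefficients and factor maps below are illustrative placeholders.
import numpy as np

def attention_score(velocity, distance, blur, beta):
    """Second-order RSM polynomial in the three significant factors."""
    x = np.stack([np.ones_like(velocity), velocity, distance, blur,
                  velocity * distance, velocity * blur, distance * blur,
                  velocity**2, distance**2, blur**2], axis=-1)
    return x @ beta

def visual_attention_probability(scores):
    """Normalize scores over the frame so they sum to 1 (a probability map)."""
    s = scores - scores.min()
    return s / s.sum()

def weighted_3d_effect(gradient_map, vap):
    """Pool a gradient-based 3D effect map with VAP weights."""
    return float((gradient_map * vap).sum())

# Toy 4x4 frame with per-pixel factor maps (hypothetical values).
rng = np.random.default_rng(0)
velocity = rng.uniform(0, 5, (4, 4))   # deg/s
distance = rng.uniform(1, 3, (4, 4))   # m
blur     = rng.uniform(0, 2, (4, 4))   # defocus blur radius
beta = rng.normal(size=10)             # placeholder RSM coefficients

vap = visual_attention_probability(attention_score(velocity, distance, blur, beta))
print(weighted_3d_effect(rng.uniform(0, 1, (4, 4)), vap))
```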

Stereo Image Quality Assessment Using Visual Attention and Distortion Predictors

  • Hwang, Jae-Jeong;Wu, Hong Ren
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.9 / pp.1613-1631 / 2011
  • Several metrics have been reported in the literature to assess stereo image quality, mostly based on visual attention or on human-visual-sensitivity-based distortion prediction aided by disparity information; these do not consider the combined aspects of human visual processing. In this paper, a visual attention and depth assisted stereo image quality assessment model (VAD-SIQAM) is devised that consists of three main components: a stereo attention predictor (SAP), depth variation (DV), and a stereo distortion predictor (SDP). Visual attention is modeled based on entropy and inverse contrast to detect regions or objects of interest. Depth variation is fused into the attention probability to account for the amount of depth change in distorted stereo images. Finally, the stereo distortion predictor is designed by integrating distortion probability, which is based on low-level human visual system (HVS) responses, into the actual attention probabilities. The results show that regions of attention are detected among the visually significant distortions in the stereo image pair. Drawbacks of human-visual-sensitivity-based picture quality metrics are alleviated by integrating visual attention and depth information. We also show that the positive correlations with ground-truth attention and depth maps increase to up to 0.949 and 0.936 in terms of the Pearson and Spearman correlation coefficients, respectively.
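
The entropy and inverse-contrast cues behind the stereo attention predictor can be sketched as below. The patch size, histogram binning, and depth-fusion weight are assumptions, not the paper's parameters.

```python
# Rough sketch of an entropy x inverse-contrast attention map with a
# depth-variation term fused in. Patch size, bins, and alpha are assumed.
import numpy as np

def patch_entropy(img, k=8, bins=16):
    h, w = img.shape
    out = np.zeros((h // k, w // k))
    for i in range(h // k):
        for j in range(w // k):
            patch = img[i*k:(i+1)*k, j*k:(j+1)*k]
            p, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = p / p.sum()
            p = p[p > 0]
            out[i, j] = -(p * np.log2(p)).sum()   # Shannon entropy per patch
    return out

def inverse_contrast(img, k=8, eps=1e-6):
    h, w = img.shape
    std = img.reshape(h // k, k, w // k, k).std(axis=(1, 3))
    return 1.0 / (std + eps)                      # low contrast -> high weight

def attention_map(img, depth_variation, alpha=0.5):
    a = patch_entropy(img) * inverse_contrast(img)
    a = a / a.max()
    return (1 - alpha) * a + alpha * depth_variation  # fuse depth change

img = np.random.default_rng(1).uniform(0, 1, (32, 32))
dv = np.random.default_rng(2).uniform(0, 1, (4, 4))   # toy depth variation
print(attention_map(img, dv).shape)
```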

Effective PPL Arrangements in the Screen of Multimedia Contents (멀티미디어 콘텐츠화면에서의 효과적인 PPL 배치)

  • Lee, Young-Jae
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.5 / pp.875-881 / 2007
  • This study explores the attention effects of PPL (product placement) in multimedia contents. PPL has attracted attention in multimedia as well as in the marketing communication field as a profitable model. For the research, the multimedia screen was divided into 9 sections, and the digits 1~9 were assigned to the sections. The visual exposure forms of the digits comprised two dimensions (2D) and three dimensions (3D), and the exposure patterns comprised static and moving images. As a result, the 5th section proved to attract the most attention regardless of exposure form (2D/3D) and pattern (static/moving), which means the center of the multimedia screen is the best place for PPL. In particular, attention peaked when a single digit moved on the screen, suggesting that moving PPL can attract more attention than static PPL. These results identify the most effective PPL position on a multimedia screen and the visual exposure forms that maximize users' attention, and they can serve as a guideline for message arrangement on multimedia screens.
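
The 9-section partition can be illustrated with a short sketch that numbers a 3x3 grid 1-9 (section 5 is the center) and sums attention per section; the saliency map below is a random placeholder for measured fixation data.

```python
# Sketch of the paper's screen partition: a 3x3 grid numbered 1..9, with
# per-section attention summed from a (placeholder) saliency/fixation map.
import numpy as np

def section_attention(saliency, rows=3, cols=3):
    h, w = saliency.shape
    scores = {}
    for r in range(rows):
        for c in range(cols):
            digit = r * cols + c + 1          # sections numbered 1..9
            block = saliency[r*h//rows:(r+1)*h//rows,
                             c*w//cols:(c+1)*w//cols]
            scores[digit] = float(block.sum())
    return scores

saliency = np.random.default_rng(3).uniform(0, 1, (90, 160))
scores = section_attention(saliency)
print(max(scores, key=scores.get))  # the section drawing the most attention
```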

Development and Evaluation of D-Attention Unet Model Using 3D and Continuous Visual Context for Needle Detection in Continuous Ultrasound Images (연속 초음파영상에서의 바늘 검출을 위한 3D와 연속 영상문맥을 활용한 D-Attention Unet 모델 개발 및 평가)

  • Lee, So Hee;Kim, Jong Un;Lee, Su Yeol;Ryu, Jeong Won;Choi, Dong Hyuk;Tae, Ki Sik
    • Journal of Biomedical Engineering Research / v.41 no.5 / pp.195-202 / 2020
  • Needle detection in ultrasound images is sometimes difficult due to obstruction by fat tissues. Accurate needle detection using continuous ultrasound (CUS) images is a vital stage of treatment planning for tissue biopsy and brachytherapy. The study has two main goals. First, a new detection model, D-Attention Unet, is developed by combining the context information of 3D medical data and CUS images. Second, the D-Attention Unet model is compared with other models to verify its usefulness for needle detection in continuous ultrasound images. The continuous needle images taken with ultrasound were converted into still images to build a dataset for evaluating the performance of the D-Attention Unet; the dataset was used for training and testing. Based on the results, the proposed D-Attention Unet model showed better performance than the other three models (Unet, D-Unet, and Attention Unet), with a Dice Similarity Coefficient (DSC), recall, and precision of 71.9%, 70.6%, and 73.7%, respectively. In conclusion, the D-Attention Unet model provides accurate needle detection for US-guided biopsy or brachytherapy, facilitating the clinical workflow. Research on combining image processing techniques with learning techniques is being actively pursued, and, applied in this manner, the proposed method can be more effective than previous techniques.
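
The three reported numbers are standard segmentation metrics; a minimal sketch of computing them from binary needle masks, with toy masks standing in for real predictions:

```python
# DSC, recall, and precision for a binary predicted mask vs. ground truth.
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # true positives
    fp = np.logical_and(pred, ~gt).sum()    # false positives
    fn = np.logical_and(~pred, gt).sum()    # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    recall = tp / (tp + fn + eps)
    precision = tp / (tp + fp + eps)
    return dsc, recall, precision

pred = np.zeros((64, 64)); pred[30:34, 10:50] = 1   # toy predicted needle
gt   = np.zeros((64, 64)); gt[31:34, 12:52] = 1     # toy ground-truth needle
print(segmentation_metrics(pred, gt))
```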

3D Visual Attention Model and its Application to No-reference Stereoscopic Video Quality Assessment (3차원 시각 주의 모델과 이를 이용한 무참조 스테레오스코픽 비디오 화질 측정 방법)

  • Kim, Donghyun;Sohn, Kwanghoon
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.4 / pp.110-122 / 2014
  • As multimedia technologies develop, three-dimensional (3D) technologies are attracting increasing attention from researchers. In particular, video quality assessment (VQA) has become a critical issue in stereoscopic image/video processing applications. The human visual system (HVS) could play an important role in measuring stereoscopic video quality, yet existing VQA methods have done little to model the HVS for stereoscopic video. We seek to amend this by proposing a 3D visual attention (3DVA) model that simulates the HVS for stereoscopic video by combining multiple perceptual stimuli such as depth, motion, color, intensity, and orientation contrast. We utilize this 3DVA model for pooling over significant regions of very poor video quality, and we propose a no-reference (NR) stereoscopic VQA (SVQA) method. We validated the proposed SVQA method using subjective test scores from our own experiments and those reported by others. Our approach yields high correlation with the measured mean opinion score (MOS) as well as consistent performance under asymmetric coding conditions. Additionally, the 3DVA model is used to extract region-of-interest (ROI) information; subjective evaluations of the extracted ROIs indicate that 3DVA-based ROI extraction outperforms the compared extraction methods that use spatial and/or temporal terms.
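
A condensed sketch of the 3DVA idea follows: normalized conspicuity maps for the five stimuli are fused into one attention map, which then weights a per-pixel distortion map pooled over the worst regions. Equal cue weights and the 10% pooling ratio are assumptions, not the paper's settings.

```python
# Fuse five conspicuity maps into an attention map, then pool a distortion
# map over the most salient, worst-quality regions. Weights are assumed equal.
import numpy as np

def fuse_3dva(maps, weights=None):
    maps = [m / (m.max() + 1e-8) for m in maps]            # normalize each cue
    weights = weights or [1.0 / len(maps)] * len(maps)
    return sum(w * m for w, m in zip(weights, maps))

def attention_weighted_pooling(distortion, attention, worst_ratio=0.1):
    weighted = distortion * attention
    k = max(1, int(worst_ratio * weighted.size))
    return np.sort(weighted.ravel())[-k:].mean()           # pool worst regions

rng = np.random.default_rng(4)
# Placeholder cue maps: depth, motion, color, intensity, orientation contrast.
cues = [rng.uniform(0, 1, (36, 64)) for _ in range(5)]
att = fuse_3dva(cues)
print(attention_weighted_pooling(rng.uniform(0, 1, (36, 64)), att))
```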

Volume Haptic Rendering Algorithm for Realistic Modeling (실감형 모델링을 위한 볼륨 햅틱 렌더링 알고리즘)

  • Jung, Ji-Chan;Park, Joon-Young
    • Korean Journal of Computational Design and Engineering / v.15 no.2 / pp.136-143 / 2010
  • Realistic modeling aims to maximize the realism of an environment perceived through two or more human senses, whether in a virtual environment or under remote control. In particular, haptic rendering, which provides realism through the interaction of visual and tactile senses in a realistic model, has attracted attention. Haptic rendering calculates the force caused by model deformation during interaction with a virtual model and returns it to the user. A deformable model in haptic rendering is more complex than a rigid body because deformation must be calculated inside as well as outside the model. For such models, Gibson suggested the 3D ChainMail algorithm using volumetric data. However, for deformable models with non-homogeneous materials, there were discrepancies between visual and tactile information when calculating the force feedback in real time. Therefore, we propose a volume haptic rendering algorithm for non-homogeneous deformable objects that reflects the force feedback consistently in real time, depending on the visual information (the amount of deformation), without any post-processing.
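
To make the deformation-to-force idea concrete, the sketch below runs a 1D ChainMail-style relaxation with per-element stiffness and derives the reflected force from the same displacement field that would be drawn. The 1D chain and linear springs are simplifications, not Gibson's full 3D algorithm.

```python
# 1D ChainMail-style propagation plus a spring force computed from the same
# displacements that are displayed, keeping visual and haptic cues consistent.
import numpy as np

def chainmail_1d(rest, idx, new_pos, max_stretch=0.2):
    """Move element idx, then relax neighbors outward so no link stretches
    or compresses beyond the limit (a 1D ChainMail pass)."""
    pos = rest.astype(float).copy()
    pos[idx] = new_pos
    for i in range(idx + 1, len(pos)):          # propagate to the right
        pos[i] = np.clip(pos[i], pos[i-1] + 1 - max_stretch,
                                 pos[i-1] + 1 + max_stretch)
    for i in range(idx - 1, -1, -1):            # propagate to the left
        pos[i] = np.clip(pos[i], pos[i+1] - 1 - max_stretch,
                                 pos[i+1] - 1 + max_stretch)
    return pos

def feedback_force(rest, deformed, stiffness):
    """Sum of linear spring forces from per-element displacement."""
    return float(-(stiffness * (deformed - rest)).sum())

rest = np.arange(8.0)                            # unit-spaced chain at rest
stiff = np.array([1, 1, 5, 5, 1, 1, 1, 1.0])     # non-homogeneous stiffness
deformed = chainmail_1d(rest, 3, 3.6)            # user pushes element 3
print(feedback_force(rest, deformed, stiff))
```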

Video Captioning with Visual and Semantic Features

  • Lee, Sujin;Kim, Incheol
    • Journal of Information Processing Systems / v.14 no.6 / pp.1318-1330 / 2018
  • Video captioning refers to the process of extracting features from a video and generating video captions using the extracted features. This paper introduces a deep neural network model and its learning method for effective video captioning. In this study, both visual features and semantic features that effectively express the video are used. The visual features of the video are extracted using convolutional neural networks such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. Furthermore, an attention-based caption generation network is proposed for effective generation of video captions using the extracted features. The performance and effectiveness of the proposed model are verified through various experiments using two large-scale video benchmarks, the Microsoft Video Description (MSVD) and the Microsoft Research Video-to-Text (MSR-VTT) datasets.
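
The attention step in such a caption decoder can be sketched as below: at each word step, the decoder state scores the per-frame (and semantic) feature vectors and forms a context vector. Feature sizes and the single projection matrix are arbitrary stand-ins for the paper's networks; C3D/ResNet extraction is not reproduced.

```python
# Dot-product attention over frame features during caption decoding.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, features, W):
    """Score each feature against the decoder state; return context vector."""
    scores = features @ (W @ decoder_state)   # one score per feature vector
    alpha = softmax(scores)
    return alpha @ features, alpha            # weighted context + weights

rng = np.random.default_rng(5)
frames = rng.normal(size=(20, 512))           # 20 frame features (e.g. ResNet)
semantic = rng.normal(size=(1, 512))          # pooled semantic feature vector
feats = np.concatenate([frames, semantic])
state = rng.normal(size=256)                  # current decoder hidden state
W = rng.normal(size=(512, 256)) * 0.05        # placeholder projection
context, alpha = attend(state, feats, W)
print(context.shape, alpha.sum())             # (512,) and ~1.0
```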

Finding Optimal Paths in Indoor Spaces using 3D GIS (3D-GIS를 이용한 건물 내부공간의 최적경로탐색)

  • Ryu Keun-Won;Jun Chul-Min;Jo Sung-Kil;Lee Sang-Mi
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference / 2006.04a / pp.387-392 / 2006
  • As cities grow and buildings become large and complex, 3D information is increasingly needed in addition to 2D information, and the use of 3D models is attracting attention for handling such problems. However, there are limitations in using 3D models because most applications and research efforts involving them have focused on visual analysis. This study presents a method for finding optimal paths in indoor spaces as an illustration of using 3D models in spatial analysis. We modeled rooms, paths, and other facilities in a building as individual 3D objects, and enabled network-based pathfinding by integrating the vector-based networks of 2D GIS with the 3D model.
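
Network-based pathfinding of this kind reduces to shortest-path search on a graph of rooms, corridors, and stairs; a minimal sketch with a hypothetical two-floor topology:

```python
# Dijkstra shortest path on a made-up indoor network (nodes are rooms,
# corridors, and stairs; edge weights are walking distances in meters).
import heapq

def dijkstra(graph, start, goal):
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1], dist[goal]

graph = {  # hypothetical two-floor building
    "room101": [("corridor1", 3)],
    "corridor1": [("room101", 3), ("stairs", 10)],
    "stairs": [("corridor1", 10), ("corridor2", 4)],
    "corridor2": [("stairs", 4), ("room201", 6)],
    "room201": [("corridor2", 6)],
}
print(dijkstra(graph, "room101", "room201"))
```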

Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images (멀티-뷰 영상들을 활용하는 3차원 의미적 분할을 위한 효과적인 멀티-모달 특징 융합)

  • Hye-Lim Bae;Incheol Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.12 / pp.505-518 / 2023
  • 3D point cloud semantic segmentation is a computer vision task that divides a point cloud into different objects and regions by predicting the class label of each point. Existing 3D semantic segmentation models are limited in performing sufficient fusion of multi-modal features while preserving the characteristics of both the 2D visual features extracted from RGB images and the 3D geometric features extracted from the point cloud. Therefore, in this paper, we propose MMCA-Net, a novel 3D semantic segmentation model using 2D-3D multi-modal features. The proposed model effectively fuses the heterogeneous 2D visual features and 3D geometric features by using an intermediate fusion strategy and a multi-modal cross attention-based fusion operation. It also extracts context-rich 3D geometric features from input point clouds consisting of irregularly distributed points by adopting PTv2 as the 3D geometric encoder. We conducted both quantitative and qualitative experiments on the ScanNetv2 benchmark dataset to analyze the performance of the proposed model. In terms of mIoU, the proposed model showed a 9.2% performance improvement over the PTv2 model, which uses only 3D geometric features, and a 12.12% improvement over the MVPNet model, which uses 2D-3D multi-modal features, demonstrating the effectiveness and usefulness of the proposed model.
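
A simplified, single-head sketch of cross attention-based fusion: 3D point features act as queries over 2D visual features, and the attended 2D context is concatenated back onto each point (an intermediate fusion). The dimensions and single-head formulation are assumptions; MMCA-Net itself is not reproduced here.

```python
# Single-head cross attention: 3D point features query 2D visual features.
import numpy as np

def cross_attention(points_3d, pixels_2d, Wq, Wk, Wv):
    q = points_3d @ Wq                        # (N, d) queries from 3D points
    k = pixels_2d @ Wk                        # (M, d) keys from 2D features
    v = pixels_2d @ Wv                        # (M, d) values from 2D features
    scores = q @ k.T / np.sqrt(q.shape[1])
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True) # row-wise softmax
    fused = alpha @ v                         # per-point 2D context
    return np.concatenate([points_3d, fused], axis=1)  # intermediate fusion

rng = np.random.default_rng(6)
pts = rng.normal(size=(100, 64))              # 3D geometric features (e.g. PTv2)
pix = rng.normal(size=(300, 96))              # 2D visual features (multi-view)
Wq, Wk, Wv = (rng.normal(size=s) * 0.1 for s in [(64, 32), (96, 32), (96, 32)])
print(cross_attention(pts, pix, Wq, Wk, Wv).shape)   # (100, 96)
```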

LVLN : A Landmark-Based Deep Neural Network Model for Vision-and-Language Navigation (LVLN: 시각-언어 이동을 위한 랜드마크 기반의 심층 신경망 모델)

  • Hwang, Jisu;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.8 no.9 / pp.379-390 / 2019
  • In this paper, we propose a novel deep neural network model for Vision-and-Language Navigation (VLN) named LVLN (Landmark-based VLN). In addition to the visual features extracted from input images and the linguistic features extracted from natural language instructions, this model makes use of information about places and landmark objects detected in the images. The model also applies a context-based attention mechanism to associate each entity mentioned in the instruction, the corresponding region of interest (ROI) in the image, and the corresponding detected place and landmark object with one another. Moreover, to improve the success rate of arriving at the target goal, the model adopts a progress monitor module that checks its progress toward the goal. Through experiments with the Matterport3D simulator and the Room-to-Room (R2R) benchmark dataset, we demonstrate the high performance of the proposed model.
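
The grounding step can be sketched as attention between instruction entity embeddings and embeddings of landmarks detected in the current view; the embeddings below are random stand-ins, and the full LVLN architecture is not shown.

```python
# Attention-based grounding of instruction entities to detected landmarks.
import numpy as np

def ground_entities(entity_emb, landmark_emb):
    scores = entity_emb @ landmark_emb.T                 # (entities, landmarks)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha = e / e.sum(axis=1, keepdims=True)             # row-wise softmax
    return alpha.argmax(axis=1), alpha                   # best landmark per entity

rng = np.random.default_rng(7)
entities = rng.normal(size=(3, 128))    # e.g. "sofa", "stairs", "lamp" mentions
landmarks = rng.normal(size=(5, 128))   # objects detected in the current view
match, alpha = ground_entities(entities, landmarks)
print(match)                            # index of the landmark each entity picks
```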