Integrated Search | Korea Science

Multimodal Context Embedding for Scene Graph Generation

Jung, Gayoung;Kim, Incheol
- Journal of Information Processing Systems / Vol. 16, No. 6 / pp. 1250-1260 / 2020
This study proposes a novel deep neural network model that can accurately detect objects and their relationships in an image and represent them as a scene graph. The proposed model utilizes several multimodal features, including linguistic features and visual context features, to accurately detect objects and relationships. In addition, in the proposed model, context features are embedded using graph neural networks to depict the dependencies between two related objects in the context feature vector. This study demonstrates the effectiveness of the proposed model through comparative experiments using the Visual Genome benchmark dataset.
https://doi.org/10.3745/JIPS.02.0147
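To make the term "scene graph" in the abstract above concrete, the following is a minimal sketch of the data structure only: objects as nodes and relationships as (subject, predicate, object) triples. The class names, the `add` and `neighbors` helpers, and the example labels are assumptions made for illustration; they are not taken from the paper, and the sketch does not attempt the detection model itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    subject: str    # label of the subject object, e.g. "man"
    predicate: str  # relationship label, e.g. "riding"
    obj: str        # label of the object, e.g. "horse"


@dataclass
class SceneGraph:
    objects: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def add(self, subject, predicate, obj):
        """Register both objects as nodes and store the directed edge."""
        self.objects.update((subject, obj))
        self.relationships.append(Relationship(subject, predicate, obj))

    def neighbors(self, label):
        """Return (predicate, object) pairs whose subject is `label`."""
        return [(r.predicate, r.obj)
                for r in self.relationships if r.subject == label]


# Hypothetical detections for a single image.
graph = SceneGraph()
graph.add("man", "riding", "horse")
graph.add("man", "wearing", "hat")
graph.add("horse", "on", "grass")
print(graph.neighbors("man"))
```

A detector would populate such a graph from predicted object labels and pairwise relationship labels; here the triples are entered by hand.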

실제 이미지에서 현저성과 맥락 정보의 영향을 고려한 시각 탐색 모델 (Visual Search Model based on Saliency and Scene-Context in Real-World Images)

최윤형;오형석;명노해
- Journal of the Korean Institute of Industrial Engineers (대한산업공학회지) / Vol. 41, No. 4 / pp. 389-395 / 2015
According to much research in cognitive science, the impact of scene context on human visual search in real-world images can be as important as that of saliency. This study therefore proposes a method for modeling visual search in real-world images with Adaptive Control of Thought-Rational (ACT-R), based on saliency and scene context. The method uses the utility system of ACT-R to describe the influences of saliency and scene context in real-world images. The model was validated by comparing its output with eye-tracking data from an experiment with a simple task in which subjects searched for targets in indoor bedroom images. The results show that the model data fit the eye-tracking data quite well. The study concludes that the proposed modeling method should be used to provide an accurate model of human performance in visual search tasks in real-world images.
https://doi.org/10.7232/JKIIE.2015.41.4.389

Construction Site Scene Understanding: A 2D Image Segmentation and Classification

Kim, Hongjo;Park, Sungjae;Ha, Sooji;Kim, Hyoungkwan
- International Conference Proceedings (국제학술발표논문집) / The 6th International Conference on Construction Engineering and Project Management / pp. 333-335 / 2015
A computer vision-based scene recognition algorithm is proposed for monitoring construction sites. The system analyzes images acquired from a surveillance camera to separate regions and classify them as building, ground, or hole. The mean shift image segmentation algorithm is tested for separating meaningful regions in construction site images. The system would benefit current monitoring practices in that the information extracted from images could capture environmental context.

Multimodal Attention-Based Fusion Model for Context-Aware Emotion Recognition

Vo, Minh-Cong;Lee, Guee-Sang
- International Journal of Contents / Vol. 18, No. 3 / pp. 11-20 / 2022
Human emotion recognition has long attracted many researchers. In recent years, there has been increasing interest in exploiting contextual information for emotion recognition. Previous work in psychology shows that emotional perception is affected not only by facial expressions but also by contextual information from the scene, such as human activities, interactions, and body poses. That work started a trend in computer vision of exploring the role of context by treating contexts as modalities that are used, along with facial expressions, to infer the predicted emotion. However, contextual information has not been fully exploited. The scene emotion created by the surrounding environment can shape how people perceive emotion. In addition, additive fusion in multimodal training is not practical, because the modalities do not contribute equally to the final prediction. This paper contributes to this growing area of research by exploring how effectively the emotional scene gist of the input image can be used to infer the emotional state of the primary target. The emotional scene gist includes emotion, emotional feelings, and actions or events that directly trigger emotional reactions in the input image. We also present an attention-based fusion network that combines multimodal features according to their impact on the target emotional state. We demonstrate the effectiveness of the method through a significant improvement on the EMOTIC dataset.
https://doi.org/10.5392/IJoC.2022.18.3.011
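The contrast the abstract above draws between additive fusion and attention-based fusion can be illustrated with a small sketch. This is not the authors' network: the function names, the two-dimensional feature vectors, and the hand-set relevance scores are all assumptions for illustration, and in a trained model the scores would be produced by learned layers rather than supplied by hand.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_fusion(features):
    """Element-wise sum: every modality contributes equally."""
    return [sum(values) for values in zip(*features)]


def attention_fusion(features, scores):
    """Element-wise weighted sum with softmax-normalized modality weights."""
    weights = softmax(scores)
    fused = [sum(w * v for w, v in zip(weights, values))
             for values in zip(*features)]
    return fused, weights


# Hypothetical feature vectors for three modalities.
face = [1.0, 0.0]
body = [0.0, 1.0]
scene = [0.5, 0.5]

# Hypothetical relevance scores (assumed, not learned here).
fused, weights = attention_fusion([face, body, scene], scores=[2.0, 0.0, 1.0])
print(weights)
print(fused)
print(additive_fusion([face, body, scene]))
```

With equal scores, the attention variant reduces to a plain average of the modalities, which is why unequal, input-dependent scores are what distinguish it from additive fusion.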

PC networked parallel processing system for figures and letters

Kitazawa, M.;Sakai, Y.
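- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / Proceedings of the 1993 Korea Automatic Control Conference, International Session (제어로봇시스템학회 1993년도 한국자동제어학술회의논문집(국제학술편)); Seoul National University, Seoul; 20-22 Oct. 1993 / pp. 277-282 / 1993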
There are two aspects to understanding concepts: image and language. This paper discusses what is fundamental to finding proper relations between objects in a scene, so that the meaning of the whole scene can be represented properly through experience in image and language. It is assumed that one of the objects in a scene has letters as objects inside its contour. Because the present system can deal with both figures and letters in a scene, this assumption makes it easy for the system to infer the context of a scene. Several personal computers on a LAN are used, and they process items in parallel.

화자인식을 이용한 대화 상황정보 어노테이션 (Conversation Context Annotation using Speaker Detection)

박승보;김유원;조근식
- Journal of Korea Multimedia Society (한국멀티미디어학회논문지) / Vol. 12, No. 9 / pp. 1252-1261 / 2009
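Extracting meaning from video and annotating it is a prerequisite for efficient video retrieval and video summarization. The semantic information used for annotation can be obtained in various ways. These can be divided into approaches that extract simple identity information about the objects in the video and approaches that extract the context information those objects create. However, rather than annotating with simple object information alone, it is better to consider the interactions and relationships among objects together with the object information, so that the full meaning of the conversation context is annotated. This paper studies extracting speaker information from video, constructing the conversation context, and annotating it. We propose a method that identifies who is present in the current frame from recognized face information, determines who the speaker is by analyzing mouth movement, extracts the conversation context from the speaker, the listeners, and the presence or absence of subtitles, and converts it to XML.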
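The last step described in the abstract above, converting the extracted speaker, listener, and subtitle information to XML, might look roughly like the following sketch. The element names (`conversation`, `speaker`, `listener`, `subtitle`), the `frame` attribute, and the function `annotate_conversation` are hypothetical; the paper's actual XML schema is not given in the abstract, and face recognition and mouth-movement analysis are assumed to have already produced the inputs.

```python
import xml.etree.ElementTree as ET


def annotate_conversation(frame, speaker, listeners, subtitle=None):
    """Serialize one conversation context as an XML string.

    All element and attribute names are illustrative assumptions.
    """
    scene = ET.Element("conversation", frame=str(frame))
    ET.SubElement(scene, "speaker").text = speaker
    for name in listeners:
        ET.SubElement(scene, "listener").text = name
    # A subtitle element is emitted only when a subtitle is present.
    if subtitle is not None:
        ET.SubElement(scene, "subtitle").text = subtitle
    return ET.tostring(scene, encoding="unicode")


# Hypothetical recognition results for one frame.
xml_text = annotate_conversation(120, "Alice", ["Bob"], subtitle="Hello")
print(xml_text)
```

Recording the presence or absence of a subtitle as an optional element keeps the annotation queryable with standard XML tools.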

물체-배경 맥락 부합성이 물체에 대한 주의 할당과 기억에 미치는 영향 (Effects of Object-Background Contextual Consistency on the Allocation of Attention and Memory of the Object)

이윤경;김비아
- Korean Journal of Cognitive Science (인지과학) / Vol. 24, No. 2 / pp. 133-171 / 2013
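This study tested the hypothesis that, while perceiving a scene, people allocate more attention to objects that are inconsistent with the scene context and also recall those objects more accurately. Two experiments were conducted to test it. Both used a $3 \times 2$ factorial design that manipulated scene presentation time (2, 5, or 10 seconds) and contextual consistency (consistent, inconsistent). The dependent variables were eye movement patterns during scene perception and the accurate recall rate on a memory test given after all scenes had been studied. Experiment 1 addressed limitations of previous studies and re-examined attention allocation as a function of object-background contextual consistency. Experiment 2 examined whether more attention is still allocated to context-inconsistent objects when a distraction task divides participants' attention during scene perception. In Experiment 1, participants fixated context-inconsistent objects quickly, within a short time; during scene perception they fixated those objects relatively more, more often, and for longer, and they showed better memory for the objects' locations. Experiment 2, which included the distraction task, showed a pattern of results similar to that of Experiment 1. The finding that more attention was allocated to context-inconsistent objects even when attention was deliberately divided by a distraction task suggests that contextual consistency strongly influences attention allocation in scene perception.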

비디오에서 양방향 문맥 정보를 이용한 상호 협력적인 위치 및 물체 인식 (Collaborative Place and Object Recognition in Video using Bidirectional Context Information)

김성호;권인소
- The Journal of Korea Robotics Society (로봇학회논문지) / Vol. 1, No. 2 / pp. 172-179 / 2006
In this paper, we present a practical place and object recognition method for guiding visitors in building environments. Recognizing places or objects in the real world can be difficult because of motion blur and camera noise. We present a modeling method based on the bidirectional interaction between places and objects, so that the two reinforce each other for robust recognition. The unification of visual context including scene context, object context, and temporal context is also. The proposed system has been tested on guiding visitors in a large-scale building environment (10 topological places, 80 3D objects).

3D애니메이션의 감성적 라이팅 스타일 연구 (A Study on 3D Animation Emotional Lighting Style)

조정성
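- The Korea Contents Association: Conference Proceedings (한국콘텐츠학회:학술대회논문집) / Proceedings of the Korea Contents Association 2005 Fall Conference (한국콘텐츠학회 2005년도 추계 종합학술대회 논문집) / pp. 153-160 / 2005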
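It is no exaggeration to say that the mood conveyed on screen in 3D animation depends largely on how the 3D CG lighting is set up. In the context of computer graphics, lighting is the process of illuminating digital scenes in an artistic and technical way, so that the audience can perceive, with appropriate clarity and mood, what the director intends to express. Lighting, as an aesthetic of light and color created and manipulated by people, serves to make scenes beautiful and harmonious. It also stylizes the story to be told and the mood to be expressed through symbolic and metaphorical techniques. The concept of a lighting style is therefore closely tied to the particular situation or environment of the animation and to its art direction. Unfortunately, there is no fixed rule or formula that makes the process of lighting a scene easy. In short, lighting helps determine the style of a scene through the conditional elements of the lighting setup, including position, color, intensity, and the area and extent of shadows; at the same time, the artistic aspect of presenting the overall mood, such as the genre of the animation (lyrical drama, suspense, or high drama) and the style of the scene, should not be overlooked.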

지역 컨텍스트 및 전역 컨텍스트 정보를 이용한 비디오 장면 경계 검출 (Detection of Video Scene Boundaries based on the Local and Global Context Information)

강행봉
- Journal of KIISE: Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터) / Vol. 8, No. 6 / pp. 778-786 / 2002
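Scene boundary detection plays a very important role in understanding the semantic structure of video data. However, because it must extract scenes that are semantically consistent, it is considerably harder than shot boundary detection. To make use of the semantic information in video data, this paper proposes a method that extracts local and global context information from video shots and detects scene boundaries on that basis. The local context information of a video shot is the context present in the shot itself, defined as foreground objects, background, and motion information. The global context information is the context arising from the relationships between a given video shot and its surrounding shots, defined as the similarity between shots, their interaction, and the pattern of shot durations. Based on this context information, scenes are detected through a three-step process of linking, link verification, and adjustment. Applied to TV dramas and movies, the proposed method achieved a detection accuracy of over 80%.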
