Title/Summary/Keyword: Visual Attention Software


Study on Visual Recognition Enhancement of Yellow Carpet Placed at Near Pedestrian Crossing Areas : Visual Attention Software Implementation (횡단보도 옐로카펫 설치에 따른 시인성 증진효과 연구 : Visual Attention Software 분석 중심으로)

  • Ahn, Hyo-Sub;Kim, Jin-Tae
    • Journal of Information Technology Services / v.15 no.4 / pp.73-83 / 2016
  • Pedestrian safety was recently highlighted by the yellow carpet, a yellow-colored pavement treatment prepared for children waiting at pedestrian crossings, which was deployed without validation of its effectiveness in practice. It is a promising device for highway safety that encourages pedestrians to step onto the yellow-colored area, an example of the so-called nudge effect. This paper delivers a study that checked the effectiveness of the yellow carpet in three different aspects from the vehicle driver's perspective by applying a newly introduced information technology (IT) service: Visual Attention Software (VAS). It was assumed that VAS, developed by 3M in the United States, should be able to explain Korean drivers' visual reaction behavior, since the technology embedded in VAS was developed and validated with data from various countries around the world. Sets of pictures were taken at thirteen field sites in seven school-zone areas in the Seoul metropolitan area, before and after the installation of a yellow carpet. These picture sets were analyzed with VAS, and the results were compared using selected safety measures: the likely focus on a standing pedestrian (waiting for the pedestrian green signal) as affected by the contrasting yellow-colored pavement behind him or her. The before-and-after comparisons showed that the placement of a yellow carpet would (1) increase drivers' visual attention on the pedestrian crossing area by 71% and (2) move that area 2.4 steps earlier in the sequential order of visual attention. The findings should encourage wider deployment of this promising device and thus improve children's safety at pedestrian crossings. The result highlights a way to support change in the conservative traffic-safety engineering field by applying advanced IT services, while more robust research is recommended to overcome the simplifications of this study. A sketch of the before-and-after comparison logic is given below.
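The following is a minimal sketch of the before-and-after comparison logic described above, assuming that VAS (or any saliency model) has already produced per-pixel attention maps for a paired set of photos; the ROI coordinates and the random stand-in maps are hypothetical, since VAS itself is proprietary.

```python
import numpy as np

def roi_attention(saliency_map: np.ndarray, roi: tuple) -> float:
    """Mean attention inside a crossing-area ROI given as (x, y, w, h)."""
    x, y, w, h = roi
    return float(saliency_map[y:y + h, x:x + w].mean())

# Hypothetical paired attention maps for one of the thirteen sites.
before = np.random.rand(480, 640)    # stand-in for the pre-installation map
after = np.random.rand(480, 640)     # stand-in for the post-installation map
crossing_roi = (200, 300, 240, 120)  # hypothetical crossing-area box

gain = roi_attention(after, crossing_roi) / roi_attention(before, crossing_roi) - 1.0
print(f"attention change on crossing area: {gain:+.0%}")
```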

Multi-level Cross-attention Siamese Network For Visual Object Tracking

  • Zhang, Jianwei;Wang, Jingchao;Zhang, Huanlong;Miao, Mengen;Cai, Zengyu;Chen, Fuguo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3976-3990 / 2022
  • Cross-attention is now widely used in Siamese trackers to replace traditional correlation operations for feature fusion between the template and the search region, since it establishes the similarity relationship between the target and the search region better than correlation and thus supports more robust visual object tracking. However, existing trackers using cross-attention focus only on the rich semantic information of high-level features while ignoring the appearance information contained in low-level features, which makes them vulnerable to interference from similar objects. In this paper, we propose a Multi-level Cross-attention Siamese network (MCSiam) to aggregate semantic and appearance information at the same time. Specifically, a multi-level cross-attention module is designed to fuse the multi-layer features extracted from the backbone; it integrates template and search-region features at different levels so that rich appearance and semantic information can be used to carry out the tracking task simultaneously. In addition, before the cross-attention, a target-aware module is introduced to enhance the target feature and alleviate interference, which makes the multi-level cross-attention module more efficient at fusing the information of the target and the search region. We test MCSiam on four tracking benchmarks, and the results show that the proposed tracker achieves performance comparable to state-of-the-art trackers. A sketch of one cross-attention fusion level follows.
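Below is a minimal sketch of one cross-attention fusion level in PyTorch: search-region tokens attend to template tokens, with a residual connection and layer normalization. The single level, token shapes, and dimensions are illustrative simplifications of the paper's multi-level module, not its exact design.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        # search: (B, HW_s, C) tokens from the search region;
        # template: (B, HW_t, C) tokens from the template.
        fused, _ = self.attn(query=search, key=template, value=template)
        return self.norm(search + fused)  # residual keeps the original search cues

z = torch.randn(1, 64, 256)   # template tokens, e.g. an 8x8 feature map
x = torch.randn(1, 256, 256)  # search tokens, e.g. a 16x16 feature map
print(CrossAttentionFusion()(x, z).shape)  # torch.Size([1, 256, 256])
```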

MLSE-Net: Multi-level Semantic Enriched Network for Medical Image Segmentation

  • Di Gai;Heng Luo;Jing He;Pengxiang Su;Zheng Huang;Song Zhang;Zhijun Tu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2458-2482 / 2023
  • Medical image segmentation techniques based on convolutional neural networks tend to accumulate redundant parameters during feature extraction and to localize targets unsatisfactorily, which results in segmentation output that is less accurate for assisting doctors in diagnosis. In this paper, we propose a multi-level semantic-enriched encoder-decoder network consisting of a Pooling-Conv-Former (PCFormer) module and a Cbam-Dilated-Transformer (CDT) module. The PCFormer module tackles the parameter explosion of the conventional transformer and compensates for the feature loss in the down-sampling process. In the CDT module, the CBAM attention module is adopted to highlight feature regions by implicitly blending attention mechanisms, and the Dilated convolution-Concat (DCC) module is designed as a parallel concatenation of multiple atrous convolution blocks to explicitly enlarge the receptive field. In addition, a MultiHead Attention-DwConv-Transformer (MDTransformer) module is utilized to clearly distinguish the target region from the background. Extensive experiments on medical image segmentation with the Glas, SIIM-ACR, ISIC, and LGG datasets demonstrate that the proposed network outperforms existing advanced methods in both objective evaluation and subjective visual quality. A sketch of the parallel atrous-convolution idea behind the DCC module follows.
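The sketch below illustrates the parallel atrous-convolution idea behind the DCC module in PyTorch: several 3x3 branches with different dilation rates run in parallel, are concatenated, and are projected back. The channel counts and dilation rates are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedConcat(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=rate preserves spatial size.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.randn(1, 64, 32, 32)
print(DilatedConcat(64, 64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```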

2D-to-3D Conversion System using Depth Map Enhancement

  • Chen, Ju-Chin;Huang, Meng-yuan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.3 / pp.1159-1181 / 2016
  • This study introduces an image-based 2D-to-3D conversion system that provides significant stereoscopic visual effects for human viewers. Linear and atmospheric perspective cues, which compensate for each other, are employed to estimate depth information. Rather than retrieving a precise depth value per pixel from the depth cues, the system estimates a direction angle for the image and integrates the depth gradient along that angle with superpixels to obtain the depth map. However, the stereoscopic effect of views synthesized from this depth map is limited and can dissatisfy viewers. To obtain more impressive visual effects, the viewer's main focus is considered: salient object detection is performed to find the region of visual attention, and the depth map is then refined by locally modifying the depth values within that salient region. The refinement process not only maintains global depth consistency by correcting non-uniform depth values but also enhances the stereoscopic effect. Experimental results show that in subjective evaluation, the degree of satisfaction with the proposed method is approximately 7% higher than with both existing commercial conversion software and a state-of-the-art approach. A sketch of the local refinement step follows.
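A minimal sketch of the local refinement step, assuming NumPy, normalized depth in [0, 1] with larger values closer to the viewer, and a binary salient mask; the uniform boost stands in for the paper's local modification of depth values.

```python
import numpy as np

def refine_depth(depth: np.ndarray, salient_mask: np.ndarray, boost: float = 0.2) -> np.ndarray:
    """Bring the salient object closer by raising its depth values, then re-clip."""
    refined = depth.copy()
    refined[salient_mask] = np.clip(refined[salient_mask] + boost, 0.0, 1.0)
    return refined

depth = np.linspace(0.0, 1.0, 480 * 640).reshape(480, 640)  # hypothetical gradient depth
mask = np.zeros((480, 640), dtype=bool)
mask[200:320, 260:380] = True                               # hypothetical salient object
refined = refine_depth(depth, mask)
print(refined[250, 300] - depth[250, 300])  # the region moved toward the viewer
```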

A Pilot MEG Study During A Visual Search Task (시각추적과제의 뇌자도 : 예비실험)

  • Kim, Sung Hun;Lee, Sang Kun;Kim, Kwang-Ki
    • Annals of Clinical Neurophysiology / v.8 no.1 / pp.44-47 / 2006
  • Background: The present study used magnetoencephalography (MEG) to investigate the neural substrates of a modified version of Treisman's visual search task. Methods: Two volunteers who gave informed consent participated in the MEG experiment: a 27-year-old male and a 24-year-old female, both right-handed. The experiment was performed using a 306-channel biomagnetometer (Neuromag LTD). There were three task conditions: searching for an open circle among seven closed circles (open condition), searching for a closed circle among seven uni-directionally open circles (closed condition), and searching for a closed circle among seven eight-directionally open circles (random (closed) condition). Each run contained one task condition, so one experimental session comprised three runs, with 128 trials performed across the three runs. Each participant underwent one session and pressed a button upon finding the target. Magnetic source localization images were generated using software that allowed interactive identification of a common set of fiducial points in the MRI and MEG coordinate frames. Results: In each participant we found activations of the anterior cingulate, primary visual and association cortices, posterior parietal cortex, and brain areas in the vicinity of the thalamus. Conclusions: We found activations corresponding to the anterior and posterior visual attention systems.


Detecting Salient Regions based on Bottom-up Human Visual Attention Characteristic (인간의 상향식 시각적 주의 특성에 바탕을 둔 현저한 영역 탐지)

  • 최경주;이일병
    • Journal of KIISE:Software and Applications / v.31 no.2 / pp.189-202 / 2004
  • In this paper, we propose a new method for detecting salient regions in an image. The algorithm is based on the characteristics of human bottom-up visual attention. Several features known to influence human visual attention, such as color and intensity, are extracted from each region of the image. These features are converted into importance values for each region using a local competition function and combined to produce a saliency map, which represents the saliency at every location in the image as a scalar quantity and guides the selection of attended locations based on the spatial distribution of salient regions relative to their perceptual importance. The results indicate that the computed saliency maps correlate well with human perception of visually important regions. A sketch of the bottom-up combination step follows.
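A minimal sketch of the bottom-up combination step, assuming NumPy and OpenCV: color and intensity conspicuity maps are normalized and summed into a single saliency map. The feature extraction here (chroma distance and a difference-of-Gaussians center-surround) is a generic stand-in for the paper's feature set and competition function.

```python
import cv2
import numpy as np

def normalize(m: np.ndarray) -> np.ndarray:
    m = m.astype(np.float32)
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

def saliency_map(bgr: np.ndarray) -> np.ndarray:
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    intensity = lab[:, :, 0]
    # Color conspicuity: each pixel's chroma distance from the image mean.
    chroma = lab[:, :, 1:]
    color = np.linalg.norm(chroma - chroma.mean(axis=(0, 1)), axis=2)
    # Center-surround contrast approximated by a difference of Gaussian blurs.
    contrast = np.abs(cv2.GaussianBlur(intensity, (0, 0), 2)
                      - cv2.GaussianBlur(intensity, (0, 0), 8))
    return normalize(normalize(color) + normalize(contrast))

img = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)  # stand-in image
print(saliency_map(img).shape)  # (240, 320)
```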

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous-dimension emotion recognition, the parts that highlight emotional expression differ across modalities, and different modalities influence the emotional state to different degrees. This paper therefore studies the fusion of the two most important modalities for emotion recognition (voice and visual expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, prior knowledge is first used to extract audio features. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function mitigates the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which is superior to several comparison algorithms. A sketch of the CCC metric follows.
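The evaluation metric is the standard Lin's concordance correlation coefficient; a NumPy sketch is below (the annotations and predictions are hypothetical stand-ins).

```python
import numpy as np

def ccc(pred: np.ndarray, gold: np.ndarray) -> float:
    """Lin's concordance correlation coefficient between two 1-D series."""
    pm, gm = pred.mean(), gold.mean()
    cov = ((pred - pm) * (gold - gm)).mean()
    return float(2 * cov / (pred.var() + gold.var() + (pm - gm) ** 2))

rng = np.random.default_rng(0)
gold = rng.normal(size=100)                    # hypothetical valence annotations
pred = gold + rng.normal(scale=0.5, size=100)  # hypothetical model predictions
print(f"CCC = {ccc(pred, gold):.3f}")
```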

Kubernetes-based Framework for Improving Traffic Light Recognition Performance: Convergence Vision AI System based on YOLOv5 and C-RNN with Visual Attention (신호등 인식 성능 향상을 위한 쿠버네티스 기반의 프레임워크: YOLOv5와 Visual Attention을 적용한 C-RNN의 융합 Vision AI 시스템)

  • Cho, Hyoung-Seo;Lee, Min-Jung;Han, Yeon-Jee
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.851-853 / 2022
  • As the population ages, the number of drivers aged 65 and over is rising sharply, and the growing share of traffic accidents involving elderly drivers has become an urgent social problem. This study therefore proposes a Kubernetes-based framework that combines object detection and recognition models to recognize traffic lights and announce them via Text-To-Speech (TTS). In the object detection stage, the performance of YOLOv5 model variants was compared and the best was adopted; in the object recognition stage, a C-RNN-based attention-OCR model was used. Recognizing the whole traffic-light image, rather than only its internal LED region, reduced false detections and raised the recognition rate. On 1,628 test images the framework achieved an accuracy of 0.997 and an F1-score of 0.991, demonstrating its validity. This work lets follow-up research combine deep learning models from various domains rather than confining a model to a single domain, and it can help prevent traffic accidents caused by elderly drivers and signal violations. A hedged skeleton of the detect-recognize-announce pipeline follows.
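A hedged skeleton of the detect-recognize-announce pipeline: the torch.hub YOLOv5 call and the pyttsx3 TTS API are real, but recognize_signal() is a hypothetical placeholder for the paper's C-RNN attention-OCR stage, class filtering is omitted, and the Kubernetes deployment layer is out of scope here.

```python
import torch
import pyttsx3

# Detection stage: a pretrained YOLOv5 model from the ultralytics hub.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def recognize_signal(box) -> str:
    """Hypothetical placeholder for the C-RNN attention-OCR recognition stage."""
    return "green"

def announce(frame_path: str) -> None:
    results = model(frame_path)        # run detection on one frame
    engine = pyttsx3.init()
    for *box, conf, cls in results.xyxy[0].tolist():
        state = recognize_signal(box)  # a real pipeline would crop the frame here
        engine.say(f"Traffic light ahead: {state}")
    engine.runAndWait()
```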

Implementation of Preceding Vehicle Break-Lamp Detection System using Selective Attention Model and YOLO (선택적 주의집중 모델과 YOLO를 이용한 선행 차량 정지등 검출 시스템 구현)

  • Lee, Woo-Beom
    • Journal of the Institute of Convergence Signal Processing / v.22 no.2 / pp.85-90 / 2021
  • Advanced Driver Assistance Systems (ADAS) for safe driving are an important area in autonomous vehicles. In particular, ADAS software that uses image sensors mounted on the vehicle is inexpensive to build and serves various purposes. This paper proposes an algorithm for detecting the brake lamp among the tail lamps of a preceding vehicle, which makes it possible to perceive the driving state of that vehicle. The proposed method uses the YOLO technique, which performs excellently at detecting objects in real scenes, and extracts the intensity-varying brake-lamp region from the HSV image of the detected vehicle ROI (Region Of Interest). After the candidate brake-lamp regions are detected, each isolated region is labeled, and the brake-lamp region is finally identified by the proposed selective-attention model, which assesses the shape similarity of the labeled candidate regions. To evaluate the performance of the implemented preceding-vehicle brake-lamp detection system, we applied it to various driving images, and the system produced successful detection results. A sketch of the candidate-extraction steps follows.
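A sketch of the candidate-extraction steps inside a detected vehicle ROI, assuming OpenCV: HSV thresholding for bright red regions followed by connected-component labeling. The thresholds are illustrative, and the paper's shape-similarity selective-attention model is reduced here to a simple area filter.

```python
import cv2
import numpy as np

def brake_lamp_candidates(vehicle_roi_bgr: np.ndarray, min_area: int = 30):
    hsv = cv2.cvtColor(vehicle_roi_bgr, cv2.COLOR_BGR2HSV)
    # Red hue wraps around 0/180 in OpenCV, so combine two hue bands.
    low = cv2.inRange(hsv, (0, 80, 120), (10, 255, 255))
    high = cv2.inRange(hsv, (170, 80, 120), (180, 255, 255))
    mask = low | high
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Keep isolated regions large enough to be lamps (label 0 is background).
    return [stats[i] for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area]

roi = np.zeros((120, 200, 3), dtype=np.uint8)  # stand-in vehicle crop
print(len(brake_lamp_candidates(roi)))         # 0 candidates in a black image
```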

Object Recognition and Pose Estimation Based on Deep Learning for Visual Servoing (비주얼 서보잉을 위한 딥러닝 기반 물체 인식 및 자세 추정)

  • Cho, Jaemin;Kang, Sang Seung;Kim, Kye Kyung
    • The Journal of Korea Robotics Society / v.14 no.1 / pp.1-7 / 2019
  • Smart factories have recently attracted much attention as a result of the 4th Industrial Revolution. Existing factory automation technologies are generally designed for simple repetition without vision sensors, and even small object assemblies still depend on manual work. To replace existing systems with new technologies such as bin picking and visual servoing, precision and real-time operation are essential. In this work we therefore focus on these core elements, using a deep learning algorithm to detect and classify the target object in real time and to analyze its features. Although there are many strong deep learning algorithms, such as Mask R-CNN and Fast R-CNN, we chose the YOLO CNN because it runs in real time and combines the two tasks mentioned above. Then, using the line and interior features extracted from the target object, we obtain its final outline and estimate the object pose. A sketch of the line-feature orientation cue follows.
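A sketch of the line-feature orientation cue mentioned above, assuming OpenCV: the longest Hough line segment inside the detected object's ROI gives a rough in-plane orientation. The YOLO detector and the paper's full pose estimation are omitted, and the thresholds are illustrative.

```python
import cv2
import numpy as np

def dominant_angle(roi_gray: np.ndarray):
    """Angle (degrees) of the longest line segment in the ROI, or None."""
    edges = cv2.Canny(roi_gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return None
    x1, y1, x2, y2 = max(lines[:, 0],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    return float(np.degrees(np.arctan2(y2 - y1, x2 - x1)))

roi = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(roi, (20, 30), (80, 70), 255, 1)  # stand-in object outline
print(dominant_angle(roi))
```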