• Title/Summary/Keyword: Deep Features

TSDnet: Three-scale Dense Network for Infrared and Visible Image Fusion

  • Zhang, Yingmei;Lee, Hyo Jong
    • Annual Conference of KIPS / 2022.11a / pp.656-658 / 2022
  • The purpose of infrared and visible image fusion is to integrate images of different modalities, each carrying different details, into a single information-rich image that is convenient for high-level computer vision tasks. Considering that many deep networks work at only a single scale, this paper proposes a novel image fusion method based on a three-scale dense network to preserve the content and key target features of the input images in the fused image. It comprises an encoder, a three-scale block, a fusion strategy, and a decoder, and can capture rich background details and prominent target details. The encoder extracts three-scale dense features from the source images for the initial image fusion. Then, a fusion strategy based on the l1-norm fuses the features of different scales. Finally, the fused image is reconstructed by the decoding network. Compared with existing methods, the proposed method achieves state-of-the-art fusion performance in subjective observation.
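
The abstract does not spell out the l1-norm rule, so the following is only a minimal sketch of one common form of it: the l1-norm of the channel vector at each pixel is treated as an activity level, and the two feature stacks are blended with weights proportional to that activity. The function name `l1_fuse`, the `[channel][row][col]` layout, and the `eps` stabilizer are assumptions made for illustration, not details from the paper.

```python
def l1_fuse(feats_a, feats_b, eps=1e-12):
    """Fuse two feature stacks shaped [channel][row][col] into one.

    Hypothetical helper: weights each source by the l1-norm of its
    channel vector at every pixel (an assumed form of the strategy).
    """
    channels = len(feats_a)
    rows, cols = len(feats_a[0]), len(feats_a[0][0])
    fused = [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]
    for y in range(rows):
        for x in range(cols):
            # Activity level: l1-norm across channels at this pixel.
            act_a = sum(abs(feats_a[c][y][x]) for c in range(channels))
            act_b = sum(abs(feats_b[c][y][x]) for c in range(channels))
            # Normalized weight for source A; source B gets the remainder.
            w_a = (act_a + eps) / (act_a + act_b + 2 * eps)
            for c in range(channels):
                fused[c][y][x] = (w_a * feats_a[c][y][x]
                                  + (1.0 - w_a) * feats_b[c][y][x])
    return fused
```

With this weighting, a pixel where only one source has strong activations is taken almost entirely from that source, while pixels with equal activity are averaged.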

CAttNet: A Compound Attention Network for Depth Estimation of Light Field Images

  • Dingkang Hua;Qian Zhang;Wan Liao;Bin Wang;Tao Yan
    • Journal of Information Processing Systems / v.19 no.4 / pp.483-497 / 2023
  • Depth estimation is one of the most complicated and difficult problems in light field processing. In this paper, a compound attention convolutional neural network (CAttNet) is proposed to extract depth maps from light field images. To make more effective use of the sub-aperture images (SAIs) of the light field and reduce the redundancy among SAIs, we use a compound attention mechanism to weight the channel and spatial dimensions of the feature map after extracting the primary features, so that the network can more efficiently select the required view and the important area within the view. We modified various feature extraction layers to extract features more efficiently without adding parameters. By exploring the characteristics of the light field, we increased the network depth and optimized the network structure to reduce the adverse impact of this change. CAttNet can efficiently utilize the correlations and features of different SAIs to generate a high-quality light field depth map. The experimental results show that CAttNet has advantages in both accuracy and time.

Alzheimer progression classification using fMRI data

  • Ju Hyeon-Noh;Hee-Deok Yang
    • Smart Media Journal / v.13 no.4 / pp.86-93 / 2024
  • The development of functional magnetic resonance imaging (fMRI) has significantly contributed to mapping brain functions and understanding brain networks during rest. This paper proposes a CNN-LSTM-based classification model to classify the progression stages of Alzheimer's disease. Firstly, four preprocessing steps are performed to remove noise from the fMRI data before feature extraction. Secondly, the U-Net architecture is utilized to extract spatial features once preprocessing is completed. Thirdly, the extracted spatial features undergo LSTM processing to extract temporal features, ultimately leading to classification. Experiments were conducted by adjusting the temporal dimension of the data. Using 5-fold cross-validation, an average accuracy of 96.4% was achieved, indicating that the proposed method has high potential for identifying the progression of Alzheimer's disease by analyzing fMRI data.
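
The 5-fold cross-validation protocol mentioned above can be sketched as follows. This is an illustrative outline only: the helper names `k_fold_indices` and `mean_accuracy` are hypothetical, the folds are contiguous and unshuffled, and the abstract does not say how samples or subjects were actually assigned to folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = (list(range(0, start))
                     + list(range(start + size, n_samples)))
        yield train_idx, test_idx
        start += size


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies into one reported figure."""
    return sum(fold_accuracies) / len(fold_accuracies)
```

Each sample appears in exactly one test fold, and the reported accuracy is the mean over the five held-out folds. For fMRI data it would usually be safer to split by subject rather than by scan, which this sketch does not attempt.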

Human Walking Detection and Background Noise Classification by Deep Neural Networks for Doppler Radars

  • Kwon, Jihoon;Ha, Seoung-Jae;Kwak, Nojun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.29 no.7 / pp.550-559 / 2018
  • The effectiveness of deep neural networks (DNNs) for detection and classification of micro-Doppler signals generated by human walking and background noise sources is investigated. Previous research relied on a complex process for extracting meaningful features that directly affect classifier performance, and this feature extraction was based on experience and statistical analysis. However, because a DNN gradually reconstructs and generates features as data passes through the layers of the network, a separate preprocessing step for feature extraction is not required. Therefore, binary classifiers and multiclass classifiers based on multilayer perceptrons (MLPs) and DNNs were designed and analyzed, and the effectiveness of DNNs for recognizing micro-Doppler signals was demonstrated. Experimental results showed that, in the case of MLPs, the classification accuracies of the binary classifier and the multiclass classifier were 90.3% and 86.1%, respectively, for the test dataset. In the case of DNNs, the classification accuracies of the binary classifier and the multiclass classifier were 97.3% and 96.1%, respectively, for the test dataset.

A Deep Learning Based Recommender System Using Visual Information

  • Moon, Hyunsil;Lim, Jinhyuk;Kim, Doyeon;Cho, Yoonho
    • Knowledge Management Research / v.21 no.3 / pp.27-44 / 2020
  • To solve the user's information overload problem, recommender systems infer users' preferences and suggest items that match them. Collaborative filtering (CF), the most successful recommendation algorithm, has continued to improve in performance and has been applied to various business domains. Visual information, such as book covers, can influence consumers' purchase decisions. However, CF-based recommender systems have rarely considered visual information. In this study, we propose VizNCS, a CF-based deep learning model that uses visual information as additional input. VizNCS consists of two phases. In the first phase, we build convolutional neural networks (CNNs) to extract visual features from image data. In the second phase, we supply the visual features to the NCF model, which is known to be easy to extend with other information among deep learning-based recommendation systems. In performance comparison experiments, VizNCS showed higher performance than the vanilla NCF. We also conducted an additional experiment to see whether the effect of visual information differs by product category. The result enables us to identify which categories were affected and which were not. We expect VizNCS to improve recommender system performance and expand the recommender system's data sources to visual information.

Detection of Zebra-crossing Areas Based on Deep Learning with Combination of SegNet and ResNet

  • Liang, Han;Seo, Suyoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.3 / pp.141-148 / 2021
  • This paper presents a method to detect zebra-crossings using deep learning that combines SegNet and ResNet. For blind pedestrians, a safe crossing system needs to know exactly where the zebra-crossings are. Zebra-crossing detection by deep learning can be a good solution to this problem, and robotic vision-based assistive technologies, which focus on specific scene objects using monocular detectors, have sprung up over the past few years. These traditional methods have achieved significant results, though with relatively long processing times, and have enhanced zebra-crossing perception to a large extent. However, running all detectors jointly incurs a long latency and becomes computationally prohibitive on wearable embedded systems. In this paper, we propose a model for fast and stable segmentation of zebra-crossings from captured images. The model improves on a combination of SegNet and ResNet and consists of three steps. First, the input image is subsampled to extract image features, and the convolutional neural network of ResNet is modified to serve as the new encoder. Second, the original SegNet up-sampling network restores the abstract features to the original image size. Finally, the method classifies all pixels and calculates the per-pixel accuracy. The experimental results demonstrate the efficiency of the modified semantic segmentation algorithm, which runs at a relatively high computing speed.
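
The final step, scoring a per-pixel classification against a label mask, can be illustrated with two standard segmentation measures. This is a sketch under assumptions: the abstract only mentions pixel accuracy, the intersection-over-union measure is added here for context, and both function names are hypothetical.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the label."""
    total = correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0


def class_iou(pred, truth, cls=1):
    """Intersection over union for one class (e.g. 1 = zebra-crossing)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    return inter / union if union else 0.0
```

Pixel accuracy can look high when the crossing covers only a small part of the image, which is why a per-class measure such as IoU is often reported alongside it.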

Training Performance Analysis of Semantic Segmentation Deep Learning Model by Progressive Combining Multi-modal Spatial Information Datasets

  • Lee, Dae-Geon;Shin, Young-Ha;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.2 / pp.91-108 / 2022
  • In most cases, optical images have been used as training data for DL (Deep Learning) models for object detection, recognition, identification, classification, semantic segmentation, and instance segmentation. However, the properties of 3D objects in the real world cannot be fully explored with 2D images. One of the major sources of 3D geospatial information is the DSM (Digital Surface Model). In this regard, characteristic information derived from the DSM would be effective for analyzing 3D terrain features. In particular, man-made objects such as buildings, which have geometrically unique shapes, can be described by geometric elements obtained from 3D geospatial data. The background and motivation of this paper were drawn from the concept of the intrinsic image, which is involved in high-level visual information processing. This paper aims to extract buildings after classifying terrain features by training a DL model with DSM-derived information including slope, aspect, and SRI (Shaded Relief Image). The experiments were carried out using the DSM and label datasets provided by ISPRS (International Society for Photogrammetry and Remote Sensing) with a CNN-based SegNet model. In particular, the experiments focus on combining multi-source information to improve the training performance and synergistic effect of the DL model. The results demonstrate that buildings were effectively classified and extracted by the proposed approach.
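
As a rough illustration of how slope and aspect layers can be derived from a DSM, the sketch below uses central differences on a regular grid. The function name `slope_aspect`, the grid orientation (row 0 at the northern edge, column 0 at the western edge), and the aspect convention (downslope compass bearing) are all assumptions; the paper may use a different kernel or convention.

```python
import math


def slope_aspect(dsm, row, col, cell_size=1.0):
    """Return (slope_deg, aspect_deg) at an interior DSM cell.

    Assumes row 0 is the northern edge and column 0 the western edge.
    Aspect is the downslope compass bearing (0 = north, 90 = east);
    it is not meaningful on perfectly flat cells.
    """
    # Central differences: elevation change per unit distance
    # toward the east and toward the north.
    grad_east = (dsm[row][col + 1] - dsm[row][col - 1]) / (2.0 * cell_size)
    grad_north = (dsm[row - 1][col] - dsm[row + 1][col]) / (2.0 * cell_size)
    slope_deg = math.degrees(math.atan(math.hypot(grad_east, grad_north)))
    # The surface faces the direction in which elevation falls fastest.
    aspect_deg = math.degrees(math.atan2(-grad_east, -grad_north)) % 360.0
    return slope_deg, aspect_deg
```

For a surface that rises one unit per cell toward the east, this gives a 45 degree slope facing west. Applying the function to every interior cell yields slope and aspect rasters that can be stacked with the DSM as extra input channels.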

A Study on the Use of Contrast Agent and the Improvement of Body Part Classification Performance through Deep Learning-Based CT Scan Reconstruction

  • Seongwon Na;Yousun Ko;Kyung Won Kim
    • Journal of Broadcast Engineering / v.28 no.3 / pp.293-301 / 2023
  • Unstandardized medical data collection and management are still conducted manually, and studies are under way to classify CT data using deep learning to solve this problem. However, most studies develop models based only on the axial plane, which is the basic CT slice. Because CT images, unlike general images, depict only human structures, reconstructing CT scans alone can provide richer physical features. This study seeks ways to achieve higher performance through various methods of converting CT scans to 2D in addition to axial planes. Training used 1,042 CT scans from five body parts, and 179 test cases plus 448 cases from external datasets were collected for model evaluation. To develop a deep learning model, we used InceptionResNetV2 pre-trained on ImageNet as a backbone and re-trained all layers of the model. In the experiments, the reconstruction data model achieved 99.33% in body part classification, 1.12% higher than the axial model, and the axial model was higher only for the brain and neck in contrast classification. In conclusion, more accurate performance was achieved when training with data that shows anatomical features better than when training with axial slices alone.
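
The abstract does not list which 2D reconstructions were used. As a hedged illustration of what converting a CT volume to 2D views can mean, the sketch below extracts the three orthogonal planes and a maximum intensity projection from a volume stored as nested lists indexed `[z][y][x]`. The function names and the data layout are assumptions, not the authors' pipeline.

```python
def axial_slice(volume, z):
    """Plane at a fixed z index: the standard CT slice."""
    return volume[z]


def coronal_slice(volume, y):
    """Plane at a fixed y index, built from one row of every axial slice."""
    return [plane[y] for plane in volume]


def sagittal_slice(volume, x):
    """Plane at a fixed x index, built from one column of every axial slice."""
    return [[row[x] for row in plane] for plane in volume]


def max_intensity_projection(volume):
    """Collapse the z axis by keeping the brightest voxel at each (y, x)."""
    return [[max(col) for col in zip(*rows)] for rows in zip(*volume)]
```

Coronal and sagittal views show the body along its long axis, which is one way a single 2D image can carry more anatomical context than an individual axial slice.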

Research on Local and Global Infrared Image Pre-Processing Methods for Deep Learning Based Guided Weapon Target Detection

  • Jae-Yong Baek;Dae-Hyeon Park;Hyuk-Jin Shin;Yong-Sang Yoo;Deok-Woong Kim;Du-Hwan Hur;SeungHwan Bae;Jun-Ho Cheon;Seung-Hwan Bae
    • Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.41-51 / 2024
  • In this paper, we explore how to enhance target detection accuracy in guided weapons using deep learning object detection on infrared (IR) images. Because IR images are influenced by factors such as time and temperature, it is crucial to ensure a consistent representation of object features across environments when training the model. A simple way to address this is to emphasize the features of target objects and reduce noise within the infrared images through appropriate pre-processing techniques. However, previous studies have not sufficiently discussed pre-processing methods for training deep learning models on infrared images. In this paper, we investigate the impact of image pre-processing techniques on infrared image-based training for object detection. To this end, we analyze the pre-processing results on infrared images that use global or local information from the video and the image. In addition, to confirm the impact of the images converted by each pre-processing technique on object detector training, we train the YOLOX target detector on images processed by the various pre-processing methods and analyze the results. In particular, the experiments using CLAHE (Contrast Limited Adaptive Histogram Equalization) show the highest detection accuracy, with a mean average precision (mAP) of 81.9%.
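
To give a feel for the contrast-limiting idea behind CLAHE, the sketch below builds the intensity lookup table for a single tile: the histogram is clipped, the clipped excess is spread evenly over all bins, and the cumulative sum becomes the mapping. This is deliberately incomplete. Full CLAHE also splits the image into tiles and interpolates between neighboring tile mappings, which is omitted here, and the function name and parameters are hypothetical rather than taken from the paper or any library.

```python
def clipped_equalization_lut(tile, clip_limit, levels=256):
    """Build a lookup table for one tile of integer pixel values.

    Clips the histogram at clip_limit, redistributes the excess
    uniformly, then equalizes with the cumulative distribution.
    """
    hist = [0] * levels
    n = 0
    for row in tile:
        for v in row:
            hist[v] += 1
            n += 1
    # Counts above the limit are removed and shared by every bin,
    # which caps how steep the mapping can become.
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) + excess / levels for h in hist]
    lut, cum = [], 0.0
    for h in hist:
        cum += h
        lut.append(round((levels - 1) * cum / n))
    return lut
```

A pixel value `v` in the tile is then remapped to `lut[v]`. Lowering `clip_limit` flattens the mapping, which limits noise amplification in near-uniform regions, a common concern in infrared imagery.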

Development of Convolutional Network-based Denoising Technique using Deep Reinforcement Learning in Computed Tomography

  • Cho, Jenonghyo;Yim, Dobin;Nam, Kibok;Lee, Dahye;Lee, Seungwan
    • Journal of the Korean Society of Radiology / v.14 no.7 / pp.991-1001 / 2020
  • Supervised deep learning technologies for improving the image quality of computed tomography (CT) need a large amount of training data. When input images have characteristics different from those of the training images, these technologies cause structural distortion in the output images. In this study, an imaging model based on deep reinforcement learning (DRL) was developed to overcome the drawbacks of supervised deep learning technologies and reduce noise in CT images. The DRL model consisted of shared, value, and policy networks, and the networks included convolutional layers, rectified linear units (ReLU), dilation factors, and a gated recurrent unit (GRU) in order to extract noise features from CT images and improve the performance of the DRL model. The quality of the CT images obtained with the DRL model was also compared to that obtained with the supervised deep learning model. The results showed that the image accuracy of the DRL model was higher than that of the supervised deep learning model, and the image noise of the DRL model was lower than that of the supervised deep learning model. The DRL model also reduced the noise of CT images whose characteristics differed from those of the training images. Therefore, the DRL model is able to reduce image noise while maintaining the structural information of CT images.