• Title/Summary/Keyword: Multi-scale CNN


Low Resolution Rate Face Recognition Based on Multi-scale CNN

  • Wang, Ji-Yuan;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.12
    • /
    • pp.1467-1472
    • /
    • 2018
  • To address the problem that face images in surveillance video cannot be accurately identified due to their low resolution, this paper proposes a low-resolution face recognition solution based on a convolutional neural network (CNN) model with multi-scale input. The model is an improvement over the existing "two-step method", in which low-resolution images are up-sampled using simple bicubic interpolation. The up-sampled images and the high-resolution images are then mixed as model training samples. The CNN model learns the common feature space of the high- and low-resolution images and then measures feature similarity through the cosine distance, from which the recognition result is given. Experiments on the CMU PIE and Extended Yale B datasets show that the accuracy of the model is better than that of the compared methods; relative to CMDA_BGE, the comparison algorithm with the highest recognition rate, the accuracy is improved by 2.5%~9.9%.
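
A minimal sketch (PyTorch) of the matching step described in this abstract: the low-resolution probe is up-sampled with bicubic interpolation, both images pass through the same CNN into a shared feature space, and similarity is measured by cosine distance. The embedding network and all layer sizes below are hypothetical placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shared embedding CNN; the paper's actual architecture is not specified here.
class SharedFaceCNN(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

model = SharedFaceCNN().eval()

low_res = torch.rand(1, 3, 16, 16)    # low-resolution probe image
high_res = torch.rand(1, 3, 64, 64)   # high-resolution gallery image

# Bicubic up-sampling brings the probe to the gallery resolution.
probe = F.interpolate(low_res, size=(64, 64), mode='bicubic', align_corners=False)

with torch.no_grad():
    # Cosine similarity in the learned common feature space; higher = more likely the same identity.
    sim = F.cosine_similarity(model(probe), model(high_res))
print(sim.item())
```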

Road Damage Detection and Classification based on Multi-level Feature Pyramids

  • Yin, Junru;Qu, Jiantao;Huang, Wei;Chen, Qiqiang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.2
    • /
    • pp.786-799
    • /
    • 2021
  • Road damage detection is important for road maintenance. With the development of deep learning, more and more road damage detection methods have been proposed, such as Fast R-CNN, Faster R-CNN, Mask R-CNN, and RetinaNet. However, because features from shallow and deep layers cannot be extracted at the same time, the existing methods do not perform well in detecting objects with few samples. In addition, these methods cannot obtain highly accurate detection bounding boxes. This paper presents a Multi-level Feature Pyramids method based on M2det. Because the feature layers have a multi-scale, multi-level architecture, feature layers containing more information and more distinctive features can be extracted. Moreover, an attention mechanism is used to improve the accuracy of the local bounding boxes on the dataset. Experimental results show that the proposed method outperforms the current state-of-the-art methods.
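
A toy sketch of the idea of combining shallow and deep feature levels with an attention gate, in the spirit of the multi-level pyramid described above. The layer sizes and the attention design are illustrative placeholders; M2Det's actual MLFPN is considerably more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical multi-level pyramid with a simple channel-attention gate.
class TinyFeaturePyramid(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.c1 = nn.Conv2d(3, channels, 3, stride=2, padding=1)         # shallow level
        self.c2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # middle level
        self.c3 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # deep level
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        f1 = F.relu(self.c1(x))
        f2 = F.relu(self.c2(f1))
        f3 = F.relu(self.c3(f2))
        # Upsample the deep level and fuse it with the middle one, gated by channel attention.
        f3_up = F.interpolate(f3, size=f2.shape[-2:], mode='nearest')
        fused = f2 + self.attn(f3_up) * f3_up
        return [f1, fused, f3]   # multi-scale maps that detection heads could consume

levels = TinyFeaturePyramid()(torch.rand(1, 3, 256, 256))
print([t.shape for t in levels])
```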

Multi-spectral Vehicle Detection based on Convolutional Neural Network

  • Choi, Sungil;Kim, Seungryong;Park, Kihong;Sohn, Kwanghoon
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.12
    • /
    • pp.1909-1918
    • /
    • 2016
  • This paper presents a unified framework for joint Convolutional Neural Network (CNN) based vehicle detection that leverages multi-spectral image pairs. Observing that under challenging environments such as night vision and limited light sources, vehicle detection in a single color image becomes more tractable with an additional far-infrared (FIR) image, we design a joint CNN architecture for RGB and FIR image pairs. We regard the score map produced by applying the joint CNN to the whole image as the confidence of vehicle existence. To handle the various scale ratios of vehicle candidates, multi-scale images are first generated by scaling the image according to the possible scale ratios of vehicles. Vehicle candidates are then detected at local maxima of each score map, and overlapping candidates are removed with non-maximum suppression across the multi-scale score maps. The experimental results show that our joint framework for multi-spectral image pairs achieves superior performance over conventional methods, reducing the false positives produced by conventional vehicle detection frameworks that use only a single color image.
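
A rough sketch of the multi-scale scoring step just described: the RGB and FIR channels are concatenated, scored at several scales, and local maxima of each score map are taken as candidates. The scoring network, the scale factors, and the threshold are all assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical joint RGB+FIR scoring CNN; the paper's actual architecture differs.
score_net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 1))   # 4 = RGB (3) + FIR (1) channels

rgb = torch.rand(1, 3, 240, 320)
fir = torch.rand(1, 1, 240, 320)
pair = torch.cat([rgb, fir], dim=1)

candidates = []
for scale in (1.0, 0.75, 0.5):                   # image pyramid over plausible vehicle sizes
    scaled = F.interpolate(pair, scale_factor=scale, mode='bilinear', align_corners=False)
    score = score_net(scaled)
    # A pixel is a candidate if it equals the max of its 3x3 neighbourhood (local maximum)
    # and exceeds an (assumed) confidence threshold.
    local_max = F.max_pool2d(score, 3, stride=1, padding=1)
    peaks = (score == local_max) & (score > 0.5)
    candidates.append((scale, peaks.nonzero()))

# Overlapping detections across scales would then be merged with non-maximum suppression,
# e.g. torchvision.ops.nms on boxes recovered from these peak locations.
print([(s, p.shape[0]) for s, p in candidates])
```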

Scale-aware Faster R-CNN for Caltech Pedestrian Detection (Caltech 보행자 감지를 위한 Scale-aware Faster R-CNN)

  • Byambajav, Batkhuu;Alikhanov, Jumabek;Jo, Geun-Sik
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.10a
    • /
    • pp.506-509
    • /
    • 2016
  • We present real-time pedestrian detection that exploits the accuracy of the Faster R-CNN network. Faster R-CNN has been shown to succeed at the PASCAL VOC multi-object detection task, and its ability to operate on raw pixel input without the need to design special features is very appealing. Therefore, in this work we apply and adjust Faster R-CNN to single-object detection, namely pedestrian detection. The drawback of Faster R-CNN is its failure when the object size is small. Previously, the small-object problem was addressed by a Scale-aware Network. We incorporate the Scale-aware Network into Faster R-CNN, making our method, Scale-aware Faster R-CNN (DF R-CNN), both fast and very accurate. We separate the Faster R-CNN network into two sub-networks, one for large-size objects and another for small-size objects. The resulting approach achieves a 28.3% average miss rate on the Caltech Pedestrian detection benchmark, which is competitive with the other best reported results.
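
A toy sketch of the scale-aware idea of running two sub-networks and weighting their outputs by the proposal's size. The gating function, reference height, and feature dimension below are hypothetical choices, not taken from the paper.

```python
import torch
import torch.nn as nn

# Two hypothetical sub-network heads, one tuned to large pedestrians and one to small ones,
# combined with a height-dependent gate.
class ScaleAwareHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.large = nn.Linear(feat_dim, 2)   # scores from the large-size sub-network
        self.small = nn.Linear(feat_dim, 2)   # scores from the small-size sub-network

    def forward(self, roi_feat, proposal_height, ref_height=80.0):
        # Gate rises smoothly toward 1 for tall (large) proposals and toward 0 for short ones.
        w = torch.sigmoid((proposal_height - ref_height) / 10.0).unsqueeze(-1)
        return w * self.large(roi_feat) + (1.0 - w) * self.small(roi_feat)

head = ScaleAwareHead()
roi_feat = torch.rand(4, 256)                       # pooled features for 4 proposals
heights = torch.tensor([30.0, 60.0, 90.0, 150.0])   # proposal heights in pixels
print(head(roi_feat, heights))
```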

Integration of Multi-scale CAM and Attention for Weakly Supervised Defects Localization on Surface Defective Apple

  • Nguyen Bui Ngoc Han;Ju Hwan Lee;Jin Young Kim
    • Smart Media Journal
    • /
    • v.12 no.9
    • /
    • pp.45-59
    • /
    • 2023
  • Weakly supervised object localization (WSOL) is the task of localizing an object in an image using only image-level labels. Previous studies have followed the conventional class activation mapping (CAM) pipeline. However, we show that the current CAM approach suffers from problems that prevent the original CAM from capturing the complete features of defects. This work utilizes a convolutional neural network (CNN) pretrained on image-level labels to generate class activation maps in a multi-scale manner to highlight discriminative regions. Additionally, a pretrained vision transformer (ViT) is used to produce multi-head attention maps as an auxiliary detector. By integrating the CNN-based CAMs and the attention maps, our approach localizes defective regions without requiring bounding-box or pixel-level supervision during training. We evaluate our approach on a dataset of apple images with only image-level labels of defect categories. Experiments demonstrate that our proposed method matches the performance of several object detection models and holds promise for improving localization.
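
A small sketch of computing CAMs at several input scales and fusing them with an attention map, as the abstract outlines. The classifier, the random stand-in for the ViT attention, and the multiplicative fusion are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy CNN classifier from which class activation maps (CAM) can be read out.
class TinyClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.fc = nn.Linear(64, n_classes)

    def cam(self, x, cls):
        feat = self.backbone(x)                       # B x 64 x H x W
        w = self.fc.weight[cls]                       # classifier weights for the target class
        return torch.einsum('bchw,c->bhw', feat, w)   # weighted sum over channels = CAM

model = TinyClassifier().eval()
image = torch.rand(1, 3, 224, 224)
target_class = 1

# Multi-scale CAM: compute at several input resolutions and average at full size.
cams = []
for s in (1.0, 0.75, 0.5):
    x = F.interpolate(image, scale_factor=s, mode='bilinear', align_corners=False)
    c = model.cam(x, target_class).unsqueeze(1)
    cams.append(F.interpolate(c, size=(224, 224), mode='bilinear', align_corners=False))
ms_cam = torch.stack(cams).mean(0)

# Fuse with a (here random) ViT attention map standing in for the auxiliary detector.
vit_attention = torch.rand(1, 1, 224, 224)
localization_map = ms_cam * vit_attention
print(localization_map.shape)
```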

Multi-Scale Dilation Convolution Feature Fusion (MsDC-FF) Technique for CNN-Based Black Ice Detection

  • Sun-Kyoung KANG
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.3
    • /
    • pp.17-22
    • /
    • 2023
  • In this paper, we propose a black ice detection system using Convolutional Neural Networks (CNNs). Black ice poses a serious threat to road safety, particularly during winter conditions. To address this problem, we introduce a CNN-based encoder-decoder architecture specifically designed for real-time black ice detection using thermal images. To train the network, we set up a specialized experimental platform to capture thermal images of various black ice formations on diverse road surfaces, including cement and asphalt, which enables us to curate a comprehensive dataset of thermal road black ice images for training and evaluation. Additionally, in order to enhance the accuracy of black ice detection, we propose a multi-scale dilation convolution feature fusion (MsDC-FF) technique. This technique dynamically adjusts the dilation ratios based on the input image's resolution, improving the network's ability to capture fine-grained details. Experimental results demonstrate the superior performance of our proposed network model compared to conventional image segmentation models: our model achieved an mIoU of 95.93%, while LinkNet achieved an mIoU of 95.39%. Therefore, the proposed model could offer a promising solution for real-time black ice detection, thereby enhancing road safety during winter conditions.
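
A minimal sketch of parallel dilated convolutions fused along the channel axis, the general pattern behind a multi-scale dilation convolution feature fusion block. The dilation rates, channel sizes, and fusion layer here are illustrative assumptions, not the paper's configuration (which adjusts dilation ratios by input resolution).

```python
import torch
import torch.nn as nn

# Parallel dilated-convolution branches with different receptive fields, fused by a 1x1 conv.
class MsDCFusion(nn.Module):
    def __init__(self, in_ch=64, out_ch=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation then 1x1 conv fuses them.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feat = torch.rand(1, 64, 60, 80)        # feature map from the encoder
print(MsDCFusion()(feat).shape)          # same spatial size, fused multi-scale context
```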

Black Ice Detection Platform and Its Evaluation using Jetson Nano Devices based on Convolutional Neural Network (CNN)

  • Sun-Kyoung KANG;Yeonwoo LEE
    • Korean Journal of Artificial Intelligence
    • /
    • v.11 no.4
    • /
    • pp.1-8
    • /
    • 2023
  • In this paper, we propose a black ice detection platform framework using Convolutional Neural Networks (CNNs). To address the black ice problem, we introduce a real-time early-warning platform using a CNN-based architecture and, in order to enhance the accuracy of black ice detection, apply a multi-scale dilation convolution feature fusion (MsDC-FF) technique. We then set up a specialized experimental platform using a comprehensive dataset of thermal road black ice images for training and evaluation. Experimental results of the real-time black ice detection platform show better performance of our proposed network model compared to conventional image segmentation models. Our platform achieves real-time segmentation of road black ice areas by deploying the segmentation network on Jetson Nano edge devices. Because it uses multi-scale dilated convolutions with different dilation rates in parallel and has fewer model parameters, it achieves faster segmentation speeds; the proposed MsDC-FF Net(2) model had the fastest segmentation speed at 5.53 frames per second (FPS). This encourages safe driving for motorists and provides decision support for road surface management in road traffic monitoring departments.
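
A rough way to measure the frames-per-second figure quoted above on a target device. The tiny stand-in network below is not the MsDC-FF architecture, and actual timings depend entirely on the hardware (e.g. a Jetson Nano) and input resolution.

```python
import time
import torch
import torch.nn as nn

# Stand-in segmentation model: one thermal channel in, a 2-class (ice / no ice) map out.
model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 2, 1)).eval()

frame = torch.rand(1, 1, 240, 320)      # one thermal frame (assumed resolution)
with torch.no_grad():
    model(frame)                        # warm-up run
    start = time.perf_counter()
    n = 50
    for _ in range(n):
        model(frame)
    elapsed = time.perf_counter() - start
print(f"{n / elapsed:.2f} FPS")
```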

Dual-scale BERT using multi-trait representations for holistic and trait-specific essay grading

  • Minsoo Cho;Jin-Xia Huang;Oh-Woog Kwon
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.82-95
    • /
    • 2024
  • As automated essay scoring (AES) has progressed from handcrafted techniques to deep learning, holistic scoring capabilities have emerged. However, specific trait assessment remains a challenge because of the limited depth of earlier methods in modeling dual assessments for holistic and multi-trait tasks. To overcome this challenge, we explore providing comprehensive feedback while modeling the interconnections between holistic and trait representations. We introduce the DualBERT-Trans-CNN model, which combines transformer-based representations with a novel dual-scale bidirectional encoder representations from transformers (BERT) encoding approach at the document level. By explicitly leveraging multi-trait representations in a multi-task learning (MTL) framework, our DualBERT-Trans-CNN emphasizes the interrelation between holistic and trait-based score predictions, aiming for improved accuracy. For validation, we conducted extensive tests on the ASAP++ and TOEFL11 datasets. Against models with the same MTL setting, ours showed a 2.0% increase in holistic score. Additionally, compared with single-task learning (STL) models, ours demonstrated a 3.6% enhancement in average multi-trait performance on the ASAP++ dataset.
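
A toy sketch of the multi-task idea behind jointly predicting a holistic score and several trait scores from one shared essay representation. The head structure, dimensions, and equal loss weighting are illustrative assumptions, not the DualBERT-Trans-CNN design.

```python
import torch
import torch.nn as nn

# One shared essay representation feeds a holistic-score head and several trait-score heads.
class MultiTraitHead(nn.Module):
    def __init__(self, hidden=768, n_traits=4):
        super().__init__()
        self.holistic = nn.Linear(hidden, 1)
        self.traits = nn.Linear(hidden, n_traits)

    def forward(self, essay_repr):
        return self.holistic(essay_repr).squeeze(-1), self.traits(essay_repr)

head = MultiTraitHead()
essay_repr = torch.rand(8, 768)          # e.g. pooled BERT output for a batch of 8 essays
holistic_gold = torch.rand(8)
trait_gold = torch.rand(8, 4)

holistic_pred, trait_pred = head(essay_repr)
mse = nn.MSELoss()
# Joint MTL objective: holistic and trait losses share the encoder's gradients.
loss = mse(holistic_pred, holistic_gold) + mse(trait_pred, trait_gold)
loss.backward()
print(loss.item())
```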

Comparison Study of the Performance of CNN Models with Multi-view Image Set on the Classification of Ship Hull Blocks (다시점 영상 집합을 활용한 선체 블록 분류를 위한 CNN 모델 성능 비교 연구)

  • Chon, Haemyung;Noh, Jackyou
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.57 no.3
    • /
    • pp.140-151
    • /
    • 2020
  • It is important to identify the location of ship hull blocks, with their exact block identification numbers, when scheduling the shipbuilding process. Wrong information about the location or identification number of a hull block lowers productivity because of the time spent finding where the block actually is. To solve this problem, a system is needed that tracks the location of the blocks and identifies their identification numbers automatically. There has been a lot of research on location-tracking systems for hull blocks in the stockyard, but none on identifying the hull blocks themselves. This study compares the performance of five Convolutional Neural Network (CNN) models with a multi-view image set on the classification of hull blocks, in order to identify the blocks in the stockyard. The CNN models are open algorithms from the ImageNet Large-Scale Visual Recognition Competition (ILSVRC). Four scaled hull block models are used to acquire the images of ship hull blocks. Training and transfer learning of the CNN models are performed with the original training data and with augmented versions of that data. Twenty tests and predictions, covering the five CNN models and four training conditions, are performed. To compare the classification performance of the CNN models, accuracy and average F1-score computed from the confusion matrix are adopted as the performance measures. As a result of the comparison, the ResNet-152v2 model shows the highest accuracy and average F1-score with both the full-block prediction image set and the cropped-block prediction image set.
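
A short sketch of how the two performance measures mentioned above (accuracy and average F1-score) are computed from a confusion matrix. The 4x4 matrix below is made-up example data for four block classes, not results from the paper.

```python
import numpy as np

# Confusion matrix: rows = true class, columns = predicted class (illustrative values only).
cm = np.array([[48, 1, 0, 1],
               [2, 45, 2, 1],
               [0, 3, 46, 1],
               [1, 0, 2, 47]])

accuracy = np.trace(cm) / cm.sum()            # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)      # per-class precision
recall = np.diag(cm) / cm.sum(axis=1)         # per-class recall
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f}, average F1={f1.mean():.3f}")
```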

Interpolation based Single-path Sub-pixel Convolution for Super-Resolution Multi-Scale Networks

  • Alao, Honnang;Kim, Jin-Sung;Kim, Tae Sung;Oh, Juhyen;Lee, Kyujoong
    • Journal of Multimedia Information System
    • /
    • v.8 no.4
    • /
    • pp.203-210
    • /
    • 2021
  • Deep learning convolutional neural networks (CNNs) have successfully been applied to image super-resolution (SR). Despite their strong performance, SR techniques tend to focus on a single upscale factor when training a particular model. Single-model multi-scale networks can easily be constructed if images are upscaled prior to input, but sub-pixel convolution upsampling works differently for each scale factor. Recent SR methods employ multi-scale, multi-path learning as a solution; however, this causes unshared parameters and an unbalanced parameter distribution across scale factors. We present a multi-scale single-path upsample module as a solution, exploiting the advantages of sub-pixel convolution and interpolation algorithms. The proposed model employs sub-pixel convolution for the highest of the learned upscale factors, and then utilizes 1-dimensional interpolation to compress the learned features along the channel axis to match the desired output image size. Experiments are performed on the single-path upsample module and compared to the multi-path upsample module. Based on the experimental results, the proposed algorithm reduces the upsample module's parameters by 24% and achieves comparable or slightly better performance than the previous algorithm.
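
A sketch of the single-path idea as described: one sub-pixel (pixel-shuffle) layer is sized for the largest scale factor, and smaller factors reuse it by compressing the learned features along the channel axis with 1-D interpolation before shuffling. The layer sizes and the exact placement of the interpolation are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SinglePathUpsampler(nn.Module):
    def __init__(self, feat_ch=64, out_ch=3, max_scale=4):
        super().__init__()
        self.max_scale = max_scale
        # Sub-pixel convolution sized for the largest scale factor.
        self.conv = nn.Conv2d(feat_ch, out_ch * max_scale ** 2, 3, padding=1)

    def forward(self, feat, scale):
        x = self.conv(feat)                                   # B x (out_ch*max_scale^2) x H x W
        if scale != self.max_scale:
            b, c, h, w = x.shape
            target_c = c * scale ** 2 // self.max_scale ** 2  # channels needed at this scale
            # 1-D linear interpolation along the channel axis compresses the learned features.
            x = x.view(b, c, h * w).permute(0, 2, 1)          # B x (H*W) x C
            x = F.interpolate(x, size=target_c, mode='linear', align_corners=False)
            x = x.permute(0, 2, 1).reshape(b, target_c, h, w)
        return F.pixel_shuffle(x, scale)                      # rearrange channels into pixels

up = SinglePathUpsampler()
feat = torch.rand(1, 64, 32, 32)
print(up(feat, 4).shape, up(feat, 2).shape)   # x4 and x2 outputs from the same path
```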