• Title/Summary/Keyword: convolution model

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue;Cho, Young Im
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.6
    • /
    • pp.21-28
    • /
    • 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, deep learning has been deployed in all fields of computer vision. Action recognition, an important branch of human perception and computer vision research, has attracted increasing attention. It is a challenging task because human movement is complex and the same movement may be performed by multiple individuals. Because a human action appears in video as a sequence of consecutive image frames, action recognition requires more computational power than processing static images, and a plain CNN alone cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. In particular, for video action classification, adding an attention model helps the network focus on motion features and improves performance. Attention also shows intuitively which part of the input the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose a 3D dense convolutional network based on an attention mechanism (ADD-Net) for recognizing human motion behavior in video.

Saliency-Assisted Collaborative Learning Network for Road Scene Semantic Segmentation

  • Haifeng Sima;Yushuang Xu;Minmin Du;Meng Gao;Jing Wang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.3
    • /
    • pp.861-880
    • /
    • 2023
  • Semantic segmentation of road scenes is a key technology for autonomous driving, and improvements in convolutional neural network architecture drive improvements in segmentation performance. Existing convolutional neural networks suffer from simplistic learned knowledge and high model complexity. To address this issue, we propose a road scene semantic segmentation algorithm based on multi-task collaborative learning. First, an atrous spatial pyramid pooling module built from depthwise separable convolutions is proposed to reduce model complexity. Second, a collaborative learning framework that incorporates saliency detection is proposed, and the joint loss function is defined using homoscedastic uncertainty to suit the new learning model. Experiments are conducted on road and natural scene datasets. The proposed method achieves 70.94% and 64.90% mIoU on the Cityscapes and PASCAL VOC 2012 datasets, respectively. Qualitatively, compared with other high-performing methods, the proposed method has significant advantages in segmenting fine targets and boundaries.
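
The complexity reduction from depthwise separable convolution mentioned in the abstract above can be illustrated by counting weights. This is a minimal sketch, not the authors' module: the function names and the layer sizes (`c_in`, `c_out`, `k`) are hypothetical, bias terms are ignored, and the atrous (dilation) rate is left out because it does not change the number of weights.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 256, 3
standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print("standard:", standard, "separable:", separable)
```

For a fixed kernel size, the saving grows with the number of output channels, which is why the substitution is attractive inside wide pooling modules.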

Age and gender prediction model using CNN (CNN 알고리즘을 이용한 나이와 성별 구분 모델)

  • Sung Han Shin;Heung Seok Jeon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.07a
    • /
    • pp.47-50
    • /
    • 2023
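  • In this paper, we propose a system that uses a deep learning CNN algorithm to learn from human face images and then predict age and gender. The system takes into account the distinct external characteristics of each individual, analyzes them, and can then recommend a suitable hairstyle and outfit. Using this technology, physical characteristics such as the user's face can be taken into account when generating metaverse avatars. In future work, we will extend the research to image the whole body so that more diverse information can be recognized.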

Design of a Deep Neural Network Model for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델의 설계)

  • Kim, Dongha;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.4
    • /
    • pp.203-210
    • /
    • 2017
  • In this paper, we propose an effective neural network model for image caption generation and model transfer. The model is a kind of multimodal recurrent neural network. It consists of five distinct layers: a convolutional neural network layer for extracting visual information from images, an embedding layer for converting each word into a low-dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multimodal layer for combining visual and language information. The recurrent neural network layer is built from LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, the model has a unique structure in which the output of the convolutional neural network layer is linked not only to the initial state of the recurrent neural network layer but also to the input of the multimodal layer, so that the visual information extracted from the image can be used at each recurrent step when generating the corresponding caption. Through various comparative experiments on open datasets such as Flickr8k, Flickr30k, and MSCOCO, we demonstrate that the proposed multimodal recurrent neural network model performs well in terms of caption accuracy and model transfer effect.

Hybrid All-Reduce Strategy with Layer Overlapping for Reducing Communication Overhead in Distributed Deep Learning (분산 딥러닝에서 통신 오버헤드를 줄이기 위해 레이어를 오버래핑하는 하이브리드 올-리듀스 기법)

  • Kim, Daehyun;Yeo, Sangho;Oh, Sangyoon
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.10 no.7
    • /
    • pp.191-198
    • /
    • 2021
  • As training datasets grow and models get deeper to achieve high accuracy, training a deep neural network requires a lot of computation and takes too much time on a single node. Distributed deep learning has therefore been proposed to reduce training time by distributing computation across multiple nodes. In this study, we propose a hybrid all-reduce strategy that considers the characteristics of each layer, together with a technique that overlaps communication and computation, for synchronization in distributed deep learning. Because the convolution layers have fewer parameters than the fully-connected layers and are located in the upper part of the network, only a short overlapping time is available for them. Thus, butterfly all-reduce is used to synchronize the convolution layers, while the fully-connected layers are synchronized using ring all-reduce. Empirical results on PyTorch show that the proposed method reduces training time by up to 33% compared to baseline PyTorch.
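
To make the ring all-reduce mentioned in the abstract above concrete, here is a single-process simulation of the standard two-phase scheme (reduce-scatter, then all-gather). It is a sketch under stated assumptions: the function name `ring_allreduce` and the list-of-chunks data layout are hypothetical, no real network communication takes place, and the authors' hybrid strategy, butterfly all-reduce, and layer overlapping are not reproduced.

```python
def ring_allreduce(node_chunks):
    """Simulate a sum ring all-reduce over n nodes.

    node_chunks[i][c] is chunk c (a list of numbers) held by node i.
    Every node must hold exactly n chunks, where n is the number of nodes.
    Returns the per-node data after the all-reduce.
    """
    n = len(node_chunks)
    data = [[list(chunk) for chunk in node] for node in node_chunks]  # deep copy

    # Phase 1, reduce-scatter: in each step node i sends one chunk to node i+1,
    # which adds it to its own copy. After n-1 steps node i holds the fully
    # summed chunk with index (i + 1) % n.
    for step in range(n - 1):
        sends = [((i - step) % n, list(data[i][(i - step) % n])) for i in range(n)]
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n
            data[dst][c] = [x + y for x, y in zip(data[dst][c], payload)]

    # Phase 2, all-gather: the fully summed chunks travel around the ring and
    # overwrite the stale copies on every node.
    for step in range(n - 1):
        sends = [((i + 1 - step) % n, list(data[i][(i + 1 - step) % n])) for i in range(n)]
        for i, (c, payload) in enumerate(sends):
            data[(i + 1) % n][c] = payload

    return data


# Three simulated nodes, each holding a gradient split into three chunks.
grads = [
    [[1, 2], [3, 4], [5, 6]],
    [[10, 20], [30, 40], [50, 60]],
    [[100, 200], [300, 400], [500, 600]],
]
result = ring_allreduce(grads)
print(result[0])
```

Each node sends only one chunk per step, so the per-node traffic stays roughly constant as nodes are added, at the price of 2(n-1) sequential steps.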

MF sampler: Sampling method for improving the performance of a video based fashion retrieval model (MF sampler: 동영상 기반 패션 검색 모델의 성능 향상을 위한 샘플링 방법)

  • Baek, Sanghun;Park, Jonghyuk
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.329-346
    • /
    • 2022
  • Recently, as the market for short-form videos on social media (Instagram, TikTok, YouTube) has grown, research using them is being actively conducted in the artificial intelligence field. A representative research area is Video to Shop, which detects fashion products in videos and searches for matching product images. In such video-based models, product features are extracted using convolution operations. However, because computational resources are limited, extracting features from every frame of a video is practically impossible. For this reason, existing studies have improved model performance by sampling only part of the frames or by developing sampling methods that exploit the subject's characteristics. Existing Video to Shop studies sample frames either randomly or at even intervals. However, these sampling methods degrade the performance of the fashion product search model because they also sample noise frames in which the product does not appear. Therefore, this paper proposes the MF (Missing Fashion items on frame) sampler, a sampling method that removes noise frames and improves the performance of the search model. The MF sampler mitigates the resource limitation by using a keyframe mechanism. In addition, it improves the performance of the search model by removing noise frames with a noise detection model. Experiments confirmed that the proposed method improves the model's performance and makes model training more effective.
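
The contrast drawn in the abstract above, between sampling at even intervals and sampling after noise frames are removed, can be sketched as follows. This is not the authors' MF sampler: the function names are hypothetical, the `has_product` predicate stands in for a noise detection model, and the keyframe mechanism is not modelled (this sketch naively checks every frame).

```python
def sample_even(num_frames, k):
    """Indices of k frames spaced at even intervals across num_frames frames."""
    if k <= 0 or num_frames <= 0:
        return []
    k = min(k, num_frames)
    step = num_frames / k
    return [int(i * step) for i in range(k)]


def sample_without_noise(num_frames, k, has_product):
    """Drop frames the (hypothetical) detector flags as noise, then sample evenly."""
    clean = [i for i in range(num_frames) if has_product(i)]
    return [clean[j] for j in sample_even(len(clean), k)]


# Toy video of 10 frames in which the product only appears from frame 6 onward.
print(sample_even(10, 5))
print(sample_without_noise(10, 2, lambda i: i >= 6))
```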

Nonparametric Nonlinear Model Predictive Control

  • Kashiwagi, Hiroshi;Li, Yun
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.1443-1448
    • /
    • 2003
  • Model Predictive Control (MPC) has recently found wide acceptance in industrial applications, but its potential has been much impeded by the reliance on linear models, owing to the lack of a similarly accepted nonlinear modelling or data-based technique. The authors have recently developed a new method for obtaining Volterra kernels of up to third order using a pseudorandom M-sequence. Using this method, nonparametric nonlinear MPC (NMPC) is derived in discrete time through multi-dimensional convolution between plant data and Volterra kernel measurements. This approach is applied to an industrial polymerisation process using Volterra kernels of up to third order. Results show that the nonparametric approach is very efficient and effective and considerably outperforms existing methods, while retaining the original data-based spirit and characteristics of linear MPC.
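
The multi-dimensional convolution between plant data and Volterra kernels described in the abstract above can be sketched for a discrete-time model. This is an illustration under assumptions: it stops at second order for brevity (the abstract uses kernels of up to third order), the function name and kernel values are hypothetical, and neither the M-sequence kernel identification nor the predictive controller is shown.

```python
def volterra_output(u, g1, g2):
    """Output of a second-order discrete-time Volterra model.

    u  : input sequence
    g1 : first-order kernel, g1[i]
    g2 : second-order kernel, g2[i][j]
    The memory length is len(g1); inputs before time 0 are treated as zero.
    """
    m = len(g1)
    y = []
    for t in range(len(u)):
        # Past inputs u(t), u(t-1), ..., u(t-m+1), zero-padded at the start.
        past = [u[t - i] if t - i >= 0 else 0.0 for i in range(m)]
        linear = sum(g1[i] * past[i] for i in range(m))
        quadratic = sum(
            g2[i][j] * past[i] * past[j] for i in range(m) for j in range(m)
        )
        y.append(linear + quadratic)
    return y


# Hypothetical kernels with a memory of two samples.
g1 = [1.0, 0.5]
g2 = [[0.1, 0.0], [0.0, 0.0]]
print(volterra_output([1.0, 2.0], g1, g2))
```

With the second-order kernel set to zero the model reduces to an ordinary linear convolution, which is the sense in which the approach keeps the data-based character of linear MPC.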

A New Probabilistic Generation Simulation Considering Hydro, Pumped-Storage Plants and Multi-Model (수력,양수 및 다중모델을 고려한 새로운 확률론적 발전시뮬레이션)

  • 송길영;최재석
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.40 no.6
    • /
    • pp.551-561
    • /
    • 1991
  • Probabilistic generation simulation plays a key role in power system expansion and operational planning, especially for calculating expected energy, loss of load probability, and expected unserved energy. It is therefore crucial to develop a probabilistic generation simulation algorithm that gives sufficiently precise results within a reasonable computation time. In a previous paper, we proposed an efficient method that uses the Fast Hartley Transform (FHT) in the convolution process to account for thermal and nuclear units. In this paper, we propose a method that considers the scheduling of pumped-storage plants and hydro plants with energy constraints. The method also adopts FHT techniques. We improve the model to include multi-state and multi-block generation. The method has been applied to a real-size model system.
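
The convolution step that underlies probabilistic generation simulation can be illustrated with the textbook capacity outage recursion for two-state units. This sketch is an assumption-laden simplification: it uses direct convolution rather than the FHT-based method of the abstract above, a single fixed load instead of a load duration curve, and it ignores hydro, pumped-storage, and multi-state units. The function names and unit data are hypothetical.

```python
def add_unit(outage_pmf, capacity, forced_outage_rate):
    """Convolve an outage distribution with one two-state generating unit.

    outage_pmf maps megawatts on outage to probability.
    """
    new = {}
    for out_mw, p in outage_pmf.items():
        # Unit available: outage level unchanged.
        new[out_mw] = new.get(out_mw, 0.0) + p * (1.0 - forced_outage_rate)
        # Unit on forced outage: its capacity is added to the outage level.
        new[out_mw + capacity] = new.get(out_mw + capacity, 0.0) + p * forced_outage_rate
    return new


def loss_of_load_probability(units, load):
    """Probability that available capacity is below a fixed load.

    units is a list of (capacity_mw, forced_outage_rate) pairs.
    """
    pmf = {0: 1.0}
    installed = 0
    for capacity, rate in units:
        pmf = add_unit(pmf, capacity, rate)
        installed += capacity
    return sum(p for out_mw, p in pmf.items() if installed - out_mw < load)


# Two hypothetical 100 MW units, each with a 10% forced outage rate.
print(loss_of_load_probability([(100, 0.1), (100, 0.1)], load=150))
```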

QoS Analysis of a Distributed System Considering the Processing Time (처리시간을 고려한 분산시스템의 서비스 품질분석)

  • Kim, Jung-Ho;Park, Jong-Hun
    • Journal of Korean Society for Quality Management
    • /
    • v.39 no.3
    • /
    • pp.412-421
    • /
    • 2011
  • In this paper, we introduce an analytic Quality of Service (QoS) model of a distributed system that decentralizes the processing nodes performing each task and communicates through a network for cooperation. The model extends the service reliability model of Dai et al. (2003) by taking processing time into account. The service is assumed to be provided by a centralized heterogeneous distributed system composed of subsystems managed by a control center. QoS is defined as the probability that a service is provided successfully within an allowed time. We consider hardware/software reliability and the processing time, which includes program execution time and data transfer time. We derive the processing time distribution for a required service through convolution of the corresponding probability density functions. An application example illustrates the procedure for computing quality of service.
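
The derivation of a total processing time distribution by convolution, as described in the abstract above, can be sketched with discretized distributions. The assumptions are loud ones: time is measured in integer units, the stage times are independent, the probability values are invented for illustration, and the hardware/software reliability terms of the paper's model are left out. The function names are hypothetical.

```python
def convolve_pmf(a, b):
    """Distribution of X + Y for independent X and Y.

    a[i] is P(X = i time units) and b[j] is P(Y = j time units).
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def prob_within_deadline(stage_pmfs, deadline):
    """P(total processing time <= deadline) for independent stages."""
    total = [1.0]  # a total time of zero with probability one
    for pmf in stage_pmfs:
        total = convolve_pmf(total, pmf)
    return sum(total[: deadline + 1])


# Hypothetical stages: each takes 1 or 2 time units with equal probability.
execution = [0.0, 0.5, 0.5]
transfer = [0.0, 0.5, 0.5]
print(prob_within_deadline([execution, transfer], deadline=3))
```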

Feature Extraction on a Periocular Region and Person Authentication Using a ResNet Model (ResNet 모델을 이용한 눈 주변 영역의 특징 추출 및 개인 인증)

  • Kim, Min-Ki
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.12
    • /
    • pp.1347-1355
    • /
    • 2019
  • Deep learning approaches based on convolutional neural networks (CNNs) have been extensively studied in computer vision. However, periocular feature extraction using CNNs has not been well studied because it is practically impossible to collect a large volume of biometric data. This study uses a ResNet model trained on the ImageNet dataset. To overcome the problem of insufficient training data, we focus on training a multi-layer perceptron (MLP) with a simple structure rather than training a CNN with a complex structure. The method first extracts features using the pretrained ResNet model, reduces the feature dimension by principal component analysis (PCA), and then trains an MLP classifier. Experimental results on the public periocular dataset UBIPr show that the proposed method is effective for person authentication using the periocular region. In particular, it has the advantage that it can be applied directly to other biometric traits.