• Title/Summary/Keyword: Deep Features


Stylized Image Generation based on Music-image Synesthesia Emotional Style Transfer using CNN Network

  • Xing, Baixi;Dou, Jian;Huang, Qing;Si, Huahao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1464-1485 / 2021
  • The emotional style of multimedia artworks is abstract content information. This study explores an emotional style transfer method and a way of matching music with appropriate images with respect to emotional style. DCNNs (Deep Convolutional Neural Networks) can capture style and provide an iterative emotional style transfer solution for affective image generation. Here, we learn image emotion features via DCNNs and map the affective style onto other images. We set the image emotion feature as the style target in this style transfer problem and conducted experiments on affective image generation for eight emotion categories: dignified, dreaming, sad, vigorous, soothing, exciting, joyous, and graceful. A user study tested the synesthesia emotional image style transfer results against ground-truth user perception triggered by the music-image pair stimuli. According to the user study, the transferred affective images were effective for music-image emotional synesthesia perception.
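The abstract above does not say which loss drives the style transfer. As a rough sketch only, assuming the common Gram-matrix formulation of CNN style transfer (an assumption, not the authors' confirmed method), the style term can be illustrated in plain Python. The function names and toy feature values are hypothetical.

```python
# Sketch only: Gram-matrix style loss, a common choice in CNN style transfer.
# The abstract does not state the loss used, so this is an assumed formulation.

def gram_matrix(features):
    """features: list of C channels, each a flat list of N activations.
    Returns the C x C matrix of channel inner products, normalized by N."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def style_loss(generated, style):
    """Mean squared difference between the Gram matrices of two feature maps."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

# Toy feature maps (hypothetical values): 2 channels, 4 spatial positions.
style_feats = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
shifted_feats = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]  # same statistics, shifted
flat_feats = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]     # different statistics

print(style_loss(shifted_feats, style_feats))  # spatial shift leaves the loss at zero
print(style_loss(flat_feats, style_feats))     # different channel correlations are penalized
```

The Gram matrix keeps channel co-activation statistics and discards spatial layout, which is why it is often used as a stand-in for "style" when a style target is mapped onto another image.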

Abnormal Situation Detection on Surveillance Video Using Object Detection and Action Recognition (객체 탐지와 행동인식을 이용한 영상내의 비정상적인 상황 탐지 네트워크)

  • Kim, Jeong-Hun;Choi, Jong-Hyeok;Park, Young-Ho;Nasridinov, Aziz
    • Journal of Korea Multimedia Society / v.24 no.2 / pp.186-198 / 2021
  • Security monitoring with surveillance cameras relies on people watching all surveillance videos directly. However, this task is labor-intensive, and it is difficult to detect every abnormal situation. In this paper, we propose a deep neural network model, called AT-Net, that automatically detects abnormal situations in surveillance video, and we introduce an automatic video surveillance system built on this network model. In particular, AT-Net alleviates the ambiguity of existing abnormal situation detection methods by mapping features that represent relationships between people and objects in surveillance video to a new tensor structure based on sparse coding. In experiments on actual surveillance videos, AT-Net achieved an F1-score of about 89% and improved abnormal situation detection performance by more than 25% compared to existing methods.
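For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch; the counts are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical detector output: 8 abnormal clips found correctly,
# 2 false alarms, 2 abnormal clips missed.
print(f1_score(tp=8, fp=2, fn=2))
```

Because F1 ignores true negatives, it suits tasks such as abnormal situation detection, where normal footage greatly outnumbers abnormal events.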

Proper Noun Embedding Model for the Korean Dependency Parsing

  • Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
  • Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recently, deep learning models have been used for dependency parsing based on word representations in a continuous vector space. However, this approach mislabels proper nouns that rarely appear in the training corpus, because out-of-vocabulary (OOV) words are difficult to express in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods that differ in their embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train contextual features for them using a multi-layer bidirectional LSTM. We propose two proper noun embedding models, one syllable-based and one morpheme-based, and dependency parsing performance improves more with the ensemble model than with either the syllable or the morpheme embedding model alone. The experimental results show that our ensemble model improved UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which uses the same arc-eager approach.
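Two ideas in the abstract above lend themselves to a short illustration: replacing proper nouns with a special token before embedding, and scoring a parse with UAS and LAS. This is a minimal sketch under stated assumptions: the token name `<PN>`, the tag name `NNP`, and the toy sentence are hypothetical and not taken from the paper.

```python
# Hypothetical placeholder token; the paper's actual token name is not given.
PN_TOKEN = "<PN>"

def mask_proper_nouns(tagged_tokens, proper_tag="NNP"):
    """Replace every token tagged as a proper noun with PN_TOKEN.
    tagged_tokens: list of (surface, pos_tag) pairs; the tag name is assumed."""
    return [PN_TOKEN if tag == proper_tag else word
            for word, tag in tagged_tokens]

def attachment_scores(gold, pred):
    """gold, pred: lists of (head_index, relation_label), one pair per word.
    UAS counts correct heads; LAS counts correct head and label together."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)

# Toy example with made-up annotations.
sentence = [("Seoul", "NNP"), ("is", "VBZ"), ("large", "JJ")]
print(mask_proper_nouns(sentence))

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (1, "amod")]
print(attachment_scores(gold, pred))
```

In this toy parse, three of four heads are correct but only two of those also carry the correct label, so LAS is lower than UAS. The "%p" figures in the abstract are percentage-point differences in these two scores.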

Semantic Segmentation of Heterogeneous Unmanned Aerial Vehicle Datasets Using Combined Segmentation Network

  • Ahram, Song
    • Korean Journal of Remote Sensing / v.39 no.1 / pp.87-97 / 2023
  • Unmanned aerial vehicles (UAVs) can capture high-resolution imagery from a variety of viewing angles and altitudes, but they are generally limited to collecting images of small scenes from larger regions. To improve the utility of UAV-acquired datasets for deep learning applications, multiple datasets created from various regions under different conditions are needed. To demonstrate a powerful new method for integrating heterogeneous UAV datasets, this paper applies a combined segmentation network (CSN) in which the UAVid and Semantic Drone Dataset share encoding blocks to learn general features, whereas the decoding blocks are trained separately on each dataset. Experimental results show that our CSN improves the accuracy of specific classes (e.g., cars) that make up only a small proportion of both datasets. From this result, we expect the range of uses for UAV datasets to increase.

A Study of Real-time Semantic Segmentation Performance Improvement in Unstructured Outdoor Environment (비정형 야지환경 주행상황에서의 실시간 의미론적 영상 분할 알고리즘 성능 향상에 관한 연구)

  • Daeyoung, Kim;Seunguk, Ahn;Seung-Woo, Seo
    • Journal of the Korea Institute of Military Science and Technology / v.25 no.6 / pp.606-616 / 2022
  • Semantic segmentation in autonomous driving for unstructured environments is challenging due to uneven terrain, unstructured class boundaries, irregular features, and strong textures. Current off-road datasets pose difficulties such as class imbalance and varying environmental topography. To overcome these issues, we propose a deep learning framework for semantic segmentation that uses pooled-class semantic segmentation with five classes. The framework is evaluated on two off-road driving datasets, RUGD and TAS500. The results show that our proposed method achieves high accuracy and real-time performance.
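Class pooling, as mentioned in the abstract above, can be pictured as remapping fine-grained labels onto a smaller label set before training. The abstract does not list the five pooled classes, so the mapping below is purely illustrative, and every class name in it is hypothetical.

```python
# Hypothetical fine-to-pooled mapping; the paper's five pooled classes are
# not listed in the abstract, so these names are illustrative only.
FINE_TO_POOLED = {
    "dirt": "traversable", "grass": "traversable",
    "bush": "vegetation", "tree": "vegetation",
    "rock": "obstacle", "log": "obstacle",
    "sky": "sky",
    "water": "non_traversable", "mud": "non_traversable",
}

def pool_labels(label_map, mapping=FINE_TO_POOLED):
    """Map each fine-grained pixel label in a 2-D label map to its pooled class."""
    return [[mapping[label] for label in row] for row in label_map]

def class_counts(label_map):
    """Count how many pixels carry each label."""
    counts = {}
    for row in label_map:
        for label in row:
            counts[label] = counts.get(label, 0) + 1
    return counts

# Toy 3x3 label map (made-up scene).
fine = [["sky", "sky", "tree"],
        ["grass", "rock", "bush"],
        ["dirt", "grass", "mud"]]
pooled = pool_labels(fine)
print(class_counts(fine))
print(class_counts(pooled))
```

Merging rare fine-grained classes into broader groups gives each remaining class more pixels, which is one plausible way pooling could ease class imbalance.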

A review and comparison of convolution neural network models under a unified framework

  • Park, Jimin;Jung, Yoonsuh
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.161-176 / 2022
  • There has been active research in image classification using deep learning convolutional neural network (CNN) models. The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) (2010-2017) was one of the most important competitions that boosted the development of efficient deep learning algorithms. This paper introduces and compares six landmark models that achieved high prediction accuracy in ILSVRC. First, we review the models to illustrate their unique structures and characteristics. We then compare the models under a unified framework. To this end, additional devices that are not crucial to the structure are excluded. Four popular data sets with different characteristics are then used to measure prediction accuracy. By investigating the characteristics of the data sets and the models being compared, we provide some insight into the architectural features of the models.

Artificial Intelligence in Neuroimaging: Clinical Applications

  • Choi, Kyu Sung;Sunwoo, Leonard
    • Investigative Magnetic Resonance Imaging / v.26 no.1 / pp.1-9 / 2022
  • Artificial intelligence (AI) powered by deep learning (DL) has shown remarkable progress in image recognition tasks. Over the past decade, AI has proven its feasibility for applications in medical imaging. Various aspects of clinical practice in neuroimaging can be improved with the help of AI. For example, AI can aid in detecting brain metastases, predicting treatment response of brain tumors, generating a parametric map of dynamic contrast-enhanced MRI, and enhancing radiomics research by extracting salient features from input images. In addition, image quality can be improved via AI-based image reconstruction or motion artifact reduction. In this review, we summarize recent clinical applications of DL in various aspects of neuroimaging.

Deep Reference-based Dynamic Scene Deblurring

  • Cunzhe Liu;Zhen Hua;Jinjiang Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.653-669 / 2024
  • Dynamic scene deblurring is a complex computer vision problem because it is difficult to model mathematically. In this paper, we present a novel approach to image deblurring that uses a sharp reference image to recover high-quality, high-frequency detail. To make better use of the sharp reference image, we develop an encoder-decoder network and design two novel modules to guide the network toward better image restoration. The proposed Reference Extraction and Aggregation Module establishes the correspondence between the blurry image and the reference image and explores the most relevant features for better blur removal. The proposed Spatial Feature Fusion Module enables the encoder to perceive blur information at different spatial scales. Finally, the multi-scale feature maps from the encoder and the cascaded Reference Extraction and Aggregation Modules are integrated into the decoder for global fusion and representation. Extensive quantitative and qualitative experimental results on different benchmarks show the effectiveness of our proposed method.

Performance of End-to-end Model Based on Convolutional LSTM for Human Activity Recognition

  • Young Ghyu Sun;Soo Hyun Kim;Seongwoo Lee;Joonho Seon;SangWoon Lee;Cheong Ghil Kim;Jin Young Kim
    • Journal of Web Engineering / v.21 no.5 / pp.1671-1690 / 2022
  • Human activity recognition (HAR) is a key technology in many applications, such as smart signage, smart healthcare, and smart homes. In HAR, deep learning-based methods have been proposed to recognize activities effectively from video streams. In this paper, an end-to-end model based on convolutional long short-term memory (LSTM) is proposed to recognize human activities. Convolutional LSTM can learn spatial and temporal features simultaneously from video stream data. In addition, the number of learned weights can be reduced by employing convolutional LSTM in an end-to-end model. The proposed HAR model was optimized in various simulation environments using activity data from the AI hub. The simulation results confirm that the proposed model outperforms the conventional model.

Vehicle Type Classification Method for Road Traffic Surveys (도로교통량 조사를 위한 12종 차종 분류 방법)

  • Mi-Seon Kang;Chan-Ho Kim;Pyong-Kun Kim
    • IEMEK Journal of Embedded Systems and Applications / v.19 no.5 / pp.227-234 / 2024
  • This paper proposes a novel method for effectively classifying 12 vehicle types required for road traffic surveys by utilizing deep learning techniques. In particular, it focuses on the trailer vehicle types, classified as types 8 to 12, which have been challenging in previous research due to data scarcity. A zero-shot learning approach, Grounding DINO, is employed to extract key features that can distinguish these trailer types, addressing the data imbalance issue. This method enables accurate classification of the underrepresented vehicle types, leading to efficient classification across all 12 types. To the best of the authors' knowledge, this is the first attempt to classify 12 vehicle types required for road traffic surveys using publicly available video data.