• Title/Summary/Keyword: Neural Net

A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning (DNN 학습을 이용한 퍼스널 비디오 시퀀스의 멀티 모달 기반 이벤트 분류 방법)

  • Lee, Yu Jin;Nang, Jongho
    • Journal of KIISE
    • /
    • v.43 no.11
    • /
    • pp.1281-1297
    • /
    • 2016
  • In recent years, personal videos have seen tremendous growth due to the substantial increase in the use of smart devices and networking services, with which users create and share video content easily and without many restrictions. Because videos generally have multiple modalities and the frame data in a video varies at different time points, taking both into account can significantly improve event detection performance. This paper proposes an event detection method in which high-level features are first extracted from the multiple modalities in the videos and rearranged in time sequence, and the association between the modalities is then learned by means of a DNN to produce a personal video event detector. In the proposed method, audio and image data are first synchronized and extracted; the results are input into GoogLeNet and a Multi-Layer Perceptron (MLP) to extract high-level features. These features are then re-arranged in time sequence, and every video is processed into a single feature for training by means of the DNN.
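The following is a minimal PyTorch sketch of the late-fusion idea described in this abstract, not the authors' implementation: per-segment GoogLeNet image features (assumed precomputed as 1024-dimensional vectors) and MLP-encoded audio features are concatenated and classified by a small DNN. All dimensions and the number of event classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiModalEventClassifier(nn.Module):
    """Late fusion of time-aligned image and audio features (illustrative)."""
    def __init__(self, img_dim=1024, audio_dim=128, num_events=10):
        super().__init__()
        # Audio branch: a small MLP over synchronized audio features.
        self.audio_mlp = nn.Sequential(
            nn.Linear(audio_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Fused DNN classifier over concatenated image + audio representations.
        self.fusion_dnn = nn.Sequential(
            nn.Linear(img_dim + 128, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, num_events),
        )

    def forward(self, img_feat, audio_feat):
        # img_feat: (batch, img_dim) precomputed GoogLeNet features per segment
        # audio_feat: (batch, audio_dim) audio features for the same segments
        a = self.audio_mlp(audio_feat)
        return self.fusion_dnn(torch.cat([img_feat, a], dim=1))

model = MultiModalEventClassifier()
logits = model(torch.randn(4, 1024), torch.randn(4, 128))  # 4 synchronized segments
print(logits.shape)  # torch.Size([4, 10])
```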

ViStoryNet: Neural Networks with Successive Event Order Embedding and BiLSTMs for Video Story Regeneration (ViStoryNet: 비디오 스토리 재현을 위한 연속 이벤트 임베딩 및 BiLSTM 기반 신경망)

  • Heo, Min-Oh;Kim, Kyung-Min;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.3
    • /
    • pp.138-144
    • /
    • 2018
  • A video is a vivid medium similar to human visual-linguistic experience, since it can convey a sequence of situations, actions, or dialogues that can be told as a story. In this study, we propose story learning/regeneration frameworks for videos with successive event order supervision for contextual coherence. The supervision induces each episode to take the form of a trajectory in the latent space, which constructs a composite representation of ordering and semantics. We use kids' videos as training data; their advantages include an omnibus style, short and simple/explicit storylines, a chronological narrative order, and a relatively limited number of characters and spatial environments. We build an encoder-decoder structure with successive event order embedding (SEOE) and train bidirectional LSTMs as sequence models with multi-step sequence prediction in mind. Using approximately 200 episodes of the kids' video series 'Pororo the Little Penguin', we give empirical results for story regeneration tasks and SEOE. In addition, each episode shows a trajectory-like shape in the latent space of the model, which provides geometric information for the sequence models.
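As a rough illustration of the sequence-model component described above (not ViStoryNet itself), the sketch below runs a bidirectional LSTM over ordered per-event embeddings, projects each step into a latent "trajectory" point, and predicts the following event embedding. The dimensions and the prediction head are assumptions.

```python
import torch
import torch.nn as nn

class StorySequenceEncoder(nn.Module):
    """BiLSTM over ordered event embeddings of one episode (illustrative)."""
    def __init__(self, event_dim=300, hidden=256, latent=128):
        super().__init__()
        self.bilstm = nn.LSTM(event_dim, hidden, batch_first=True, bidirectional=True)
        self.to_latent = nn.Linear(2 * hidden, latent)   # per-step latent trajectory point
        self.next_event = nn.Linear(latent, event_dim)   # predict the following event embedding

    def forward(self, events):
        # events: (batch, seq_len, event_dim) event embeddings in story order
        h, _ = self.bilstm(events)
        z = self.to_latent(h)            # latent trajectory of the episode
        return z, self.next_event(z)     # multi-step next-event predictions

enc = StorySequenceEncoder()
z, pred = enc(torch.randn(2, 20, 300))   # 2 episodes, 20 events each
print(z.shape, pred.shape)               # (2, 20, 128) (2, 20, 300)
```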

A comparison of deep-learning models to the forecast of the daily solar flare occurrence using various solar images

  • Shin, Seulki;Moon, Yong-Jae;Chu, Hyoungseok
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.42 no.2
    • /
    • pp.61.1-61.1
    • /
    • 2017
  • As deep-learning methods have succeeded in various fields, they have high potential to be applied to space weather forecasting. The convolutional neural network, one of the deep-learning methods, is specialized in image recognition. In this study, we apply the AlexNet architecture, winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012, to the forecast of daily solar flare occurrence using the MatConvNet software of MATLAB. Our input images are SOHO/MDI, EIT 195 Å, and 304 Å images from January 1996 to December 2010, and the outputs are yes or no of flare occurrence. We also consider input sets consisting of the last two images and their difference image. We select the training dataset from Jan 1996 to Dec 2000 and from Jan 2003 to Dec 2008; the testing dataset is chosen from Jan 2001 to Dec 2002 and from Jan 2009 to Dec 2010 in order to consider the solar-cycle effect. Within the training dataset, we randomly select one fifth of the data as a validation dataset to avoid over-fitting. Our model successfully forecasts flare occurrence with a probability of detection (POD) of about 0.90 for common flares (C-, M-, and X-class). While the POD for major-flare (M- and X-class) forecasting is 0.96, the false alarm rate (FAR) also scores relatively high (0.60). We also present several statistical parameters such as the critical success index (CSI) and true skill statistics (TSS). None of the statistical parameters depend strongly on the number of input datasets. Our model can immediately be applied to an automatic forecasting service when image data are available.
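A rough PyTorch analogue of the setup described above (the paper itself used MatConvNet/MATLAB): torchvision's AlexNet with its final layer replaced by a two-way flare/no-flare output, plus commonly used definitions of the quoted skill scores. The torchvision >= 0.13 API, the input size, and the FAR definition (false alarm ratio) are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

model = alexnet(weights=None)              # AlexNet trained from scratch in this sketch
model.classifier[6] = nn.Linear(4096, 2)   # binary output: flare occurrence yes / no

x = torch.randn(8, 3, 224, 224)            # e.g., MDI/EIT images resized to 224x224
print(model(x).shape)                      # torch.Size([8, 2])

def skill_scores(tp, fp, fn, tn):
    """Contingency-table scores; FAR here is the false alarm ratio FP/(TP+FP)."""
    pod = tp / (tp + fn)                   # probability of detection
    far = fp / (tp + fp)                   # false alarm ratio (definitions vary)
    csi = tp / (tp + fp + fn)              # critical success index
    tss = pod - fp / (fp + tn)             # true skill statistic
    return pod, far, csi, tss
```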

Classification of Clothing Using Googlenet Deep Learning and IoT based on Artificial Intelligence (인공지능 기반 구글넷 딥러닝과 IoT를 이용한 의류 분류)

  • Noh, Sun-Kuk
    • Smart Media Journal
    • /
    • v.9 no.3
    • /
    • pp.41-45
    • /
    • 2020
  • Recently, artificial intelligence (AI) and the Internet of Things (IoT), represented by machine learning and deep learning among the IT technologies of the Fourth Industrial Revolution, have been applied to our real lives in various fields through various studies. In this paper, IoT and AI with object recognition technology are applied to classify clothing. For this purpose, an image dataset was captured using a webcam and a Raspberry Pi, and GoogLeNet, a convolutional neural network, was applied to the captured image data by transfer learning. The clothing image dataset was classified into two categories (shirtwaist, trousers) and consisted of 900 clean images and 900 loss images, 1800 images in total. The classification measurements showed that the accuracy on clean clothing images was about 97.78%. In conclusion, through these measurement results and the addition of more image data in the future, the study confirmed the applicability of artificial intelligence networks on an IoT-based platform to other objects.
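A hedged sketch of GoogLeNet transfer learning for the two clothing categories mentioned above (shirtwaist, trousers). The torchvision GoogLeNet with ImageNet weights stands in for the authors' setup; the frozen backbone and input size are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import googlenet, GoogLeNet_Weights

model = googlenet(weights=GoogLeNet_Weights.IMAGENET1K_V1)
for p in model.parameters():              # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(1024, 2)             # new head: shirtwaist vs trousers
model.eval()

x = torch.randn(4, 3, 224, 224)           # webcam / Raspberry Pi images, resized
print(model(x).shape)                     # torch.Size([4, 2])
```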

MSaGAN: Improved SaGAN using Guide Mask and Multitask Learning Approach for Facial Attribute Editing

  • Yang, Hyeon Seok;Han, Jeong Hoon;Moon, Young Shik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.5
    • /
    • pp.37-46
    • /
    • 2020
  • Recently, studies of facial attribute editing have obtained realistic results using generative adversarial networks (GAN) and encoder-decoder structures. Spatial attention GAN (SaGAN), one of the latest approaches, can change only the desired attribute in a face image through a spatial attention mechanism. However, it sometimes produces unnatural results due to insufficient information about face areas. In this paper, we propose an improved SaGAN (MSaGAN) that uses a guide mask for learning and applies a multitask learning approach to address the limitations of the existing method. Through extensive experiments, we evaluated the facial attribute editing results in terms of the mask loss function and the neural network structure. The results show that the proposed method can efficiently produce more natural results compared to previous methods.
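The sketch below loosely illustrates the SaGAN-style idea the abstract builds on, not the MSaGAN architecture itself: a generator outputs an edited image and a spatial attention mask, the mask blends the edit into the input, and a guide mask supervises the mask as an auxiliary (multitask) loss term. Channel sizes and the loss form are assumptions.

```python
import torch
import torch.nn as nn

class AttentiveGenerator(nn.Module):
    """Attribute editing restricted to attended regions (illustrative)."""
    def __init__(self, attr_dim=1):
        super().__init__()
        self.edit = nn.Sequential(                       # attribute-editing branch
            nn.Conv2d(3 + attr_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
        self.mask = nn.Sequential(                       # spatial attention branch
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, img, attr):
        a = attr.view(attr.size(0), -1, 1, 1).expand(-1, -1, img.size(2), img.size(3))
        edited = self.edit(torch.cat([img, a], dim=1))
        m = self.mask(img)
        return m * edited + (1 - m) * img, m             # edit only attended regions

gen = AttentiveGenerator()
img, attr, guide = torch.randn(2, 3, 64, 64), torch.ones(2, 1), torch.rand(2, 1, 64, 64)
out, m = gen(img, attr)
mask_loss = nn.functional.binary_cross_entropy(m, guide)  # guide-mask supervision term
print(out.shape, mask_loss.item() >= 0)
```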

A Fully Convolutional Network Model for Classifying Liver Fibrosis Stages from Ultrasound B-mode Images (초음파 B-모드 영상에서 FCN(fully convolutional network) 모델을 이용한 간 섬유화 단계 분류 알고리즘)

  • Kang, Sung Ho;You, Sun Kyoung;Lee, Jeong Eun;Ahn, Chi Young
    • Journal of Biomedical Engineering Research
    • /
    • v.41 no.1
    • /
    • pp.48-54
    • /
    • 2020
  • In this paper, we deal with a liver fibrosis classification problem using ultrasound B-mode images. Representative methods for classifying the stages of liver fibrosis include liver biopsy and diagnosis based on ultrasound images. The overall liver shape and the smoothness or roughness of the speckle pattern in ultrasound images are used to determine the fibrosis stage. Although ultrasound-image-based classification is frequently used as an alternative or complement to invasive biopsy, it has the limitation that the fibrosis-stage decision depends on the image quality and the doctor's experience. With the rapid development of deep learning algorithms, several studies using deep learning have been carried out for automated liver fibrosis classification and have shown high accuracy. The performance of those deep learning methods depends closely on the amount of data. We propose an enhanced U-Net architecture to maximize classification accuracy with a small, limited image dataset. U-Net is well known as a neural network for fast and precise segmentation of medical images; we redesign it for the purpose of classifying liver fibrosis stages. To assess the performance of the proposed architecture, numerical experiments are conducted on a total of 118 ultrasound B-mode images acquired from 78 patients with liver fibrosis symptoms of stages F0~F4. The experimental results show that the performance of the proposed architecture is much better than transfer learning using a pre-trained VGGNet model.
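A simplified sketch of a U-Net-style convolutional encoder repurposed for classification into the five fibrosis stages F0-F4, in the spirit of the abstract; the depth, channel widths, and pooling head are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class FibrosisClassifier(nn.Module):
    """U-Net-style encoder with a classification head over stages F0-F4."""
    def __init__(self, num_stages=5):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(1, 32), conv_block(32, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_stages))

    def forward(self, x):                 # x: (batch, 1, H, W) B-mode image
        x = self.pool(self.enc1(x))
        x = self.pool(self.enc2(x))
        x = self.enc3(x)
        return self.head(x)

print(FibrosisClassifier()(torch.randn(2, 1, 128, 128)).shape)  # torch.Size([2, 5])
```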

Indoor Scene Classification based on Color and Depth Images for Automated Reverberation Sound Editing (자동 잔향 편집을 위한 컬러 및 깊이 정보 기반 실내 장면 분류)

  • Jeong, Min-Heuk;Yu, Yong-Hyun;Park, Sung-Jun;Hwang, Seung-Jun;Baek, Joong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.3
    • /
    • pp.384-390
    • /
    • 2020
  • The reverberation effect on sound when producing movies or VR content is a very important factor for realism and liveliness. The reverberation time depending on the space is recommended by a standard called RT60 (Reverberation Time 60 dB). In this paper, we propose a scene recognition technique for automatic reverberation editing. To this end, we devised a classification model that independently trains on color images and predicted depth images within the same model. Indoor scene classification is limited when training on color information alone because of the similarity of indoor structures, so deep-learning-based depth estimation is used to incorporate spatial depth information. Based on RT60, 10 scene classes were constructed, and model training and evaluation were conducted. Finally, the proposed SCR + DNet (Scene Classification for Reverb + Depth Net) classifier achieves higher performance than conventional CNN classifiers, with 92.4% accuracy.
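A hedged two-stream sketch in the spirit of SCR + DNet: one CNN branch for the color image and one for a separately estimated depth map, fused for a 10-class indoor-scene decision. The branch design and fusion layer are assumptions.

```python
import torch
import torch.nn as nn

def branch(cin):
    return nn.Sequential(nn.Conv2d(cin, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                         nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

class ColorDepthSceneNet(nn.Module):
    """Two-stream color + depth classifier for RT60-based scene classes."""
    def __init__(self, num_scenes=10):
        super().__init__()
        self.color, self.depth = branch(3), branch(1)
        self.fc = nn.Linear(64 + 64, num_scenes)

    def forward(self, rgb, depth):
        return self.fc(torch.cat([self.color(rgb), self.depth(depth)], dim=1))

net = ColorDepthSceneNet()
print(net(torch.randn(2, 3, 128, 128), torch.randn(2, 1, 128, 128)).shape)  # (2, 10)
```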

Application of CCTV Image and Semantic Segmentation Model for Water Level Estimation of Irrigation Channel (관개용수로 CCTV 이미지를 이용한 CNN 딥러닝 이미지 모델 적용)

  • Kim, Kwi-Hoon;Kim, Ma-Ga;Yoon, Pu-Reun;Bang, Je-Hong;Myoung, Woo-Ho;Choi, Jin-Yong;Choi, Gyu-Hoon
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.64 no.3
    • /
    • pp.63-73
    • /
    • 2022
  • A more accurate understanding of the irrigation water supply is necessary for efficient agricultural water management. Although water levels in irrigation canals are measured using ultrasonic water level gauges, some errors occur due to malfunctions or the surrounding environment. This study aims to apply CNN (Convolutional Neural Network) deep-learning-based image classification and segmentation models to CCTV (Closed-Circuit Television) images of an irrigation canal. The CCTV images were acquired from the irrigation canal of an agricultural reservoir in Cheorwon-gun, Gangwon-do. We used the ResNet-50 model for image classification and the U-Net model for image segmentation. Using the Natural Breaks algorithm, we divided the water level data into 2, 4, and 8 groups for the image classification models. The classification models with 2, 4, and 8 groups showed accuracies of 1.000, 0.987, and 0.634, respectively. The image segmentation model showed a Dice score of 0.998, and the predicted water levels showed an R2 of 0.97 and an MAE (Mean Absolute Error) of 0.02 m. The image classification models can be applied to an automatic gate controller with four water-level divisions, and the image segmentation results can serve as an alternative measurement to ultrasonic water gauges. We expect that the results of this study can provide a more scientific and efficient approach to agricultural water management.
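The classification stage could look roughly like the following sketch: a torchvision ResNet-50 whose final layer is replaced to classify CCTV frames into N water-level groups (2, 4, or 8 in the study). The U-Net segmentation stage is not reproduced here, and the use of torchvision is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def water_level_classifier(num_groups=4):
    """ResNet-50 with its head replaced for num_groups water-level classes."""
    model = resnet50(weights=None)                 # pretrained weights optional
    model.fc = nn.Linear(model.fc.in_features, num_groups)
    return model

model = water_level_classifier(num_groups=4)
print(model(torch.randn(2, 3, 224, 224)).shape)    # torch.Size([2, 4])
```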

Short-Term Crack in Sewer Forecasting Method Based on CNN-LSTM Hybrid Neural Network Model (CNN-LSTM 합성모델에 의한 하수관거 균열 예측모델)

  • Jang, Seung-Ju;Jang, Seung-Yup
    • Journal of the Korean Geosynthetics Society
    • /
    • v.21 no.2
    • /
    • pp.11-19
    • /
    • 2022
  • In this paper, we propose a method combining GoogLeNet transfer learning with a CNN-LSTM model to improve time-series prediction performance for crack detection, using crack data captured inside sewer pipes. Because the LSTM can address the long-term dependencies that a CNN alone cannot, spatial and temporal characteristics can be considered at the same time. Comparing the RMSE (Root Mean Square Error) over time-series sections of the sewer-pipe crack data, the predictive performance of the proposed method is excellent for all test variables. In addition, examining the prediction performance at the time of data generation and comparing with the existing CNN-only model verified that the proposed method is effective in predicting crack detection. If the proposed method and the experimental results obtained in this study are utilized, they can be applied not only to crack data of concrete structures but also to various fields, such as the environment and the humanities, where time-series data occur frequently.
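An illustrative CNN-LSTM hybrid of the kind described above: per-frame CNN features (standing in for GoogLeNet transfer features) are fed to an LSTM that forecasts the next value of the crack-related time series. All dimensions and the single-step regression head are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMForecaster(nn.Module):
    """Per-frame CNN features fed to an LSTM for next-step forecasting."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)            # next-step prediction

    def forward(self, frames):                     # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(f)
        return self.out(h[:, -1])                  # forecast from the last time step

print(CNNLSTMForecaster()(torch.randn(2, 8, 3, 64, 64)).shape)  # torch.Size([2, 1])
```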

Integrated Water Resources Management in the Era of Great Transition

  • Ashkan Noori;Seyed Hossein Mohajeri;Milad Niroumand Jadidi;Amir Samadi
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.34-34
    • /
    • 2023
  • The Chah-Nimeh reservoirs, a group of natural lakes located on the border of Iran and Afghanistan, are the main drinking and agricultural water resources of the arid Sistan region. Considering the occurrence of intense seasonal wind, locally known as the Levar wind, this study explores the possibility of building a TSM (Total Suspended Matter) monitoring model of the Chah-Nimeh reservoirs using multi-temporal satellite images and in-situ wind speed data. The results show that a strong correlation between TSM concentration and wind speed is present. The developed empirical model showed high performance in retrieving the spatiotemporal distribution of TSM concentration, with R2 = 0.98 and RMSE = 0.92 g/m3. Following this observation, we also consider a machine-learning-based model that predicts the average TSM using only wind speed. We connect our in-situ wind speed data to the TSM data generated from the inversion of multi-temporal satellite imagery to train a neural-network-based model (Wind2TSM-Net). Examining the Wind2TSM-Net model indicates that it can retrieve the TSM accurately using only wind speed (R2 = 0.88 and RMSE = 1.97 g/m3). Moreover, the results of this study show that the TSM concentration can be estimated using only in-situ wind speed data, independent of the satellite images. Specifically, such a model can supply a temporally persistent means of monitoring TSM that is not limited by the temporal resolution of imagery or the cloud-cover problem in optical remote sensing.
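A minimal regression sketch of a Wind2TSM-style network: a small fully connected net mapping in-situ wind speed to average TSM concentration, with one illustrative training step. Layer sizes, the optimizer, and the toy data are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

# Small fully connected regressor: wind speed (m/s) -> average TSM (g/m^3).
wind2tsm = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

wind_speed = torch.tensor([[4.0], [9.5], [15.0]])     # hypothetical wind speeds
tsm_target = torch.tensor([[5.0], [12.0], [20.0]])    # hypothetical TSM values

optimizer = torch.optim.Adam(wind2tsm.parameters(), lr=1e-3)
optimizer.zero_grad()
loss = nn.functional.mse_loss(wind2tsm(wind_speed), tsm_target)
loss.backward()
optimizer.step()                                      # one illustrative training step
print(wind2tsm(wind_speed).shape)                     # torch.Size([3, 1])
```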
