• Title/Summary/Keyword: Generative Models


Robust Real-time Tracking of Facial Features with Application to Emotion Recognition

  • Ahn, Byungtae;Kim, Eung-Hee;Sohn, Jin-Hun;Kweon, In So
    • The Journal of Korea Robotics Society / v.8 no.4 / pp.266-272 / 2013
  • Facial feature extraction and tracking are essential steps in human-robot interaction (HRI) tasks such as face recognition, gaze estimation, and emotion recognition. The active shape model (ASM) is one of the successful generative models for extracting facial features. However, ASM alone is not adequate for modeling a face in real applications, because the limited number of iterations in the ASM fitting algorithm makes the extracted feature positions unstable. Inaccurate feature positions in turn degrade emotion recognition performance. In this paper, we propose a real-time facial feature extraction and tracking framework for emotion recognition that combines ASM with Lucas-Kanade (LK) optical flow. LK optical flow is well suited to estimating time-varying geometric parameters in sequential face images. In addition, we introduce a straightforward method to avoid tracking failures caused by partial occlusion, which can be a serious problem for tracking-based algorithms. Emotion recognition experiments with k-NN and SVM classifiers show over 95% classification accuracy for three emotions: "joy", "anger", and "disgust".
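The k-NN classification step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional feature vectors, the toy training data, and the function name `knn_predict` are all hypothetical, and the abstract does not say which features are fed to the classifier.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    # Majority vote over the k closest points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D feature vectors standing in for geometric facial features.
train_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5]]
train_labels = ["joy", "joy", "anger", "anger", "disgust"]

print(knn_predict(train_features, train_labels, [0.85, 0.15]))
```

With `k=3`, the query point's neighbours are the two "joy" samples and the single "disgust" sample, so the majority vote selects "joy".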

Face inpainting via Learnable Structure Knowledge of Fusion Network

  • Yang, You;Liu, Sixun;Xing, Bin;Li, Kesen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.877-893 / 2022
  • With the development of deep learning, face inpainting has improved significantly in the past few years. Although image inpainting frameworks that integrate a generative adversarial network or an attention mechanism have improved the semantic understanding of facial components, the reconstruction of corrupted regions still suffers from problems worth exploring, such as blurred edge structure, excessive smoothness, unreasonable semantics, and visual artifacts. To address these issues, we propose a Learnable Structure Knowledge of Fusion Network (LSK-FNet), which learns prior knowledge for image inpainting through an edge generation network. The architecture involves two steps. First, the structure information obtained by the edge generation network is used as prior knowledge for the face inpainting network. Second, the generated prior knowledge and the incomplete image are fed together into the face inpainting network to obtain the fused information. To improve inpainting accuracy, both gated convolution and region normalization are applied in the proposed model. We evaluate LSK-FNet qualitatively and quantitatively on the CelebA-HQ dataset. The experimental results demonstrate that LSK-FNet improves the edge structure and details of facial images. Our model surpasses the compared models on the L1, PSNR, and SSIM metrics. When the masked region is less than 20%, the L1 loss is reduced by more than 4.3%.
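Two of the metrics named in the abstract, L1 error and PSNR, can be sketched in a few lines. This uses the standard textbook definitions on flattened pixel lists. The function names and toy pixel values are illustrative assumptions, and the paper's exact evaluation protocol (normalization, masking, color handling) is not given in the abstract.

```python
import math


def mean_l1(reference, restored):
    """Mean absolute error between two equally sized pixel sequences."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(r - s) for r, s in zip(reference, restored)) / len(reference)


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 8-bit pixel values for a four-pixel "image".
reference = [0, 64, 128, 255]
restored = [0, 60, 132, 255]

print(mean_l1(reference, restored))  # 2.0
print(round(psnr(reference, restored), 2))
```

A lower L1 error and a higher PSNR both indicate a restoration closer to the reference image. SSIM is omitted here because it requires windowed local statistics.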

Motion Style Transfer using Variational Autoencoder

  • Ahn, Jewon;Kwon, Taesoo
    • Journal of the Korea Computer Graphics Society / v.27 no.5 / pp.33-43 / 2021
  • In this paper, we propose a framework that transfers the information of style motions to content motions based on a variational autoencoder network combined with a style encoding in the latent space. Because we transfer a style to a content motion that is sampled from a variational autoencoder, we can increase the diversity of existing motion data. In addition, we can improve the unnatural motions caused by decoding a new latent variable from style transfer. That improvement was achieved by additionally using the velocity information of motions when generating next frames.

Application of Deep Learning: A Review for Firefighting

  • Shaikh, Muhammad Khalid
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.73-78 / 2022
  • The aim of this paper is to investigate the prevalence of deep learning in the literature on fire and rescue services. We find that deep learning techniques are only beginning to benefit firefighters. The areas where deep learning is making an impact include situational awareness, decision making, mental stress, injuries, firefighter well-being (such as sudden falls, inability to move, and breathlessness), path planning while getting to a fire scene, wayfinding, tracking of firefighters, physical fitness, employment, prediction of firefighter interventions, firefighter operations such as object recognition in smoky areas, firefighter efficacy, smart firefighting using edge computing, firefighting in teams, and firefighter clothing and safety. The techniques found to be applied in firefighting were deep learning, traditional K-means clustering with engineered time- and frequency-domain features, convolutional autoencoders, Long Short-Term Memory (LSTM), deep neural networks, simulation, VR, ANN, deep Q-learning, deep learning based on conditional generative adversarial networks, decision trees, Kalman filters, computational models, partial least squares, logistic regression, random forest, edge computing, the C5 decision tree, restricted Boltzmann machines, reinforcement learning, and recurrent LSTM. The review is centered on firefighters not involved in wildland fires, and it does not focus on the fire itself. Several deep learning techniques, such as CNNs, were mostly used for fire behavior, fire imaging, and fire identification; papers dealing with fire behavior were also excluded from this review.

A many-objective evolutionary algorithm based on integrated strategy for skin cancer detection

  • Lan, Yang;Xie, Lijie;Cai, Xingjuan;Wang, Lifang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.1 / pp.80-96 / 2022
  • Artificial intelligence is driving the rapid development of skin cancer detection technology, and the federated skin cancer detection model (FSDM) and the dual generative adversarial network model (DGANM) address the fragmentation and privacy of data to a certain extent. Because existing many-objective evolutionary algorithms (MaOEAs) cannot guarantee the convergence and diversity of the population when solving these models, a many-objective evolutionary algorithm based on integrated strategy (MaOEA-IS) is proposed. First, the idea of federated learning is introduced into population mutation, and new parents are generated by sub-populations that employ different mating selection operators. Then, the distance from each solution to the ideal point (SID) and the Achievement Scalarizing Function (ASF) value of each solution are considered together for environmental selection, while an elimination mechanism is used to select offspring. Finally, the FSDM and DGANM are solved with MaOEA-IS. The experimental results show that MaOEA-IS achieves better convergence and diversity and performs well in solving the FSDM and DGANM. The proposed MaOEA-IS provides more reasonable solution schemes for researchers in skin cancer detection and promotes the progress of intelligent medicine.
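The two quantities used for environmental selection, the distance to the ideal point and the ASF value, can be sketched as follows. This assumes minimization and the common weighted-maximum form of the ASF. The abstract does not state how MaOEA-IS defines or combines them, so the function names, weights, and toy population are illustrative assumptions only.

```python
import math


def ideal_point(population):
    """Component-wise minimum over all objective vectors (minimization assumed)."""
    return [min(values) for values in zip(*population)]


def distance_to_ideal(objectives, ideal):
    """Euclidean distance from one solution's objective vector to the ideal point."""
    return math.sqrt(sum((f - z) ** 2 for f, z in zip(objectives, ideal)))


def asf(objectives, ideal, weights):
    """Achievement scalarizing function: largest weighted deviation from the ideal point."""
    return max((f - z) / w for f, z, w in zip(objectives, ideal, weights))


# Made-up population of three solutions with three objectives each.
population = [[1.0, 4.0, 2.0], [2.0, 1.0, 3.0], [3.0, 3.0, 1.0]]
weights = [0.5, 0.25, 0.25]  # hypothetical weight vector

ideal = ideal_point(population)
for solution in population:
    print(solution, distance_to_ideal(solution, ideal), asf(solution, ideal, weights))
```

Smaller values of both quantities indicate a solution closer to the ideal point, which is why they are natural ingredients for a convergence-oriented selection criterion.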

Denoising Traditional Architectural Drawings with Image Generation and Supervised Learning

  • Choi, Nakkwan;Lee, Yongsik;Lee, Seungjae;Yang, Seungjoon
    • Journal of architectural history / v.31 no.1 / pp.41-50 / 2022
  • Traditional wooden buildings deform over time and are vulnerable to fire and earthquakes. They therefore require continuous management and repair, and securing architectural drawings is essential for repair and restoration. Unlike modern CAD drawings, drawings of traditional wooden buildings are hand-drawn and then scanned for storage, and in this process a great deal of noise is introduced by damage to the drawings themselves. The drawings are digitized, but the noise limits their usefulness, and the systematic management of traditional wooden buildings is becoming more difficult. Noise removal with existing algorithms applies only to a limited range of drawings, depending on the noise characteristics, and its performance is not uniform. This study presents noise reduction for architectural drawings based on deep artificial neural networks. Front and side elevation drawings, floor plans, and detail drawings of Korean wooden treasure buildings were considered. First, the noise properties of the architectural drawings were learned with both a cycle generative model and heuristic image fusion methods. Then, a noise reduction network was trained through supervised learning on training sets prepared using the noise models. The proposed method removed noise effectively without degrading the fine lines in the architectural drawings, and it performed well on various noise types.

Development of hybrid precipitation nowcasting model by using conditional GAN-based model and WRF

  • Suyeon Choi;Yeonjoo Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.100-100 / 2023
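  • Short-term precipitation forecasting has mainly relied on physics-based numerical weather prediction (NWP) models and radar-based probabilistic methods. Recently, radar-based machine learning models have been shown to perform very well for short-term precipitation forecasting, and related research is being actively pursued. However, the performance of machine learning models degrades sharply as the forecast lead time increases, and they are black-box models that do not account for the physical processes of the atmosphere. To overcome these limitations, this study aimed to develop a hybrid precipitation forecasting model that combines the physics-based NWP model Weather Research and Forecasting (WRF) with a model based on a recent machine learning technique, the conditional generative adversarial network (cGAN), through a machine-learning-based blending method. To develop the cGAN-based model, we trained it on hourly radar reflectivity at 1 km spatial resolution, meteorological data produced by the WRF model (temperature, wind speed, etc.), and watershed information (DEM, land cover, etc.), and used the model to generate precipitation forecasts informed by both physical information and machine learning. The final hybrid forecast was then obtained by combining the cGAN-based model output with the WRF forecast through the machine-learning-based blending method. To evaluate the hybrid model, we analyzed its predictions up to a maximum lead time of 6 hours for heavy rainfall events in the Seoul metropolitan area and the Andong Dam basin. The results confirm that a hybrid approach combining physics-based and machine-learning-based models can produce high-resolution precipitation forecasts with high accuracy and reliability.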


StarGAN-Based Detection and Purification Studies to Defend against Adversarial Attacks

  • Sungjune Park;Gwonsang Ryu;Daeseon Choi
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.3 / pp.449-458 / 2023
  • Artificial intelligence is providing convenience in various fields through big data and deep learning technologies. However, deep learning is highly vulnerable to adversarial examples, which can cause classification models to misclassify their inputs. This study proposes a method that uses StarGAN to detect and purify various adversarial attacks. The proposed method trains a StarGAN model with an added Categorical Entropy loss on adversarial examples generated by various attack methods, so that the Discriminator can detect adversarial examples and the Generator can purify them. Experimental results on the CIFAR-10 dataset showed an average detection performance of approximately 68.77%, an average purification performance of approximately 72.20%, and an average defense performance of approximately 93.11% derived from restoration and detection performance.

Image Translation of SDO/AIA Multi-Channel Solar UV Images into Another Single-Channel Image by Deep Learning

  • Lim, Daye;Moon, Yong-Jae;Park, Eunsu;Lee, Jin-Yi
    • The Bulletin of The Korean Astronomical Society / v.44 no.2 / pp.42.3-42.3 / 2019
  • We translate Solar Dynamics Observatory/Atmospheric Imaging Assembly (AIA) ultraviolet (UV) multi-channel images into another UV single-channel image using a deep learning algorithm based on conditional generative adversarial networks (cGANs). The base input channel, which has the highest correlation coefficient (CC) between UV channels of AIA, is 193 Å. To complement this channel, we choose two channels, 1600 and 304 Å, which represent the upper photosphere and the chromosphere, respectively. The input channels for the three models are single (193 Å), dual (193+1600 Å), and triple (193+1600+304 Å). Quantitative comparisons are made on test data sets. The main results of this study are as follows. First, the single model successfully produces the other coronal channel images but is less successful for the chromospheric channel (304 Å) and much less successful for the two photospheric channels (1600 and 1700 Å). Second, the dual model shows a noticeable improvement in the CC between the model outputs and the ground truths for 1700 Å. Third, the triple model can generate all other channel images with relatively high CCs, larger than 0.89. Our results suggest that if three channels from the photosphere, chromosphere, and corona are selected, the other channel images could be generated by deep learning. We expect that this investigation will serve as a complementary tool for choosing a few UV channels for future small and/or deep-space solar missions.
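The correlation coefficient (CC) used to compare model outputs with ground truths is presumably the pixel-wise Pearson correlation, although the abstract does not say so explicitly. A minimal sketch under that assumption, with flattened images as plain lists and a hypothetical function name `pearson_cc`:

```python
import math


def pearson_cc(image_a, image_b):
    """Pearson correlation coefficient between two flattened images."""
    if len(image_a) != len(image_b) or len(image_a) < 2:
        raise ValueError("images must have the same size and at least two pixels")
    mean_a = sum(image_a) / len(image_a)
    mean_b = sum(image_b) / len(image_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(image_a, image_b))
    var_a = sum((a - mean_a) ** 2 for a in image_a)
    var_b = sum((b - mean_b) ** 2 for b in image_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


# Made-up pixel intensities for a generated image and its ground truth.
generated = [10.0, 20.0, 30.0, 40.0]
ground_truth = [12.0, 19.0, 33.0, 41.0]

print(pearson_cc(generated, ground_truth))
```

A CC near 1 means the generated image varies pixel by pixel in step with the ground truth. It is insensitive to a constant offset or positive scale factor, so it measures structural agreement rather than absolute intensity.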


KFREB: Korean Fictional Retrieval-based Evaluation Benchmark for Generative Large Language Models

  • Jungseob Lee;Junyoung Son;Taemin Lee;Chanjun Park;Myunghoon Kang;Jeongbae Park;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.9-13 / 2023
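  • This paper proposes KFREB (Korean Fictional Retrieval Evaluation Benchmark), a new Korean benchmark for evaluating the retrieval-based answer generation ability of large language models. KFREB evaluates retrieval-based answer generation using fictional information that the model has not seen during pretraining. It thereby aims to address the problem that, when existing large language models generate answers that reflect facts seen during pretraining, their ability in an actual retrieval-based answering system cannot be properly assessed. Considering real service cases of retrieval-based large language models, KFREB provides evaluation cases for long documents, gold documents containing two correct answers, a single gold document with or without keywords from similar distractor documents, and coreference multi-hop reasoning that requires cross-referencing between documents. It is thus expected to offer insights into selecting an appropriate large language model and using it in real services.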
