Search | Korea Science

Design of a Deep Neural Network Model for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델의 설계)

Kim, Dongha;Kim, Incheol
- KIPS Transactions on Software and Data Engineering / v.6 no.4 / pp.203-210 / 2017
In this paper, we propose an effective neural network model for image caption generation and model transfer. The model is a kind of multimodal recurrent neural network. It consists of five distinct layers: a convolutional neural network layer for extracting visual information from images, an embedding layer for converting each word into a low-dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multimodal layer for combining visual and language information. The recurrent neural network layer is built from LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, the model has a distinctive structure in which the output of the convolutional neural network layer is linked not only to the input of the initial state of the recurrent neural network layer but also to the input of the multimodal layer, so that the visual information extracted from the image is available at each recurrent step when generating the corresponding textual caption. Through various comparative experiments on open data sets such as Flickr8k, Flickr30k, and MSCOCO, we demonstrate that the proposed multimodal recurrent neural network model achieves high performance in terms of caption accuracy and model transfer effect.
https://doi.org/10.3745/KTSDE.2017.6.4.203

Photo-realistic Face Image Generation by DCGAN with error relearning (심층 적대적 생성 신경망의 오류 재학습을 이용한 얼굴 영상 생성 모델)

Ha, Yong-Wook;Hong, Dong-jin;Cha, Eui-Young
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.617-619 / 2018
In this paper, we propose a face image generation GAN model that is improved by an additional discriminator. This discriminator is trained to specialize in preventing the generator's frequent mistakes. To verify the proposed model, we used the Inception score. We used 155,680 frontal-face images from CelebA. The model averaged 1.742 points on the Inception score, a much better score than that of the previous model.
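
The Inception score mentioned in this abstract is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for one generated image and p(y) is the average of those distributions over all images. The following is a minimal sketch of that formula only. It assumes the class probabilities are already available as plain lists (in practice they come from a pretrained Inception classifier), and the function name `inception_score` and the toy inputs are illustrative, not taken from the paper.

```python
import math


def inception_score(probs):
    """Inception score from per-image class probabilities p(y|x).

    probs: list of rows, each row a probability distribution over classes.
    Returns exp(mean over images of KL(p(y|x) || p(y))).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): the average of all rows.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    total_kl = 0.0
    for row in probs:
        # KL divergence of this image's prediction from the marginal.
        # Terms with p == 0 contribute nothing and are skipped.
        total_kl += sum(p * math.log(p / q)
                        for p, q in zip(row, marginal) if p > 0)
    return math.exp(total_kl / n)


# Hypothetical toy inputs: confident, diverse predictions score higher
# than identical, uninformative ones.
diverse = inception_score([[1.0, 0.0], [0.0, 1.0]])
uniform = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score rises when each image is classified confidently and the set of images covers many classes, which is why it is used as a proxy for both quality and diversity.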

Speech Recognition Accuracy Measure using Deep Neural Network for Effective Evaluation of Speech Recognition Performance (효과적인 음성 인식 평가를 위한 심층 신경망 기반의 음성 인식 성능 지표)

Ji, Seung-eun;Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering / v.21 no.12 / pp.2291-2297 / 2017
This paper describes an algorithm for extracting speech measures to evaluate a speech database and presents a method for generating a speech quality measure using a DNN (Deep Neural Network). In our previous study, to produce an effective speech quality measure, we proposed a combination of various speech measures that are highly correlated with WER (Word Error Rate). The new combination of speech quality measures in this study predicts speech recognition performance more effectively than each speech measure alone. In this paper, we describe the method of extracting the measure using a DNN, and we replace one of the combined measures, the GMM (Gaussian Mixture Model) score used in the previous study, with a DNN score. The combination with the DNN score shows a higher correlation with WER than the combination with the GMM score.
https://doi.org/10.6109/jkiice.2017.21.12.2291
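
The abstract evaluates quality measures by how strongly they correlate with WER. As a rough illustration of the two quantities involved, the sketch below computes WER with the standard word-level edit distance and a Pearson correlation coefficient between two lists of numbers. The function names are illustrative, the abstract does not say which correlation statistic the authors used (Pearson is an assumption here), and the DNN-based measure itself is not reproduced.

```python
import math


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumes a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev is the edit-distance row for the first i-1 reference words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Given one WER value and one quality-measure value per test condition, `pearson(measure_values, wer_values)` would give the kind of correlation the abstract compares between the GMM-based and DNN-based combinations.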

Motion Generation of a Single Rigid Body Character Using Deep Reinforcement Learning (심층 강화 학습을 활용한 단일 강체 캐릭터의 모션 생성)

Ahn, Jewon;Gu, Taehong;Kwon, Taesoo
- Journal of the Korea Computer Graphics Society / v.27 no.3 / pp.13-23 / 2021
In this paper, we propose a framework that generates the trajectory of a single rigid body based on its COM configuration and contact pose. Because the input dimension is smaller than that of a full-body state, we can reduce the learning time for reinforcement learning. Even with a 68% reduction in learning time (approximately two hours), the character trained by our network is more robust to external perturbations, tolerating an external force of 1500 N, which is about 7.5 times larger than the maximum magnitude from a previous approach. In this framework, we use centroidal dynamics to calculate the next configuration of the COM and use reinforcement learning to obtain a policy that gives us parameters for controlling the contact positions and forces.
https://doi.org/10.15701/kcgs.2021.27.3.13
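
Centroidal dynamics relates the contact forces to the motion of the center of mass: m·c̈ = Σ f_i + m·g for linear motion, and L̇ = Σ (p_i − c) × f_i for the angular momentum about the COM. The sketch below advances these two equations by one time step. It is a generic illustration under stated assumptions, not the authors' implementation: the function names, the semi-implicit Euler integrator, and the z-up gravity vector are all choices made here, and the orientation update is omitted.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def centroidal_step(com, vel, ang_mom, contacts, mass, dt,
                    gravity=(0.0, 0.0, -9.81)):
    """Advance COM position, COM velocity, and angular momentum by dt.

    contacts: list of (position, force) pairs in the world frame.
    """
    total_force = [mass * g for g in gravity]
    torque = [0.0, 0.0, 0.0]
    for pos, force in contacts:
        # Moment arm from the COM to the contact point.
        arm = tuple(p - c for p, c in zip(pos, com))
        tau = cross(arm, force)
        for k in range(3):
            total_force[k] += force[k]
            torque[k] += tau[k]
    acc = [f / mass for f in total_force]
    # Semi-implicit Euler: update velocity first, then position.
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))
    new_com = tuple(c + v * dt for c, v in zip(com, new_vel))
    new_ang = tuple(l + t * dt for l, t in zip(ang_mom, torque))
    return new_com, new_vel, new_ang
```

In a setup like the one the abstract describes, a learned policy would supply the contact positions and forces passed in as `contacts`, and a step of this kind would produce the next COM state.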

Perceptual Video Coding using Deep Convolutional Neural Network based JND Model (심층 합성곱 신경망 기반 JND 모델을 이용한 인지 비디오 부호화)

Kim, Jongho;Lee, Dae Yeol;Cho, Seunghyun;Jeong, Seyoon;Choi, Jinsoo;Kim, Hui-Yong
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.06a / pp.213-216 / 2018
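This paper proposes a perceptual video coding technique that uses JND (Just Noticeable Difference), one of the characteristics of human visual perception. JND-based perceptual coding exploits these characteristics to remove signal components that are visually hard to perceive, thereby improving coding efficiency. Rather than relying on a conventional JND technique based on a mathematical model, the proposed method generates a JND model with a deep neural network, a data-driven modeling approach that has recently attracted attention. During video coding, the proposed deep neural network-based JND model preprocesses the input image to remove its perceptual redundancy. In coding experiments, the proposed method reduced the coded bits by 16.86% on average while maintaining the same or similar perceptual quality.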

Self-Attention-based SMILES Generation for De Novo Drug Design (신약 디자인을 위한 Self-Attention 기반의 SMILES 생성자)

PIAO, SHENGMIN;Choi, Jonghwan;Kim, Kyeonghun;Park, Sanghyun
- Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.343-346 / 2021
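Drug design is the process of developing new drugs that can act on biological targets such as proteins. The traditional approach consists of discovery and development stages, but because developing a single new drug takes more than 10 years, AI-based drug design methods are being developed to shorten this period. However, many deep learning-based drug design models rely on RNNs, and because RNNs are slow to train, there is room for improvement. To overcome this drawback, this study proposes a SMILES generation model that uses self-attention and a variational autoencoder. We confirmed that the proposed model reduces training time to 1/36 of that of a state-of-the-art drug design model and also generates more valid SMILES.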
https://doi.org/10.3745/PKIPS.y2021m05a.343
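
The training-speed argument in this abstract rests on a general property of self-attention: each output position is a weighted sum over all positions and does not depend on a previous hidden state, so positions can be computed in parallel, unlike the step-by-step recurrence of an RNN. The sketch below shows standard scaled dot-product attention, softmax(QKᵀ/√d)·V, in plain Python. It is not the model from the paper: the learned query, key, and value projections and the variational autoencoder are omitted, and the function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    out = []
    # Each query is handled independently of the others, so this loop
    # has no sequential dependency between positions.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

For a SMILES string, each vector would correspond to one token of the string. How tokens are embedded is an assumption left outside this sketch.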

Trends in Deep Learning-Based Face Spoofing Detection Technology (딥러닝 기반 얼굴 위변조 검출 기술 동향)

Kim, Won-Jun
- Broadcasting and Media Magazine / v.25 no.2 / pp.52-60 / 2020
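With recent advances in user authentication technology based on biometric information, its application to mobile devices is increasing sharply. Face-based authentication in particular is contactless and convenient to use, so its range of applications keeps expanding. However, because it is easy to spoof with a photo or video of the user's face, it makes maintaining security on mobile devices difficult. This article introduces recent trends in deep neural network-based face spoofing detection, which is being actively studied to solve this problem. First, it describes spoofing detection methods that use various network architectures, from basic convolutional neural networks to generative model-based detection methods. It also briefly reviews the face spoofing datasets used to train deep neural networks.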

Semantic Feature Learning and Selective Attention for Video Captioning (비디오 캡션 생성을 위한 의미 특징 학습과 선택적 주의집중)

Lee, Sujin;Kim, Incheol
- Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.865-868 / 2017
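In general, generating captions from video involves extracting features from the input video and then generating captions from the extracted features. This paper introduces a deep neural network model for effective video caption generation and its training method. In addition to the visual features that represent the input video, we use dynamic semantic features and static semantic features, which represent the video effectively, as input features. The visual features of the input video are extracted with convolutional neural networks such as C3D and ResNet, whereas the semantic features are extracted with the semantic feature extraction network proposed in this paper. To generate video captions effectively from these features, we propose a selective-attention caption generation network. Through various experiments on the MSVD data set, which was collected from YouTube videos, we confirmed the performance and effectiveness of the proposed model.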
https://doi.org/10.3745/PKIPS.y2017m11a.865

A Deep Reinforcement Learning Framework for Optimal Path Planning of Industrial Robotic Arm (산업용 로봇 팔 최적 경로 계획을 위한 심층강화학습 프레임워크)

Kwon, Junhyung;Cho, Deun-Sol;Kim, Won-Tae
- Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.75-76 / 2022
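Currently, when a path plan is generated for an industrial robotic arm, a robot engineer searches for the optimal path plan by controlling the robot manually. This conventional approach to path planning is unsuitable for the coming era of mass customization, in which processes are changed flexibly to meet diverse customer demands. The deep reinforcement learning framework learns robotic arm path planning in a virtual environment; when the process changes, it automatically establishes an optimal path plan and delivers it to the robotic arm, supporting fast and flexible process changes. This paper designs the structure of a deep reinforcement learning framework for robotic arm path planning, focusing on building a learning environment for the deep reinforcement learning agent and on linking the AI model with that environment.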
https://doi.org/10.3745/PKIPS.y2022m11a.75

Detecting Visual Attributes and Spatial Relationships with Deep Neural Networks (심층 신경망을 이용한 영상 기반 물체 속성 및 공간 관계 탐지)

Lee, Jae-Yun;Lee, Gi-Ho;Kim, In-Cheol
- Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.424-427 / 2018
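Understanding the scene contained in an image or video is one of the ultimate goals of computer vision. This paper proposes a deep neural network-based model for detecting object attributes and spatial relationships: from an input image, it detects the objects that make up the scene, the spatial relationships among them, and the various attributes of individual objects, and it generates a knowledge graph. We describe the structure of the detection model, which performs these composite visual recognition tasks simultaneously, and present the results of performance analysis experiments using CLEVR, a large-scale benchmark data set.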
https://doi.org/10.3745/PKIPS.y2018m05a.424

