• Title/Summary/Keyword: AI-based image generation (인공지능 기반 이미지 생성)


Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results from a variety of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. Whereas existing montage generation technology is based on Western facial appearance, the montage generation system developed in this paper learns a model based on Korean facial features. It can therefore create more accurate and effective Korean montage images from Korean-specific voice and text input. Since the developed montage generation app can be used as a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we used persona-based virtual-person montage data provided by AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the training data needed for developing AI technologies and services. The image generation system was implemented using VQGAN, a deep learning model for generating high-resolution images, and KoDALLE, a Korean-language image generation model. We confirmed that the trained AI model creates a montage of a face very similar to what was described by voice and text. To verify the practicality of the developed app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal investigation, to describe and visualize facial features.
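
A minimal sketch of the multi-modal front end described above: a spoken description is transcribed, merged with typed text, and tokenized into the conditioning sequence that a DALL-E-style decoder (in the paper, KoDALLE with a VQGAN image decoder) would consume. All function and variable names here are illustrative assumptions, not the authors' implementation.

```python
def transcribe(audio_chunks):
    """Stand-in for an ASR step; here the audio is already pseudo-transcribed."""
    return " ".join(audio_chunks)

def build_condition(spoken, typed, vocab, max_len=16):
    """Merge both modalities into one token sequence, truncated/padded to max_len."""
    words = (spoken + " " + typed).split()
    ids = [vocab.setdefault(w, len(vocab)) for w in words]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))  # 0 = PAD

vocab = {"<pad>": 0}
tokens = build_condition(
    transcribe(["round face", "thick eyebrows"]),
    "short black hair",
    vocab,
)
print(len(tokens))  # fixed-length conditioning sequence
```

A real system would feed `tokens` to the autoregressive text-to-image model, which emits VQGAN codebook indices that are decoded into the montage image.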

A Study on the Method of Creating Realistic Content in Audience-participating Performances using Artificial Intelligence Sentiment Analysis Technology (인공지능 감정분석 기술을 이용한 관객 참여형 공연에서의 실감형 콘텐츠 생성 방식에 관한 연구)

  • Kim, Jihee;Oh, Jinhee;Kim, Myeungjin;Lim, Yangkyu
    • Journal of Broadcast Engineering / v.26 no.5 / pp.533-542 / 2021
  • In this study, we propose a process for re-creating Jindo Buk Chum (Jindo drum dance), one of the traditional Korean arts, as digital art using various artificial intelligence technologies. The audience's emotion data, quantified through AI language-analysis technology, intervenes in the projection-mapping performance as various object forms and influences the performance without changing its overarching story. Whereas most interactive art expresses communication between the performer and the video, this performance becomes a new type of responsive performance in which the audience communicates directly with the work, centered on AI sentiment-analysis technology. It starts from 'Chuimsae', a convention unique to Korean traditional art in which the audience directly or indirectly intervenes in and influences the performance. Emotional information contained in the performer's prologue is combined with the audience's emotional information and converted into the images and particles used in the performance, so that the audience indirectly participates in and changes the performance.
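
An illustrative mapping from a sentiment score to visual parameters, in the spirit of the pipeline above: quantified audience emotion drives the form (color, particle count, motion) of the projection-mapped objects without altering the performance's story. The ranges and parameter names are our assumptions, not values from the paper.

```python
def emotion_to_particles(score):
    """score in [-1, 1] from a sentiment analyzer -> particle settings."""
    score = max(-1.0, min(1.0, score))   # clamp out-of-range analyzer output
    return {
        "count": int(100 + 400 * (score + 1) / 2),  # calmer -> fewer particles
        "hue": 240 if score < 0 else 20,            # cold vs. warm color
        "speed": 0.5 + abs(score),                  # intensity of motion
    }

print(emotion_to_particles(0.8)["count"])  # 460
```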

An Intelligent Chatbot Utilizing BERT Model and Knowledge Graph (BERT 모델과 지식 그래프를 활용한 지능형 챗봇)

  • Yoo, SoYeop;Jeong, OkRan
    • The Journal of Society for e-Business Studies / v.24 no.3 / pp.87-98 / 2019
  • As artificial intelligence is actively studied, it is being applied to various fields such as image, video, and natural language processing. Natural language processing, in particular, aims to enable computers to understand the languages spoken and written by people, and is considered one of the most important areas of AI. In natural language processing, it is complex but important to make computers learn a person's common sense and generate results based on it. Knowledge graphs, which link words by their relationships, have the advantage that computers can learn common sense from them easily. However, existing knowledge graphs are organized around specific languages and fields and cannot respond to neologisms. In this paper, we propose an intelligent chatbot system that collects and analyzes data in real time to build an automatically scalable knowledge graph and uses it as base data. In particular, a fine-tuned BERT model for relation extraction is applied to the auto-growing graph to improve performance. We developed a chatbot that can learn human common sense using the auto-growing knowledge graph, and verified the graph's availability and performance.
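
A minimal sketch of the auto-growing knowledge graph idea: triples emitted by a relation extractor (in the paper, a fine-tuned BERT model, which is stubbed out here) are added as edges, so new terms and neologisms extend the graph instead of being rejected. The class and method names are ours, for illustration only.

```python
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object)]

    def add_triple(self, subj, rel, obj):
        """Grow the graph; unseen subjects/objects are added implicitly."""
        if (rel, obj) not in self.edges[subj]:
            self.edges[subj].append((rel, obj))

    def neighbors(self, subj):
        return self.edges.get(subj, [])

kg = KnowledgeGraph()
for triple in [("chatbot", "uses", "BERT"),
               ("BERT", "is_a", "language model"),
               ("chatbot", "uses", "BERT")]:  # duplicate is ignored
    kg.add_triple(*triple)

print(kg.neighbors("chatbot"))  # [('uses', 'BERT')]
```

The chatbot would query `neighbors` at answer time to ground its responses in the accumulated common-sense edges.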

Protocol Classification Based on Traffic Flow and Deep Learning (트래픽 플로우 및 딥러닝 기반의 프로토콜 분류 방법론)

  • Ye-Jin Park;Yeong-Pil Cho
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.836-838 / 2024
  • This paper recognizes the growing potential for VPN abuse in modern society and emphasizes the importance of distinguishing VPN from non-VPN traffic. To overcome the limitations of traditional port-based classification and packet-inspection approaches, we propose a new method that combines traffic-flow features with artificial intelligence (AI) techniques to distinguish VPN from non-VPN protocols. Using a packet dataset we collected ourselves, we extract traffic-flow features and combine them with packet payloads to generate images, which are then fed to a CNN model to classify protocols with high accuracy. Experimental results show that the proposed method achieves 99.71% accuracy, demonstrating that it can contribute to traffic classification and stronger network security.
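
A hedged sketch of the flow-to-image step: numeric flow statistics and the first payload bytes of a flow are packed into a fixed-size byte grid that a CNN can classify. The grid size and the choice of features here are illustrative, not the paper's exact configuration.

```python
import numpy as np

def flow_to_image(flow_features, payload, side=16):
    """Concatenate numeric flow features and payload bytes, truncate/pad
    to side*side values, and reshape into a 2-D grayscale 'image'."""
    values = list(flow_features) + list(payload)
    values = values[: side * side]
    values += [0] * (side * side - len(values))
    return np.array(values, dtype=np.uint8).reshape(side, side)

features = [12, 200, 34, 5]   # e.g. duration, bytes/s, packet counts ...
payload = bytes(range(40))    # first bytes of the flow payload
img = flow_to_image(features, payload)
print(img.shape)  # (16, 16)
```

Each flow thus becomes one small image, and a standard image-classification CNN can be trained on batches of them with VPN / non-VPN labels.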

Basic Study on the Generation of Maritime Traffic Information (해상교통정보 생성에 관한 기초 연구)

  • Kim, Hye-jin;Oh, Jaeyong;Park, Sekil
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2016.05a / pp.287-288 / 2016
  • Traffic-information generation techniques that predict ship-to-ship collision risk have limited accuracy when applied in vessel traffic service (VTS) centers. Likewise, traffic-information techniques such as density and congestion measures, which capture the traffic patterns of a target area, cannot identify the vessels that should be prioritized as risks. A new approach is needed to recognize high-risk vessels and support operators in VTS areas with complex traffic patterns. In this study, we reviewed machine learning techniques for generating traffic information that can comprehensively assess the traffic situation in a controlled area and recognize high-risk vessels in advance, and we examined introducing a deep learning framework to overcome the limitations of conventional AI. We found that the various continuous data of a VTS center, such as images, messages, and voice, can be integrated and analyzed holistically to generate traffic situation-awareness information that supports control operations. Because big-data-based machine learning can produce more meaningful situation-awareness information, the control center's various data sources need to be integrated for this purpose.


Efficient Collecting Scheme the Crack Data via Vector based Data Augmentation and Style Transfer with Artificial Neural Networks (벡터 기반 데이터 증강과 인공신경망 기반 특징 전달을 이용한 효율적인 균열 데이터 수집 기법)

  • Yun, Ju-Young;Kim, Donghui;Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.667-669 / 2021
  • In this paper, we propose a vector-based data augmentation technique to build a training dataset, together with a framework that uses convolutional neural networks (CNNs) to express patterns close to real cracks. Cracks in buildings are a cause of major accidents, including building collapses and falling debris that lead to casualties. Solving this problem with artificial intelligence requires a large amount of data, but real crack images not only have complex patterns but are also hard to collect in volume, because photographing them often means exposure to hazardous conditions. This database-construction problem can be addressed with elastic distortion, which artificially deforms specific regions to increase the amount of data, but in this paper we show improved crack-pattern results using a CNN. Compared with elastic distortion, the CNN extracted results more similar to real crack patterns, and by designing the augmentation on vector data rather than the commonly used pixel data, our method showed superior variability of cracks. Even with a small number of input crack samples, we could generate diverse crack directions and patterns and thus easily build a crack database. In the long term, this is expected to contribute to structural safety assessment, relieving anxiety about safety accidents and creating a safer, more pleasant residential environment.
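
An illustrative sketch of what vector-based augmentation can look like: a crack is stored as a polyline of control points, and new variants are produced by jittering those points before rasterization. This is our reading of the general idea, assuming a simple point-perturbation scheme, not the authors' exact algorithm.

```python
import random

def augment_polyline(points, jitter=2.0, rng=None):
    """Return a new crack polyline with each control point perturbed."""
    rng = rng or random.Random(0)
    return [(x + rng.uniform(-jitter, jitter),
             y + rng.uniform(-jitter, jitter)) for x, y in points]

crack = [(0, 0), (10, 3), (20, 1), (30, 6)]  # one crack as control points
variants = [augment_polyline(crack, rng=random.Random(seed))
            for seed in range(5)]
print(len(variants), len(variants[0]))  # 5 variants, 4 points each
```

Because the crack lives in vector space, each variant stays a clean connected path; pixel-space distortions, by contrast, can blur or break the crack.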


Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.79-104 / 2020
  • Recently, as deep learning has attracted attention, it is being considered as a method for solving problems in various fields. In particular, deep learning is known to perform well when applied to unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interest in image captioning technology and its applications is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling both image comprehension and text generation simultaneously. In spite of its high entry barrier, since analysts must be able to process both image and text data, image captioning has established itself as one of the key fields in AI research owing to its wide applicability. In addition, many studies have been conducted to improve the performance of image captioning in various respects. Recent research attempts to create advanced captions that not only describe an image accurately, but also convey the information contained in the image more sophisticatedly. Despite these efforts, it is difficult to find research that interprets images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the person who encounters it. Moreover, the way of interpreting and expressing the image also differs with the level of expertise. The public tends to recognize an image from a holistic, general perspective, that is, by identifying the image's constituent objects and their relationships.
On the contrary, domain experts tend to recognize an image by focusing on the specific elements needed to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method to generate domain-specialized captions for an image by utilizing the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, the domain expertise is transplanted through transfer learning with a small amount of expertise data. However, naive transfer learning with expertise data can cause another problem: simultaneous learning with captions of various characteristics can cause so-called 'inter-observation interference', which makes it difficult to learn each characteristic point of view purely. When learning from a vast amount of data, most of this interference is self-purified and has little impact on the results. In contrast, in fine-tuning on a small amount of data, the impact of such interference can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each characteristic. To confirm the feasibility of the proposed methodology, we performed experiments using the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, with the advice of an art therapist, about 300 'image / expertise caption' pairs were created and used for the expertise-transplantation experiments.
As a result of the experiments, we confirmed that captions generated by the proposed methodology reflect the implanted expertise, whereas captions generated from general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation: a method that uses transfer learning to generate captions specialized for a specific domain. In the future, by applying the proposed methodology to expertise transplantation in various fields, we expect active research on solving the shortage of expertise data and on improving image captioning performance.
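
A toy sketch of character-independent transfer learning as we read it: instead of fine-tuning one model on all expertise captions at once, a separate copy of the pre-trained model is fine-tuned per caption characteristic, so the small expertise sets cannot interfere with each other. `Model` and its numeric 'training' are stand-ins, not the paper's captioning network.

```python
import copy

class Model:
    def __init__(self):
        self.bias = 0.0                 # stands in for pre-trained weights

    def finetune(self, samples):
        # trivial stand-in for gradient updates: shift bias by the mean target
        self.bias += sum(samples) / len(samples)

pretrained = Model()                    # imagine MSCOCO pre-training here

expertise_sets = {"color": [1.0, 2.0], "composition": [10.0]}
specialists = {}
for name, data in expertise_sets.items():
    branch = copy.deepcopy(pretrained)  # independent branch per characteristic
    branch.finetune(data)
    specialists[name] = branch

print(sorted(specialists))  # ['color', 'composition']
```

The key design point is the `deepcopy`: each characteristic gets its own weights, so no gradient from one caption style can disturb another.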

A New Image Processing Scheme For Face Swapping Using CycleGAN (순환 적대적 생성 신경망을 이용한 안면 교체를 위한 새로운 이미지 처리 기법)

  • Ban, Tae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.9 / pp.1305-1311 / 2022
  • With the recent rapid development of mobile terminals and personal computers and the advent of neural network technology, real-time face swapping using images has become possible. In particular, the cycle-consistent generative adversarial network (CycleGAN) made it possible to swap faces using uncorrelated image data. In this paper, we propose an input-data processing scheme that can improve the quality of face swapping with less training data and time. The proposed scheme can improve image quality while preserving facial structure and expression by combining facial landmarks, extracted through a pre-trained neural network, with the major information that affects the structure and expression of the face. Using the blind/referenceless image spatial quality evaluator (BRISQUE) score, one of the AI-based no-reference quality metrics, we quantitatively analyze the performance of the proposed scheme and compare it with conventional schemes. According to the numerical results, the proposed scheme obtained BRISQUE scores improved by about 4.6% to 14.6% compared to the conventional schemes.
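
A minimal sketch of landmark-conditioned input in the spirit of the scheme above: facial landmarks (fabricated here; in practice produced by a pre-trained detector) are rendered as an extra channel and stacked with the RGB image, so the generator sees structure and expression cues explicitly. Shapes and the single-channel encoding are our assumptions.

```python
import numpy as np

def with_landmark_channel(image, landmarks):
    """Stack a binary landmark map onto an HxWx3 image -> HxWx4 input."""
    h, w, _ = image.shape
    lm = np.zeros((h, w, 1), dtype=image.dtype)
    for x, y in landmarks:
        lm[y, x, 0] = 255        # mark each landmark pixel
    return np.concatenate([image, lm], axis=2)

img = np.zeros((64, 64, 3), dtype=np.uint8)
pts = [(20, 30), (44, 30), (32, 45)]   # e.g. eyes and a mouth corner
x = with_landmark_channel(img, pts)
print(x.shape)  # (64, 64, 4)
```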

Controllable data augmentation framework based on multiple large-scale language models (복수 대규모 언어 모델에 기반한 제어 가능형 데이터 증강 프레임워크)

  • Hyeonseok Kang;Hyuk Namgoong;Jeesu Jung;Sangkeun Jung
    • Annual Conference on Human and Language Technology / 2023.10a / pp.3-8 / 2023
  • Data augmentation helps improve model performance when the data available for training an AI model is scarce or biased. Unlike images, augmenting natural language requires considering features such as context and grammatical structure, so it consumes substantial human resources. In this study, we propose a method that uses multiple large language models to generate semantically similar sentences with minimal human effort, constructing prompts from an input sentence and control conditions. We also propose arranging the large language models not only standalone but also in parallel and sequential structures to increase the effect of augmentation. To validate the data generated by the large language models, we compared multi-class classification performance when the Korean model KcBERT was trained on equal numbers of original and augmented samples. When augmentation was performed with multiple large language models, the augmented data achieved accuracy equal to or higher than using the original data alone, regardless of model structure. When 400 original samples were augmented using multiple large language models in a parallel structure, performance differed from the best original-data result of 0.997 by only 0.017, showing a nearly equivalent training effect.
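
A schematic sketch of the parallel vs. sequential arrangements: each 'model' is stubbed as a string transform, whereas a real system would call distinct large language models with a prompt built from the sentence and control conditions. The function names and transforms are ours, for illustration only.

```python
def model_a(sent):  # stand-in LLM #1: lexical (synonym-style) rewrite
    return sent.replace("good", "great")

def model_b(sent):  # stand-in LLM #2: structural rewrite
    return "Indeed, " + sent

def parallel(sent, models):
    """Each model augments the original independently -> N candidates."""
    return [m(sent) for m in models]

def sequential(sent, models):
    """Each model rewrites the previous model's output -> 1 candidate."""
    for m in models:
        sent = m(sent)
    return sent

s = "the service is good"
print(parallel(s, [model_a, model_b]))
print(sequential(s, [model_a, model_b]))  # 'Indeed, the service is great'
```

Parallel composition grows the dataset faster; sequential composition compounds the rewrites, trading volume for diversity per sentence.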


Generation of Stage Tour Contents with Deep Learning Style Transfer (딥러닝 스타일 전이 기반의 무대 탐방 콘텐츠 생성 기법)

  • Kim, Dong-Min;Kim, Hyeon-Sik;Bong, Dae-Hyeon;Choi, Jong-Yun;Jeong, Jin-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.11 / pp.1403-1410 / 2020
  • Recently, as interest in non-face-to-face experiences and services increases, demand for web video content that can be easily consumed on mobile devices such as smartphones or tablets is rapidly increasing. To meet this demand, in this paper we propose a technique to efficiently produce video content that provides the experience of visiting famous places (i.e., a stage tour) from animations or movies. To this end, an image dataset was built by collecting images of stage areas using the Google Maps and Google Street View APIs. We then present a deep learning-based style-transfer method that applies the distinctive style of an animation to the collected street-view images and generates video content from the style-transferred images. Finally, we show through various experiments that the proposed method can produce more engaging stage-tour video content.
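
A rough sketch of the content pipeline above: street-view frames are gathered along a route, passed through a style-transfer step (stubbed here; the paper uses a deep learning model that applies the animation's style), and ordered into a frame sequence ready for video encoding. All helper names are illustrative assumptions.

```python
def fetch_frames(route):
    """Stand-in for Google Street View collection along route waypoints."""
    return [f"frame_at_{lat}_{lng}" for lat, lng in route]

def stylize(frame, style="animation"):
    """Stand-in for the neural style-transfer step."""
    return f"{style}:{frame}"

def build_tour(route):
    """Collect, stylize, and order frames into a stage-tour sequence."""
    return [stylize(f) for f in fetch_frames(route)]

tour = build_tour([(35.0, 129.0), (35.1, 129.1)])
print(len(tour))  # 2 stylized frames
```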