• Title/Summary/Keyword: Image-text generation


A Case Study of Object detection via Generated image Using deep learning model based on image generation (딥 러닝 기반 이미지 생성 모델을 활용한 객체 인식 사례 연구)

  • Dabin Kang;Jisoo Hong;Jaehong Kim;Minji Song;Dong-hwi Kim;Sang-hyo Park
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.203-206 / 2022
  • This paper aims to examine the object detection performance of a YOLO model on generated images and to study the results case by case. As image processing technology has advanced, the risk of adversarial attacks has increased, and this can significantly degrade object detection performance. To address this problem, this study uses a text-to-image model to generate new images that did not previously exist and studies object detection on the generated images for each case. After classifying the images into eight animal categories and evaluating detection performance, bounding boxes were generated with 86.46% accuracy, and 60.41% accuracy was shown for the 116 animal instances.

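To make the kind of experiment described above concrete, here is a minimal sketch: generate an image with a text-to-image model and run a pretrained YOLO detector on it. The specific checkpoints (Stable Diffusion v1.5, YOLOv8n) and the prompt are illustrative assumptions; the abstract only states that a text-to-image model and YOLO were used.

```python
# Sketch only: generate a synthetic animal image, then detect objects in it.
import torch
from diffusers import StableDiffusionPipeline
from ultralytics import YOLO

# 1) Generate an image from a text prompt (checkpoint name is an assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photo of a zebra standing in a grassland").images[0]
image.save("generated_zebra.png")

# 2) Run a pretrained YOLO detector on the generated image.
detector = YOLO("yolov8n.pt")
results = detector("generated_zebra.png")
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(f"{cls_name}: confidence {float(box.conf):.2f}, bbox {box.xyxy.tolist()}")
```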

A Study on Character Consistency Generated in [Midjourney V6] Technology

  • Xi Chen;Jeanhun Chung
    • International journal of advanced smart convergence / v.13 no.2 / pp.142-147 / 2024
  • The emergence of programs like Midjourney, particularly known for its text-to-image capability, has significantly impacted design and creative industries. Midjourney continually updates its database and algorithms to enhance user experience, with a focus on character consistency. This paper's examination of the latest V6 version of Midjourney reveals notable advancements in its characteristics and design principles, especially in the realm of character generation. By comparing V6 with its predecessors, this study underscores the significant strides made in ensuring consistent character portrayal across different plots and timelines. Such improvements in AI-driven character consistency are pivotal for storytelling. They ensure coherent and reliable character representation, which is essential for narrative clarity, emotional resonance, and overall effectiveness. This coherence supports a more immersive and engaging storytelling experience, fostering deeper audience connection and enhancing creative expression. The findings of this study encourage further exploration of Midjourney's capabilities for artistic innovation. By leveraging its advanced character consistency, creators can push the boundaries of storytelling, leading to new and exciting developments in the fusion of technology and art.

AEMSER Using Adaptive Threshold Of Canny Operator To Extract Scene Text (장면 텍스트 추출을 위한 캐니 연산자의 적응적 임계값을 이용한 AEMSER)

  • Park, Sunhwa;Kim, Donghyun;Im, Hyunsoo;Kim, Honghoon;Paek, Jaegyung;Park, Jaeheung;Seo, Yeong Geon
    • Journal of Digital Contents Society / v.16 no.6 / pp.951-959 / 2015
  • Scene text extraction is important because it provides key information for the many image-based applications appearing in the current smart era. Edge-Enhanced MSER (Maximally Stable Extremal Regions), which refines region boundaries with the Canny operator after extracting basic MSER regions, shows excellent text-extraction performance. However, the results of Edge-Enhanced MSER vary with how the Canny operator's threshold is set, so a method for determining the threshold is needed. In this paper, we propose AEMSER (Adaptive Edge-enhanced MSER), which obtains the Canny thresholds from the middle (median) value of the image histogram and applies them within Edge-Enhanced MSER. The proposed method yields better result images than existing methods because it extracts regions only around clear boundaries.
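As a rough illustration of the idea (not the authors' implementation), the sketch below derives the Canny thresholds from the median of the grayscale intensities and keeps only MSER regions that coincide with strong edges; the ±33% threshold band and the overlap ratio are assumed parameters, and the overlap filter is a simplification of the actual edge-enhancement step.

```python
import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Adaptive Canny thresholds centred on the median (middle) intensity value.
median = float(np.median(gray))
lower = int(max(0, 0.67 * median))
upper = int(min(255, 1.33 * median))
edges = cv2.Canny(gray, lower, upper)

# Basic MSER extraction.
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(gray)

# Keep only regions whose pixels mostly lie near strong edges ("edge-enhanced" regions).
edge_mask = cv2.dilate(edges, np.ones((3, 3), np.uint8)) > 0
enhanced = [r for r in regions if edge_mask[r[:, 1], r[:, 0]].mean() > 0.1]
print(f"{len(regions)} MSER regions, {len(enhanced)} edge-enhanced text candidates")
```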

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.79-104 / 2020
  • Recently, as deep learning has attracted attention, its use is being considered as a way to solve problems in various fields. In particular, deep learning is known to perform very well when applied to unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interest in image captioning technology and its applications is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling image comprehension and text generation simultaneously. Despite its high entry barrier, since analysts must be able to process both image and text data, image captioning has established itself as one of the key fields in AI research owing to its broad applicability. In addition, much research has been conducted to improve the performance of image captioning in various respects. Recent studies attempt to create advanced captions that not only describe an image accurately but also convey the information contained in the image in a more sophisticated way. Despite these efforts, it is difficult to find research that interprets images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the person viewing it. Moreover, the way of interpreting and expressing the image also differs with the level of expertise. The public tends to recognize an image from a holistic, general perspective, that is, by identifying the image's constituent objects and their relationships. By contrast, domain experts tend to recognize the image by focusing on the specific elements needed to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method that generates domain-specialized captions for an image by utilizing the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, the expertise of the field is transplanted through transfer learning with a small amount of expertise data. However, a naive application of transfer learning with expertise data may cause another type of problem: simultaneous learning with captions of various characteristics can produce a so-called 'inter-observation interference' problem, which makes it difficult to learn each characteristic viewpoint purely. When learning from a vast amount of data, most of this interference is self-purified and has little impact on the results. By contrast, in fine-tuning on a small amount of data, the impact of such interference can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each characteristic.
To confirm the feasibility of the proposed methodology, we performed experiments using the results of pre-training on the MSCOCO dataset, which consists of 120,000 images and about 600,000 general captions. In addition, following the advice of an art therapist, about 300 pairs of image / expertise captions were created and used for the expertise-transplantation experiments. The experiments confirmed that captions generated by the proposed methodology reflect the implanted expertise, whereas captions generated by training on general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation and present a method that uses transfer learning to generate captions specialized for a specific domain. By applying the proposed methodology to expertise transplantation in various fields, we expect much future research on mitigating the lack of expertise data and improving the performance of image captioning.
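A minimal sketch of the fine-tuning step described above, under heavy assumptions: the captioning model, its `image_encoder`/`caption_decoder` attributes, and the helper that loads the MSCOCO pre-trained weights are placeholders, not the authors' code. The point is only to show pre-training reused as initialization and one independent fine-tuning run per expertise characteristic.

```python
import copy
import torch
from torch.utils.data import DataLoader

def fine_tune(model, expert_dataset, epochs=5, lr=1e-5):
    """Fine-tune only the text decoder on a small expertise-caption dataset."""
    for p in model.image_encoder.parameters():      # keep visual features fixed
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.caption_decoder.parameters(), lr=lr)
    loader = DataLoader(expert_dataset, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for images, captions in loader:
            loss = model(images, captions)           # assumed to return the caption LM loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# One independent copy of the pre-trained model per expertise characteristic,
# so fine-tuning one viewpoint cannot interfere with another.
# pretrained = load_mscoco_pretrained_captioner()    # hypothetical helper
# expert_models = {name: fine_tune(copy.deepcopy(pretrained), ds)
#                  for name, ds in expertise_datasets.items()}
```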

Evaluation of Sentimental Texts Automatically Generated by a Generative Adversarial Network (생성적 적대 네트워크로 자동 생성한 감성 텍스트의 성능 평가)

  • Park, Cheon-Young;Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.8 no.6 / pp.257-264 / 2019
  • Recently, deep neural network based approaches have shown good performance in various fields of natural language processing. A huge amount of training data is essential for building a deep neural network model, but collecting a large training set is a costly and time-consuming job. Data augmentation is one solution to this problem. Augmenting text data is more difficult than augmenting image data because texts consist of tokens with discrete values. Generative adversarial networks (GANs) are widely used for image generation. In this work, we generate sentimental texts using CS-GAN, a GAN model that has a classifier as well as a discriminator. We evaluate the usefulness of the generated sentimental texts according to various measures. The CS-GAN model can not only generate texts with more diversity but also improve the performance of its classifier.
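A structural sketch (not the paper's implementation) of a CS-GAN-style setup: an LSTM generator produces token sequences, while a discriminator judges real versus generated text and a separate classifier predicts the sentiment label. Layer sizes are assumptions, and the adversarial training loop (usually policy-gradient based for discrete tokens) is omitted.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, vocab_size, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)                 # next-token logits per position

class TextScorer(nn.Module):
    """Shared architecture for the discriminator (n_out=1) and the
    sentiment classifier (n_out = number of sentiment classes)."""
    def __init__(self, vocab_size, n_out, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.head(h[-1])            # sequence-level score / class logits

vocab = 10000
generator, discriminator, classifier = Generator(vocab), TextScorer(vocab, 1), TextScorer(vocab, 2)
```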

Automated Story Generation with Image Captions and Recursive Calls (이미지 캡션 및 재귀호출을 통한 스토리 생성 방법)

  • Isle Jeon;Dongha Jo;Mikyeong Moon
    • Journal of the Institute of Convergence Signal Processing / v.24 no.1 / pp.42-50 / 2023
  • The development of technology has brought digital innovation throughout the media industry, including production and editing techniques, and has diversified how consumers view content through OTT services and streaming. The convergence of big data and deep learning networks has enabled automatic generation of text in formats such as news articles, novels, and scripts, but studies that reflect the author's intention and generate contextually smooth stories have been insufficient. In this paper, we describe the flow of pictures in a storyboard with image caption generation techniques and automatically generate story-tailored scenarios through a language model. Using image captioning with a CNN and an attention mechanism, we generate sentences describing the pictures on the storyboard and feed the generated sentences into the Korean natural language processing model KoGPT-2 to automatically generate scenarios that meet the planning intention. Through this approach, scenarios customized to the author's intention and story can be created in large quantities to ease the burden of content creation, and artificial intelligence can participate in the overall process of digital content production to enable media intelligence.
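A minimal sketch of the second stage described above: a generated image caption is fed into KoGPT-2, which continues it into scenario text. The checkpoint name "skt/kogpt2-base-v2", the decoding settings, and the example caption (standing in for the CNN-plus-attention caption model) are assumptions; the recursive calls mentioned in the title, which feed generated text back in to extend the story, are omitted here.

```python
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained(
    "skt/kogpt2-base-v2", bos_token="</s>", eos_token="</s>", pad_token="<pad>"
)
model = GPT2LMHeadModel.from_pretrained("skt/kogpt2-base-v2")

# Stand-in for a caption produced by the image-captioning stage.
caption = "한 소년이 해질 무렵 바닷가에서 연을 날리고 있다."
input_ids = tokenizer.encode(caption, return_tensors="pt")
story_ids = model.generate(input_ids, max_length=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(story_ids[0], skip_special_tokens=True))
```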

The Study on Lossy and Lossless Compression of Binary Hangul Textual Images by Pattern Matching (패턴매칭에 의한 이진 한글문서의 유.무손실 압축에 관한 연구)

  • 김영태;고형화
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.4 / pp.726-736 / 1997
  • Textual image compression by pattern matching is a coding scheme that exploits the correlations between patterns. When Hangul (Korean character) text is compressed by pattern matching, the correlations between patterns may decrease because of random contacts between phonemes. Therefore, in this paper we separate connected phonemes in order to exploit the correlation between patterns more effectively and improve the matching. In the separation process, we decide whether a pattern contains a vowel component, and vowels connected to consonants are then separated. Compared with existing algorithms, the compression ratio of the proposed algorithm is 1.3%-3.0% higher than PMS[5] in lossy mode and 3.4%-9.1% higher in lossless mode than SPM[7], which was submitted to the standards committee for the second-generation binary compression algorithm.

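A much-simplified sketch of the pattern-matching idea only (the phoneme-separation step and the entropy coder that the paper actually contributes are omitted): each connected component of the binary page is looked up in a library of previously seen patterns by pixel-mismatch rate, and repeated patterns are encoded as references. The mismatch threshold and file name are assumptions.

```python
import cv2
import numpy as np

def match_or_add(patch, library, max_mismatch=0.06):
    """Return the index of a matching library pattern; add the patch if none matches."""
    for idx, proto in enumerate(library):
        if proto.shape == patch.shape and np.mean(proto != patch) < max_mismatch:
            return idx
    library.append(patch)
    return len(library) - 1

page = cv2.imread("hangul_page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(page, 128, 1, cv2.THRESH_BINARY_INV)   # text pixels = 1

n, labels, stats, _ = cv2.connectedComponentsWithStats(binary.astype(np.uint8))
library, stream = [], []
for i in range(1, n):                                            # skip background label 0
    x, y, w, h, _ = stats[i]
    patch = (labels[y:y + h, x:x + w] == i).astype(np.uint8)
    stream.append((x, y, match_or_add(patch, library)))          # encode position + pattern index

print(f"{n - 1} components encoded with {len(library)} library patterns")
```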

Multimedia Messaging Service Adaptation for the Mobile Learning System Based on CC/PP

  • Kim, Su-Do;Park, Man-Gon
    • Journal of Korea Multimedia Society / v.11 no.6 / pp.883-890 / 2008
  • With the development of high-speed third-generation mobile communication and handsets, it has become possible to provide a variety of multimedia content through mobile services. MMS (Multimedia Messaging Service) can be displayed in a presentation format that unifies various multimedia contents such as text, audio, image, and video, and it is applicable as a new type of ubiquitous learning. In this study, we propose the design of a mobile learning system that provides profiles conforming to the CC/PP standard and generates multimedia messages based on the SMIL language through adaptation steps according to the learner's learning environment, the content type, and the device properties.

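To illustrate the adaptation step in plain terms, the sketch below chooses media according to a toy device profile (standing in for a CC/PP profile) and emits a simple SMIL presentation; the profile fields, media file names, and layout values are invented for illustration and are not the paper's system.

```python
# Toy device profile (a CC/PP profile would carry similar capability attributes).
profile = {"screen_width": 240, "screen_height": 320, "supports_video": False}

# Adapt the content type to the device: video for capable handsets, otherwise a still image.
media_tag = (
    '<video src="lesson.3gp" region="Media"/>' if profile["supports_video"]
    else '<img src="lesson.jpg" region="Media"/>'
)

# Emit a minimal SMIL presentation sized to the device screen.
smil = f"""<smil>
  <head>
    <layout>
      <root-layout width="{profile['screen_width']}" height="{profile['screen_height']}"/>
      <region id="Media" top="0" left="0" width="100%" height="70%"/>
      <region id="Text" top="70%" left="0" width="100%" height="30%"/>
    </layout>
  </head>
  <body>
    <par dur="8s">
      {media_tag}
      <text src="caption.txt" region="Text"/>
    </par>
  </body>
</smil>"""
print(smil)
```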

Image Generation based on Text and Sketch with Generative Adversarial Networks (생성적 적대 네트워크를 활용한 텍스트와 스케치 기반 이미지 생성 기법)

  • Lee, Je-Hoon;Lee, Dong-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.293-296 / 2018
  • Research on generating images from various sources such as text and sketches using generative adversarial networks is being actively conducted, and many practical studies exist. However, because existing studies generate images from only a single source, either text or a sketch, they fail to generate proper images when the source information is incomplete, for example a text description that lacks detail or a sketch that differs from the real image. To overcome this limitation, this paper proposes TS-GAN, a new generation technique that uses text and sketch simultaneously to generate images. TS-GAN consists of two stages, and each stage produces a more realistic image. The proposed technique demonstrates the quality of its generated images on the CUB dataset, which is widely used in computer vision.
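As a structural illustration only (the layer sizes, the 64x64 resolution, and the single-stage design are assumptions; the paper's TS-GAN is two-stage), the sketch below shows a generator that fuses a text embedding, an encoded sketch, and noise before decoding an image.

```python
import torch
import torch.nn as nn

class TextSketchGenerator(nn.Module):
    def __init__(self, text_dim=256, noise_dim=100):
        super().__init__()
        self.sketch_enc = nn.Sequential(            # 1x64x64 sketch -> 256-d code
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 256),
        )
        self.decode = nn.Sequential(                # fused code -> 3x64x64 image
            nn.Linear(text_dim + noise_dim + 256, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, text_emb, sketch, noise):
        fused = torch.cat([text_emb, self.sketch_enc(sketch), noise], dim=1)
        return self.decode(fused)

g = TextSketchGenerator()
fake = g(torch.randn(2, 256), torch.randn(2, 1, 64, 64), torch.randn(2, 100))
print(fake.shape)  # torch.Size([2, 3, 64, 64])
```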

Image Generation from Korean Dialogue Text via Prompt-based Few-shot Learning (프롬프트 기반 퓨샷 러닝을 통한 한국어 대화형 텍스트 기반 이미지 생성)

  • Eunchan Lee;Sangtae Ahn
    • Annual Conference on Human and Language Technology / 2022.10a / pp.447-451 / 2022
  • This paper proposes a method that, given conversational text input from a user, converts it into keyword-centric text and generates an image from it. Conversational text refers to the colloquial style mainly used in chatting, and this style makes it difficult for text-to-image generation models to produce appropriate output images. Converting the conversational text into keyword-centric text before it is fed to the text-to-image model can therefore improve the quality of the generated images, but there is not enough training data suitable for this task. To address this problem, this paper uses KoGPT, a pre-trained very large language model, and proposes a method that implements dialogue-text-based image generation by training only a small amount of manually constructed data through few-shot learning.

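A hedged sketch of the prompting idea: a few hand-written dialogue-to-keyword examples are placed in the prompt, the new utterance is appended, and a Korean causal language model completes the keyword line, which can then be handed to a text-to-image model. The checkpoint identifier and the example pairs are assumptions; the abstract only states that KoGPT was used with a small, manually built dataset.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed identifier; the paper only says "KoGPT", and the hosted checkpoint
# may require a specific revision or special-token settings to load.
MODEL_NAME = "kakaobrain/kogpt"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Few-shot prompt: illustrative (dialogue -> keyword) pairs, then the new dialogue.
few_shot_prompt = (
    "대화: 오늘 바다 보러 갈래? 노을 질 때쯤이면 진짜 예쁠 것 같아\n"
    "키워드: 해질녘 바다, 노을, 해변 풍경\n\n"
    "대화: 주말에 눈 오면 강아지랑 공원 산책 가자\n"
    "키워드: 눈 내리는 공원, 강아지 산책\n\n"
    "대화: 비 오는 날 창가에서 커피 마시는 게 최고야\n"
    "키워드:"
)
input_ids = tokenizer.encode(few_shot_prompt, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=20, do_sample=False)
keywords = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(keywords)  # keyword-style text to hand to the text-to-image model
```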