• Title/Summary/Keyword: Image-text generation

Text-to-Face Generation Using Multi-Scale Gradients Conditional Generative Adversarial Networks (다중 스케일 그라디언트 조건부 적대적 생성 신경망을 활용한 문장 기반 영상 생성 기법)

  • Bui, Nguyen P.;Le, Duc-Tai;Choo, Hyunseung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.764-767
    • /
    • 2021
  • While Generative Adversarial Networks (GANs) have seen huge success in image synthesis tasks, synthesizing high-quality images from text descriptions remains a challenging problem in computer vision. This paper proposes a method named Text-to-Face Generation Using Multi-Scale Gradients for Conditional Generative Adversarial Networks (T2F-MSGGANs) that combines GANs with a natural language processing model to create human faces that have the features described in the input text. The proposed method addresses two problems of GANs, mode collapse and training instability, by investigating how gradients at multiple scales can be used to generate high-resolution images. We show that T2F-MSGGANs converge stably and generate good-quality images.
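
The multi-scale-gradients idea is that the discriminator sees (and therefore back-propagates gradients for) the generated image at several resolutions at once. A minimal illustrative sketch, not the paper's implementation: build an image pyramid by 2x average pooling and sum a per-scale loss. All function and variable names here are invented for illustration.

```python
def avg_pool2(img):
    """Downsample a 2-D grid by 2x average pooling (assumes even dimensions)."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c+1] + img[r+1][c] + img[r+1][c+1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def pyramid(img, levels):
    """Image pyramid from full resolution down to the coarsest scale."""
    out = [img]
    for _ in range(levels - 1):
        out.append(avg_pool2(out[-1]))
    return out

def multi_scale_loss(fake, real, levels):
    """Sum of mean absolute differences across all scales: every
    resolution contributes a gradient signal, which is the stabilising
    idea behind multi-scale-gradient GAN training."""
    total = 0.0
    for f, r in zip(pyramid(fake, levels), pyramid(real, levels)):
        n = len(f) * len(f[0])
        total += sum(abs(f[i][j] - r[i][j])
                     for i in range(len(f)) for j in range(len(f[0]))) / n
    return total
```

In a real MSG-GAN the per-scale signal comes from the discriminator rather than a pixel-wise difference; the pyramid structure is the part this sketch shows.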

Implementation of DSP Embedded Number-Braille Conversion Algorithm based on Image Processing (DSP 임베디드 숫자-점자 변환 영상처리 알고리즘의 구현)

  • Chae, Jin-Young;Darshana, Panamulle Arachchige Udara;Kim, Won-Ho
    • Journal of Satellite, Information and Communications
    • /
    • v.11 no.2
    • /
    • pp.14-17
    • /
    • 2016
  • This paper describes the implementation of an automatic number-to-braille converter based on image processing for blind people. The algorithm consists of four main steps. The first step is binary conversion of the input image captured by the camera. The second step is segmentation of the characters by means of dilation and labelling. The next step is calculation of the cross-correlation between each segmented character image and pre-defined text-pattern images. The final step is generation of the braille output corresponding to the input image. Computer simulation showed a 91.8% correct conversion rate for Arabic numerals printed on an A4 sheet, and practical feasibility was also confirmed using an automatic number-braille converter implemented on a DSP image-processing board.
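
The cross-correlation matching step described above can be sketched as follows: compute the normalized cross-correlation between a segmented glyph and each pre-defined pattern, take the best match, and look up its braille cell. The 3x3 templates and all names below are illustrative toys, not the paper's actual patterns.

```python
def ncc(a, b):
    """Normalised cross-correlation of two equal-size flattened patches."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def classify(glyph, templates):
    """Return the label whose template correlates best with the glyph."""
    return max(templates, key=lambda label: ncc(glyph, templates[label]))

# Toy 3x3 binary templates for '1' and '0' (illustrative only)
TEMPLATES = {
    "1": [0,1,0, 0,1,0, 0,1,0],
    "0": [1,1,1, 1,0,1, 1,1,1],
}
# Standard braille digit cells (dots 1-6): digits reuse letters a-j,
# so '1' is dot 1 and '0' is dots 2, 4, 5.
BRAILLE = {"1": (1,0,0,0,0,0), "0": (0,1,0,1,1,0)}
```

A recognized glyph would then be rendered as the corresponding `BRAILLE` cell on the output device.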

Mobile Phone Camera Based Scene Text Detection Using Edge and Color Quantization (에지 및 컬러 양자화를 이용한 모바일 폰 카메라 기반장면 텍스트 검출)

  • Park, Jong-Cheon;Lee, Keun-Wang
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.3
    • /
    • pp.847-852
    • /
    • 2010
  • Text in natural images is a varied and important image feature, so detecting, extracting, and recognizing text is studied as an important research area. Recently, many applications in various fields have been developed based on mobile phone camera technology. The proposed method detects edge components from the gray-scale image, detects boundaries of text regions using the local standard deviation, and obtains connected components using Euclidean distance in RGB color space. The detected edges and connected components are labelled and a bounding box is obtained for each region. Text candidates are selected using heuristic rules for text. The detected candidate text regions are merged into a single candidate text region, and the final text region is detected by verifying the similarity and adjacency between candidate text regions. Experimental results show that the text region detection rate is improved by using edges and color connected components complementarily.
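
The colour connected-component step described above (grouping pixels by Euclidean distance in RGB space) can be sketched as a flood fill. The threshold value and all names are assumptions for illustration, not the paper's parameters.

```python
from collections import deque

def rgb_dist(p, q):
    """Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def color_components(img, thresh):
    """Label 4-connected regions of similar colour via BFS flood fill.
    A pixel joins a region when its RGB distance to the region's seed
    pixel is below `thresh`. Returns the label grid and region count."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    current = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr][sc] != -1:
                continue
            seed = img[sr][sc]
            q = deque([(sr, sc)])
            labels[sr][sc] = current
            while q:
                r, c = q.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and labels[nr][nc] == -1
                            and rgb_dist(img[nr][nc], seed) < thresh):
                        labels[nr][nc] = current
                        q.append((nr, nc))
            current += 1
    return labels, current
```

Bounding boxes per label (the next step in the abstract) follow by taking min/max row and column indices of each label.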

Comparison of the Differences in AI-Generated Images Using Midjourney and Stable Diffusion (Midjourney와 Stable Diffusion을 이용한 AI 생성 이미지의 차이 비교)

  • Linh Bui Duong Hoai;Kang-Hee Lee
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.07a
    • /
    • pp.563-564
    • /
    • 2023
  • Midjourney and Stable Diffusion are two popular AI image-generation programs today. With AI's outstanding image-generation capabilities, anyone can create artistic paintings in just a few minutes. Therefore, this comparison of the differences between AI-generated images from Midjourney and Stable Diffusion will help reveal each program's advantages and assist users in identifying the tool suitable for their needs.

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies
    • /
    • v.14 no.1
    • /
    • pp.13-26
    • /
    • 2024
  • Multi-modal generation is the process of generating results from a variety of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text specific to Korean. Since the developed montage generation app can be used to produce a draft montage, it can dramatically reduce the manual labor of montage production personnel. For this purpose, we utilized persona-based virtual-person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the artificial intelligence training data necessary for the development of AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and KoDALLE, a Korean-language image generation model. The trained AI model creates a montage image of a face very similar to the one described using voice and text. To verify the practicality of the developed montage generation app, 10 testers used it and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal investigation, to describe and visualize facial features.

A Study on the Semiotic Application about the Image Vestmental (의상 이미지의 응용 기호론적 연구(I)-엘자 스키아파렐리의 3가지 의상 이미지에 관하여-)

  • 최인순
    • Journal of the Korean Society of Costume
    • /
    • v.38
    • /
    • pp.101-122
    • /
    • 1998
  • The purpose of this study is to define the fundamentals of one symbolic concept, the so-called vestment-sign, based on the logical relationships of the sign system in Charles S. Peirce's trichotomy of signs, for the communication of meaning in the non-linguistic image domain. To support the argument for the vestment-sign, I selected three types of vestment language by the stylist Elsa Schiaparelli. The third vestment image chosen here, titled “Larme-Illusion” (1938) and printed by Salvador Dali, produces one symbolic proposition as a logical result, generated and developed through the interpretation of the other images. First, the text manifested by Elsa Schiaparelli's first vestment image, titled “Notation Musical” (1937) and symbolized as one category in the representation of form, is regarded as symbolic and metaphorical, since its title and meaning are connected to its form. The second vestment image, titled “Ruches Noirs” (1938), externally represents splendid femininity manifested through symbolic and metaphorical expression; the purity of sensitivity aimed at humanity, in the detail of the poetic feeling of naturalism, makes us imagine a battlefield of furious sensitivity. As a result of that battle, the third image stimulates our eyesight with the “absence” of the dressing function. The proposition of the text, 《Death》, which the third image delivers, constructs a sign system that produces meaning through the disappearance of the physical “signifier”. This establishment of the symbolic concept presents the etymological authority of symbol generation called “Design”.

Character Region Detection in Natural Image Using Edge and Connected Component by Morphological Reconstruction (에지 및 형태학적 재구성에 의한 연결요소를 이용한 자연영상의 문자영역 검출)

  • Gwon, Gyo-Hyeon;Park, Jong-Cheon;Jun, Byoung-Min
    • Journal of Korea Entertainment Industry Association
    • /
    • v.5 no.1
    • /
    • pp.127-133
    • /
    • 2011
  • Characters in natural images carry important information in various contexts. Previous character region detection algorithms fail to detect character regions when the image is complex, the surrounding lighting varies, or the background is similar to the characters, so this paper proposes a method for detecting character regions in natural images using edges and connected components obtained by morphological reconstruction. First, we detect edges using the Canny edge detector, obtain connected components from local min/max values via morphological reconstruction operations on the gray-scale image, and label each detected connected component. The detected candidate text regions are merged into a single candidate text region, and the final text region is detected by checking the similarity and adjacency of neighboring individual candidate characters. Experimental results show that the proposed algorithm improves the correctness of character region detection by using edges and connected components.
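
Morphological reconstruction, the operation the abstract relies on for extracting connected components, can be sketched as iterative geodesic dilation: repeatedly dilate a marker image and clamp it under a mask until nothing changes. This is a minimal pure-Python sketch with an assumed 3x3 structuring element; the paper's exact settings are not given.

```python
def reconstruct(marker, mask):
    """Grayscale morphological reconstruction by dilation: dilate
    `marker` with a 3x3 structuring element, clamp it under `mask`,
    and repeat until the result is stable."""
    h, w = len(marker), len(marker[0])
    cur = [row[:] for row in marker]
    changed = True
    while changed:
        changed = False
        nxt = [row[:] for row in cur]
        for r in range(h):
            for c in range(w):
                # 3x3 max filter = dilation at (r, c), then clamp to mask
                best = max(cur[nr][nc]
                           for nr in range(max(0, r - 1), min(h, r + 2))
                           for nc in range(max(0, c - 1), min(w, c + 2)))
                v = min(best, mask[r][c])
                if v != nxt[r][c]:
                    nxt[r][c] = v
                    changed = True
        cur = nxt
    return cur
```

Peaks of the mask that are connected to the marker are recovered in full, while peaks disconnected from it stay suppressed, which is what makes the operation useful for isolating connected components.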

Variational Auto Encoder Distributed Restrictions for Image Generation (이미지 생성을 위한 변동 자동 인코더 분산 제약)

  • Yong-Gil Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.91-97
    • /
    • 2023
  • Recent research shows that latent directions can be used to manipulate images towards certain attributes. However, controlling the generation process of a generative model is very difficult. Although latent directions are used to manipulate images for certain attributes, many restrictions are required so that the attributes encoded in the latent vectors are enhanced according to given text and prompts while other attributes remain largely unaffected. This study presents a generative model that places certain restrictions on the latent vectors for image generation and manipulation. The suggested method requires only a few minutes per manipulation, and simulation results with a TensorFlow variational auto-encoder show the effectiveness of the suggested approach across extensive experiments.
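
One way to read the latent restriction idea is as an extra penalty in the VAE objective that lets the latent code move only along one chosen editing direction while pinning the others. The scalar sketch below is an interpretation, not the paper's actual loss; every name and the `lam` weight are assumptions.

```python
import math

def kl_std_normal(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and N(0, 1), summed over dims."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, logvar))

def direction_penalty(z, z_ref, free_dim):
    """Penalise movement of z away from z_ref in every latent dimension
    except `free_dim`, restricting edits to one chosen direction."""
    return sum((a - b) ** 2
               for i, (a, b) in enumerate(zip(z, z_ref)) if i != free_dim)

def restricted_vae_loss(recon_err, mu, logvar, z, z_ref, free_dim, lam=1.0):
    """Assumed total objective: reconstruction + KL + latent restriction."""
    return (recon_err + kl_std_normal(mu, logvar)
            + lam * direction_penalty(z, z_ref, free_dim))
```

With `lam` large, only the attribute associated with `free_dim` can change during manipulation, which matches the abstract's goal of leaving other attributes largely unaffected.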

An Analysis of the Social-Cultural Meaning of Korean Girl Groups' Appearances -Focusing on the Change of Girl Groups' Appearances across Generations- (국내 걸그룹 외모에 나타난 사회문화적 의미 분석 - 세대별 걸그룹 외모 변화를 중심으로 -)

  • Han, Cha-young
    • Journal of Fashion Business
    • /
    • v.21 no.1
    • /
    • pp.12-31
    • /
    • 2017
  • Korean commercially organized girl groups became remarkable in the late 1990s. By the late 2000s, however, girl groups had an even more profound effect on Korean popular music than in the past. This study aimed to analyze the social-cultural meaning of the changing appearance of girl groups between the first and second generations. For this purpose, this study analyzed media images and text about 13 girl groups in a social-cultural context. The results are as follows. First, while the first-generation girl groups tended to maintain girlish/sexy images appealing to male desire, the second-generation girl groups strategically showed various sexual identities such as femininity, masculinity, and androgyny along with contextual sexual images. Girl groups increased the number of strategic images featuring various sexual identities in order to appeal to a wide, diverse audience. Second, the second-generation girl groups had slim bodies with great athleticism, largely due to the trainee system; because of this, their semiotic body images have been used commercially to promote consumption. Third, the second-generation girl groups were bigger stars than the first-generation girl groups because the members worked in many different fields, so the members' images were successfully consumed directly and then reproduced symbolically. Fourth, each member of the second-generation girl groups was characterized by appearing in diverse yet familiar images through various media sources. Although the intention was to gain recognition and popularity, it became difficult for them to change their image once one particular image was deemed popular.

Agricultural Applicability of AI based Image Generation (AI 기반 이미지 생성 기술의 농업 적용 가능성)

  • Seungri Yoon;Yeyeong Lee;Eunkyu Jung;Tae In Ahn
    • Journal of Bio-Environment Control
    • /
    • v.33 no.2
    • /
    • pp.120-128
    • /
    • 2024
  • Since ChatGPT was released in 2022, the generative artificial intelligence (AI) industry has seen massive growth and is expected to bring significant innovations to cognitive tasks. AI-based image generation, in particular, is leading major changes in the digital world. This study investigates the technical foundations of Midjourney, Stable Diffusion, and Firefly, three notable AI image generation tools, and compares their effectiveness by examining the images they produce. The results show that these AI tools can generate realistic images of tomatoes, strawberries, paprikas, and cucumbers, typical crops grown in greenhouses. Firefly in particular stood out for its ability to produce very realistic images of greenhouse-grown crops. However, all tools struggled to fully capture the environmental context of the greenhouses where these crops grow. Refining prompts and using reference images proved effective in accurately generating images of strawberry fruits and their cultivation systems. For cucumber images, the AI tools produced images very close to real ones, with no significant differences in their evaluation scores. This study demonstrates how AI-based image generation technology can be applied in agriculture, suggesting a bright future for its use in this field.