• Title/Summary/Keyword: Image-text generation

Image Generation based on Text and Sketch with Generative Adversarial Networks (생성적 적대 네트워크를 활용한 텍스트와 스케치 기반 이미지 생성 기법)

  • Lee, Je-Hoon;Lee, Dong-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.293-296 / 2018
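  • Research on generating images from various sources such as text and sketches using generative adversarial networks is actively underway, and many practical studies exist. However, because existing studies generate an image from a single source, such as text or a sketch alone, they fail to generate a proper image when the source information is incomplete, for example when the text lacks description or the sketch differs from the real image. To overcome this limitation, this paper proposes TS-GAN, a new generation technique that uses two sources, text and a sketch, simultaneously. TS-GAN consists of two stages, and each stage produces a progressively more realistic image. The proposed technique demonstrates the quality of its generated images on the CUB dataset, which is widely used in computer vision.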

Image Generation from Korean Dialogue Text via Prompt-based Few-shot Learning (프롬프트 기반 퓨샷 러닝을 통한 한국어 대화형 텍스트 기반 이미지 생성)

  • Eunchan Lee;Sangtae Ahn
    • Annual Conference on Human and Language Technology / 2022.10a / pp.447-451 / 2022
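  • This paper proposes a method that, given user input in the form of dialogue text, converts it into keyword-centered text and generates an image from it. Dialogue text refers to the colloquial style mainly used in chat and similar settings, and this style makes it difficult for text-to-image generation models to produce appropriate output images. Converting dialogue text into keyword-centered text before passing it to a text-to-image generation model can be a good way to improve the quality of generated images, but training data suited to this task is scarce. As one way to address this problem, this paper uses KoGPT, a pretrained large-scale language model, and proposes a method that implements dialogue-text-based image generation through few-shot learning on only a small amount of hand-crafted data.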

Natural Photography Generation with Text Guidance from Spherical Panorama Image (360 영상으로부터 텍스트 정보를 이용한 자연스러운 사진 생성)

  • Kim, Beomseok;Jung, Jinwoong;Hong, Eunbin;Cho, Sunghyun;Lee, Seungyong
    • Journal of the Korea Computer Graphics Society / v.23 no.3 / pp.65-75 / 2017
  • Because a 360-degree image carries information from all directions, it often contains too much information. Moreover, to inspect a 360-degree image on a 2D display, a user has to either click and drag the image with a mouse or project it to a 2D panorama image, which inevitably introduces severe distortions. Consequently, inspecting a 360-degree image and finding an object of interest in it can be a tedious task. To resolve this issue, this paper proposes a method that finds a region of interest and produces, from a given 360-degree image, a natural-looking 2D image that best matches a description given by a user in a natural language sentence. Our method also considers photo composition so that the resulting image is aesthetically pleasing. Our method first converts a 360-degree image to a 2D cubemap. Because objects in a 360-degree image may appear distorted or split into multiple pieces in a typical cubemap, causing detection of such objects to fail, we introduce a modified cubemap. Our method then applies a Long Short-Term Memory (LSTM) network based object detection method to find the region of interest for a given natural language sentence. Finally, our method produces an image that contains the detected region and has an aesthetically pleasing composition.

Development of Social Network Game Engine based on ActionScript (액션 스크립트 기반의 소셜 네트워크 게임엔진의 개발)

  • Woo, Chong-Woo;Kim, Dae-Ryung
    • Journal of Internet Computing and Services / v.13 no.1 / pp.125-134 / 2012
  • As social networking services (SNS) such as Facebook and Cyworld grow, social network games and social commerce built on them are becoming active. The Social Network Game (SNG) in particular is attracting explosive interest and becoming popular because it is small in scale and users can enjoy it with close friends. The market for these games grows every year, but developing them still has limitations. In particular, current game engines target online or console games, and there is no engine dedicated to SNG development, so developing an SNG with these engines takes a lot of time. In this paper, we describe the design and development of a game engine optimized for SNG development, which not only adopts the main characteristics of previous game engines but also considers the specific characteristics of the SNG. The engine also supports maps for simulation games, the most popular SNG genre, and provides modules and tools for developing character animation easily. The engine's performance is evaluated by the output generation speed of images, text, and characters, and the results show an output speed reasonable for SNG development.

Generating Radiology Reports via Multi-feature Optimization Transformer

  • Rui Wang;Rong Hua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2768-2787 / 2023
  • Automatic radiology report generation, an important research direction in applying computer science to medicine, has attracted wide attention in the academic community. Because the proportion of normal regions in radiology images is much larger than that of abnormal regions, words describing diseases are often masked by other words, resulting in significant feature loss during computation, which affects the quality of generated reports. In addition, the large gap between visual features and semantic features causes traditional multi-modal fusion methods to fail to generate the long narrative structures consisting of multiple sentences that medical reports require. To address these challenges, we propose a multi-feature optimization Transformer (MFOT) for generating radiology reports. Specifically, a multi-dimensional mapping attention (MDMA) module is designed to encode the visual grid features from different dimensions to reduce the loss of primary features in the encoding process; a feature pre-fusion (FP) module is constructed to enhance the interaction between multi-modal features, so as to generate a reasonably structured radiology report; and a detail enhanced attention (DEA) module is proposed to enhance the extraction and utilization of key features and reduce their loss. We evaluate the proposed model against prevailing mainstream models on two widely recognized radiology report datasets, IU X-Ray and MIMIC-CXR. The experimental results show that our model achieves SOTA performance on both datasets; compared with the base model, the average improvement across six key indicators is 19.9% and 18.0%, respectively. These findings substantiate the efficacy of our model for automated radiology report generation.

A Study on Effective Digital Watermark Generation Method to Overcome Capacity Limit (저장 한계를 극복한 효율적인 디지털 워터마크 생성 방법 연구)

  • Kim Hee-Sun;Cho Dae-Jea
    • The Journal of the Korea Contents Association / v.5 no.6 / pp.343-350 / 2005
  • In the design of a successful digital watermarking system, Pseudo-Noise (PN) sequences are widely used to modulate information bits into watermark signals. With this method, the number of bits that can be hidden within a small image by means of frequency-domain watermarking is limited. In this paper, we show the possibility of introducing chaotic sequences into digital watermarking systems as potential substitutes for the commonly used PN sequences, and we propose a method that transforms text into a chaotic sequence. In our current implementation, we show how sample text is expressed as implied unit data (the watermark) and how the implied unit data is regenerated into the original text. Because we use this implied data as the watermark for information hiding, we can embed much more watermark data than with the previous method.
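
The abstract above does not say which chaotic map the authors use or how text is mapped onto it. As a minimal sketch of the general idea only, assuming the logistic map (a common choice, not confirmed by the source) and hypothetical function names, a bipolar watermark sequence can be derived from a single real-valued key:

```python
def logistic_sequence(x0: float, n: int, r: float = 3.99) -> list[float]:
    """Iterate the logistic map x -> r * x * (1 - x) and return n values.

    The map and the parameter r = 3.99 are assumptions for illustration;
    the abstract does not specify them.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def bipolar_watermark(x0: float, n: int) -> list[int]:
    """Threshold the chaotic values at 0.5 to obtain a +1/-1 sequence."""
    return [1 if v >= 0.5 else -1 for v in logistic_sequence(x0, n)]


def correlation(a: list[int], b: list[int]) -> float:
    """Normalized correlation between two equal-length bipolar sequences."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    key_a = bipolar_watermark(0.30, 256)
    key_b = bipolar_watermark(0.31, 256)
    print("self-correlation: ", correlation(key_a, key_a))
    print("cross-correlation:", correlation(key_a, key_b))
```

The whole sequence is reproducible from the initial value `x0`, so only that key has to be shared, while a different key yields a different sequence. Whether this matches the paper's construction is not stated in the abstract.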

Dynamic Timed Multimedia Synchronization Model for Efficient Quality of Service (효율적인 서비스 품질을 위한 동적 시간형 멀티미디어 동기화 모델)

  • 이근왕;오해석
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.10 / pp.75-80 / 1999
  • A multimedia synchronization model for distributed, continuous or discrete media that guarantees high quality of service is required in developing multimedia application software. In this paper, we introduce a specific object controller, called dynamic key media, that changes when a user event is generated. This becomes media whose event occurrence and period cannot be predicted. For event occurrence, not only audio but also text and image can be chosen as the key media and perform its role. The object controller transfers information for the next transition. The proposed model offers high quality of service by permitting the maximum allowed jitter and skew in playout time, and its effectiveness was verified by simulation.

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

  • Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
    • The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
  • A talking head (TH) is an utterance face animation generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input only. The problem of generating a TH from speech can be regarded as a regression problem from the acoustic feature sequence to the facial code sequence, a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled by a bidirectional RNN and trained on the SAVEE database, a database of frontal utterance face animations. The proposed method is able to generate a TH with facial expression and intonation by using acoustic features such as MFCC, dynamic elements of MFCC, energy, and F0. In the experiments, the configuration that used BLSTM layers for the first and second layers of the bidirectional RNN predicted the facial code best. For the evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed method and by the previous method. As a result, 77% of the respondents answered that the proposed method generated a TH that matches the speech well.

A Study on Generation Quality Comparison of Concrete Damage Image Using Stable Diffusion Base Models (Stable diffusion의 기저 모델에 따른 콘크리트 손상 영상의 생성 품질 비교 연구)

  • Seung-Bo Shim
    • Journal of the Korea institute for structural maintenance and inspection / v.28 no.4 / pp.55-61 / 2024
  • The number of aging concrete structures is steadily increasing because many of these structures are reaching their expected lifespan. Such structures require accurate inspections and persistent maintenance. Otherwise, their original functions and performance may degrade, potentially leading to safety accidents. Therefore, research on objective inspection technologies using deep learning and computer vision is actively being conducted. High-resolution images make it possible to accurately observe not only micro cracks but also spalling and exposed rebar, and deep learning enables automated detection. High detection performance in deep learning is only guaranteed with diverse and numerous training datasets. However, surface damage to concrete is not commonly captured in images, resulting in a lack of training data. To overcome this limitation, this study proposes a method for generating concrete surface damage images, including cracks, spalling, and exposed rebar, using stable diffusion. This method synthesizes new damage images from paired text and image data. For this purpose, a training dataset of 678 images was secured, and fine-tuning was performed through low-rank adaptation. The quality of the generated images was compared across three base models of stable diffusion. As a result, a method to synthesize the most diverse and high-quality concrete damage images was developed. This research is expected to address the issue of data scarcity and contribute to improving the accuracy of deep learning-based damage detection algorithms in the future.
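
The abstract above names low-rank adaptation as the fine-tuning method but gives no configuration. As a rough, library-free sketch of the general technique, and not the study's training code, the adapted weight is the frozen weight plus a scaled product of two small matrices. The matrix sizes, the rank, and the `alpha` value below are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (rows of A).

    W is d x k and stays frozen; only B (d x r) and A (r x k) are trained.
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d x r) @ (r x k) -> (d x k)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d, k, r):
    """Number of trainable values in B and A, versus d * k for full fine-tuning."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 8, 6, 2  # hypothetical sizes, chosen only for the demo
    rng = random.Random(0)
    w = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
    a = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d)]  # B starts at zero, so W is unchanged at first
    adapted = lora_weight(w, a, b, alpha=4.0)
    print("unchanged at init:", adapted == w)
    print("trainable:", trainable_params(d, k, r), "of", d * k)
```

Because only `A` and `B` are updated, the number of trained values grows with `r * (d + k)` rather than `d * k`, which is the usual reason this approach suits a small dataset such as the 678 images mentioned above.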

A Generation Methodology of Facial Expressions for Avatar Communications (아바타 통신에서의 얼굴 표정의 생성 방법)

  • Kim Jin-Yong;Yoo Jae-Hwi
    • Journal of the Korea Society of Computer and Information / v.10 no.3 s.35 / pp.55-64 / 2005
  • The avatar can be used as an auxiliary means of text and image communication in cyberspace. An intelligent communication method can also be used to achieve real-time communication, in which intelligently coded data (joint angles for arm gestures and action units for facial emotions) are transmitted instead of real or compressed pictures. In this paper, to support arm and leg gestures, we provide a method for generating facial expressions that represent the sender's emotions. Facial expressions can be represented by Action Units (AUs), and we suggest a methodology for finding appropriate AUs in avatar models of various shapes and structures. To maximize the efficiency of emotional expression, a comic-style facial model having only eyebrows, eyes, nose, and mouth is employed. The generation of facial emotion animation from these parameters is also investigated.
