• Title/Summary/Keyword: text-to-image


A Study on the Generation of Webtoons through Fine-Tuning of Diffusion Models (확산모델의 미세조정을 통한 웹툰 생성연구)

  • Kyungho Yu;Hyungju Kim;Jeongin Kim;Chanjun Chun;Pankoo Kim
    • Smart Media Journal / v.12 no.7 / pp.76-83 / 2023
  • This study proposes a method to assist webtoon artists by using a pretrained text-to-image model to generate webtoon images from text. The approach fine-tunes a pretrained Stable Diffusion model on a webtoon dataset transformed into the desired webtoon style. Fine-tuning with the LoRA technique takes approximately 4.5 hours for 30,000 steps. The generated images render the shapes and backgrounds described in the input text in a webtoon-like style. In a quantitative evaluation using the Inception Score, the proposed method outperforms DCGAN-based text-to-image models. If webtoon artists adopt the proposed model, it is expected to significantly reduce the time required for the creative process.

Skewed Angle Detection in Text Images Using Orthogonal Angle View

  • Chin, Seong-Ah;Choo, Moon-Won
    • Proceedings of the IEEK Conference / 2000.07a / pp.62-65 / 2000
  • In this paper we propose skewed angle detection methods for images containing text that is not aligned horizontally. In most images, text areas are aligned along the horizontal axis; however, there are many cases in which the text lies at a skewed angle $\theta$, with $0 < \theta \leq \pi$. We adapt the Hough transform and the Shadow and Threshold Projection methods to detect the skewed angle of text in an input image using the orthogonal angle view property. The result is a primary text skewed angle, which allows us to rotate the original input image so that its text is horizontally aligned. This serves as a document image processing step prior to the recognition stage.
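The projection idea mentioned in this abstract can be illustrated with a generic projection-profile search. This is a minimal sketch and not the authors' Hough, Shadow, or Threshold Projection method: the function name `estimate_skew`, the candidate-angle list, and the sum-of-squares score are all assumptions made for illustration.

```python
import math

def estimate_skew(points, angles_deg, bin_size=1.0):
    """Return the candidate angle (degrees) with the sharpest projection profile.

    points: iterable of (x, y) coordinates of foreground (text) pixels.
    angles_deg: candidate skew angles to try (an assumption of this sketch).
    """
    best_angle, best_score = None, -1.0
    for angle in angles_deg:
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        counts = {}
        for x, y in points:
            # Signed distance of the pixel from a line through the origin
            # that runs at the candidate angle.
            offset = y * cos_t - x * sin_t
            key = round(offset / bin_size)
            counts[key] = counts.get(key, 0) + 1
        # When the candidate angle matches the text direction, pixels of one
        # text line fall into the same bin, so the sum of squared bin counts
        # is large.
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Once an angle is found, the image would be rotated by its negative to bring the text lines back to horizontal. A finer angle grid or a coarse-to-fine search trades run time for precision.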


Improved Spam Filter via Handling of Text Embedded Image E-mail

  • Youn, Seongwook;Cho, Hyun-Chong
    • Journal of Electrical Engineering and Technology / v.10 no.1 / pp.401-407 / 2015
  • The increase of image spam, a kind of spam in which the text message is embedded in an attached image to defeat spam filtering, is a major problem for current e-mail systems. For nearly a decade, content-based filtering using text classification or machine learning has been the major trend in anti-spam filtering. Recently, spammers have tried to defeat anti-spam filters with many techniques, and embedding text in an attached image is one of them. We previously proposed an ontology-based spam filter. However, that system handles only text e-mail, while the percentage of e-mail with attached images is increasing sharply. The contribution of this paper is to add image e-mail handling to the anti-spam filtering system while keeping the advantages of the previous text-based spam filter. The proposed system also gives a low false negative value, which means that a user's valuable e-mail is rarely regarded as spam.

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.771-789 / 2019
  • Text detection has been a popular research topic in computer vision. Prevalent text detection algorithms find it difficult to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, a text candidate in a novel superpixel form is proposed to improve the text recall rate through image segmentation. Second, we propose a text sample selection model (TSSM) that extracts text samples from the current image and eliminates database dependency. Specifically, to improve the precision of the samples, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double-threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs based on the samples selected by TSSM. Experimental results on standard datasets demonstrate that our method is robust to complex backgrounds and multilingual text and performs stably across datasets.

Detecting and Segmenting Text from Images for a Mobile Translator System

  • Chalidabhongse, Thanarat H.;Jeeraboon, Poonsak
    • Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings / 2004.08a / pp.875-878 / 2004
  • Research on text detection and segmentation has long been carried out in the OCR area. However, there are other areas in which detecting and segmenting text from images can be very useful. In this report, we first propose the design of a mobile translator system that helps non-native speakers understand a foreign language using a ubiquitous mobile network and camera phones. The main focus of the paper is the algorithm for detecting and segmenting text embedded in natural scenes in captured images. The image, captured by a camera phone, is transmitted to a translator server. It first passes through preprocessing steps that smooth the image and suppress noise. A threshold is then applied to binarize the image. Afterward, edge detection and connected component analysis are performed on the filtered image to find edges and segment the components in the image. Finally, predefined layout relation constraints are used to decide which components are likely to be text. In a preliminary experiment, the system yielded a recognition rate of 94.44% on a set of 36 natural scene images containing text.
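Two of the stages named in this abstract, thresholding and connected component analysis, can be sketched in a few lines. This is an illustrative sketch only: the fixed threshold of 128, the dark-text-on-light-background assumption, 4-connectivity, and the function names are not taken from the paper, and the smoothing, edge detection, and layout-constraint stages are omitted.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Mark pixels darker than the threshold as foreground (1)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected blobs."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

The resulting bounding boxes are the kind of per-component data that layout constraints (for example similar heights or horizontal alignment of neighbouring boxes) could then filter.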


Image Steganography to Hide Unlimited Secret Text Size

  • Almazaydeh, Wa'el Ibrahim A.
    • International Journal of Computer Science & Network Security / v.22 no.4 / pp.73-82 / 2022
  • This paper presents the process of hiding secret text of unlimited size in an image using three methods. The first is the traditional steganographic method, which conceals the binary value of the text using the least significant bits (LSB). The second is a new method that hides the data in an image based on the Exclusive OR (XOR) operation. The third is a new method that hides the binary data of the text in an image (grayscale or RGB) using Exclusive OR and Huffman coding. The new methods hide text (data) of unlimited size in an image. Peak Signal-to-Noise Ratio (PSNR) is used in the research to evaluate the results.
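The first, traditional method in this abstract (least significant bit embedding) and the PSNR measure can be sketched as follows. This is a minimal illustration on a flat list of 8-bit pixel values: the 32-bit length header and the function names are assumptions of the sketch, its capacity is limited to one bit per pixel value, and it does not reproduce the paper's XOR or Huffman-based methods.

```python
import math

def embed_lsb(pixels, message):
    """Hide a UTF-8 message in the least significant bit of each pixel value."""
    data = message.encode("utf-8")
    # 4-byte length header so the extractor knows where the message ends
    # (an assumption of this sketch, not taken from the paper).
    payload = len(data).to_bytes(4, "big") + data
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # overwrite only the lowest bit
    return stego

def extract_lsb(pixels):
    """Recover the message written by embed_lsb."""
    def read_bytes(start_bit, count):
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (pixels[start_bit + 8 * b + k] & 1)
            out.append(value)
        return bytes(out)
    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length).decode("utf-8")

def psnr(original, modified, peak=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Because each pixel value changes by at most 1, the mean squared error stays small and the PSNR stays high, which is why LSB embedding is hard to see in the stego image.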

A study on Extensions to Music Player MAF for Multiple JPEG images and Text data with Synchronization (다중 영상 및 텍스트 동기화를 고려한 Music Player MAF 의 확장 포맷 연구)

  • Yang, Chan-Suk;Lim, Jeong-Yeon;Kim, Mun-Churl
    • Proceedings of the IEEK Conference / 2005.11a / pp.967-970 / 2005
  • The Music Player MAF format of ISO/IEC 23000-2 FDIS consists of MP3 data, MPEG-7 metadata, and one optional JPEG image, based on the MPEG-4 File Format. However, the current Music Player MAF format does not allow multiple JPEG images or timed text data. Timed text and multiple JPEG images are useful in various multimedia applications. For example, foreign-language listening material needs an additional book with text and images; audio content that carries image and text data can help the listener understand the whole story and its situations. In this paper, we propose a detailed file structure, in conjunction with the MPEG-4 File Format, that improves functionality by carrying multiple images and text data together with synchronization information between the MP3 data and the other resources.


A Study on Improvement of Image Classification Accuracy Using Image-Text Pairs (이미지-텍스트 쌍을 활용한 이미지 분류 정확도 향상에 관한 연구)

  • Mi-Hui Kim;Ju-Hyeok Lee
    • Journal of IKEEE / v.27 no.4 / pp.561-566 / 2023
  • With the development of deep learning, it has become possible to solve various problems in which computers were not previously specialized, such as image processing. However, most image processing methods use only the visual information in the image. Text data related to images, such as descriptions and annotations, may provide additional tactile and visual information that is difficult to obtain from the image itself. In this paper, we aim to improve image classification accuracy with a deep learning model that analyzes both images and text using image-text pairs. The proposed model improved classification accuracy by approximately 11% over a deep learning model using only image information.

Text Augmentation Using Hierarchy-based Word Replacement

  • Kim, Museong;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.57-67 / 2021
  • Recently, multi-modal deep learning techniques that combine heterogeneous data have been widely used. In particular, studies on text-to-image synthesis, which automatically generates images from text, are being actively conducted. Deep learning for image synthesis requires a vast amount of data consisting of pairs of images and text describing each image. Therefore, various data augmentation techniques have been devised to generate a large amount of data from a small amount. A number of text augmentation techniques based on synonym replacement have been proposed so far. However, these techniques share a limitation: replacing a noun with a synonym may produce text that no longer matches the content of the image. In this study, we propose a text augmentation method that replaces nouns using word hierarchy information. We also performed experiments on MSCOCO data to evaluate the performance of the proposed method.
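One plausible reading of hierarchy-based replacement is to swap a noun for a more general ancestor, which keeps the caption true to the image. The sketch below illustrates that reading only: the toy `HYPERNYMS` table, the function names, and the `max_levels` parameter are hypothetical, and the abstract does not state which hierarchy or replacement direction the authors use.

```python
# Toy child -> parent hierarchy (hypothetical); a real system would more
# likely draw on a lexical database such as WordNet.
HYPERNYMS = {
    "puppy": "dog", "dog": "animal",
    "kitten": "cat", "cat": "animal",
    "sedan": "car", "car": "vehicle",
}

def ancestors(word, hierarchy=HYPERNYMS):
    """Return the chain of increasingly general terms above a word."""
    chain = []
    while word in hierarchy:
        word = hierarchy[word]
        chain.append(word)
    return chain

def augment(caption, hierarchy=HYPERNYMS, max_levels=1):
    """Generate caption variants by replacing one known noun with an ancestor."""
    tokens = caption.split()
    variants = []
    for i, token in enumerate(tokens):
        for parent in ancestors(token.lower(), hierarchy)[:max_levels]:
            variants.append(" ".join(tokens[:i] + [parent] + tokens[i + 1:]))
    return variants
```

For example, `augment("a puppy sits in a sedan")` yields one variant per replaceable noun. A more general term still describes the same picture, which is the property that plain synonym replacement can lose.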

A Study on the Textuality Represented in Modern Fashion Photographs (현대 패션사진에 나타난 텍스트성 연구)

  • Park, Mi-Joo;Yang, Sook-Hi
    • The Research Journal of the Costume Culture / v.18 no.5 / pp.977-990 / 2010
  • Today, as individuals express their social identities and their existence as members of a society with a culture, the art style and communicative function of fashion photographs stand out. Accordingly, the meaning of images as text expands in interpretive breadth according to each viewer's circumstances. This study examines four theories on the textuality of language and image, and considers each theory's view of the image through modern fashion photographs. First, the theory that divides language and image into auditory and visual recognition is limited in that it focuses on only one side without considering the ambivalent elements of each field. In modern fashion photographs, the observer turns the image into text and gives it meaning through recognition by all five senses, depending on the viewer's condition. Second, the theory that divides language and image into text with temporal properties and text with spatial properties is limited because the viewer's experience of the object appears as a form structured in both time and space, rather than being defined by either one alone. Third, the theory that classifies language and image text into conventional taste and natural taste is limited in that image text can hardly be classified consistently by how easily it is recognized through socially accepted codes. It therefore cannot be a fundamental approach to understanding the text-decoding trend represented in modern fashion photographs. Fourth, this study accordingly focuses on the contextual and arbitrary text of fashion photographs through the theory of Nelson Goodman, which discusses image text through differences in textuality. The basic mechanism of perceiving, recognizing, and distinguishing images is, like language, closely related to habit and custom. Each viewer therefore perceives the image as text through an arbitrary interpretation shaped by individual, empirical, historical, and educational viewpoints. The textuality of modern fashion photographs aims to widen the range of knowledge and understanding, transcending the rules of the simple function of existing fashion photographs. Consequently, this study calls for sustained and diverse follow-up studies on instilling meaning into fashion photographs, so that they can be understood in a de-regulatory and de-constructive way through various senses rather than through fixed, rule-bound properties dependent on a single sense.