• Title/Summary/Keyword: Vision language model

Meme Analysis using Image Captioning Model and GPT-4

  • Marvin John Ignacio;Thanh Tin Nguyen;Jia Wang;Yong-Guk Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.628-631 / 2023
  • We present a new approach to evaluating texts generated by Large Language Models (LLMs) for meme classification. Analyzing an image with embedded text, i.e., a meme, is challenging even for existing state-of-the-art computer vision models. By leveraging large image-to-text models, we can extract image descriptions that can be used in other tasks, such as classification. In our methodology, we first generate image captions using BLIP-2 models. We then use GPT-4 to evaluate the relationship between each caption and the meme text. The results show that OPT-6.7B provides better ratings than the other LLMs, suggesting that the proposed method has potential for meme classification.
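The caption-rating step described above could be organized roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the prompt wording, the 1-5 rating scale, and the helper names `build_rating_prompt`, `parse_rating`, and `mean_rating_by_model` are all hypothetical, and the actual call to GPT-4 is deliberately left out. It only shows how judge replies might be parsed and averaged per captioning model so that models can be compared.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def build_rating_prompt(caption: str, meme_text: str) -> str:
    """Compose a hypothetical prompt asking an LLM judge to rate how well a
    generated image caption relates to the text embedded in the meme."""
    return (
        "Rate from 1 to 5 how well the image caption relates to the meme text.\n"
        f"Caption: {caption}\n"
        f"Meme text: {meme_text}\n"
        "Answer with a single integer."
    )


def parse_rating(reply: str) -> Optional[int]:
    """Extract the first standalone digit 1-5 from the judge's reply;
    return None when no rating can be found."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


def mean_rating_by_model(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """records: (captioning_model_name, judge_reply) pairs.
    Returns the mean parsed rating per captioning model,
    skipping replies that contain no usable rating."""
    scores = defaultdict(list)
    for model, reply in records:
        rating = parse_rating(reply)
        if rating is not None:
            scores[model].append(rating)
    return {model: mean(values) for model, values in scores.items()}
```

Keeping prompt construction and reply parsing separate from the API call makes the scoring logic easy to test offline. It also makes it easy to swap in a different judge model.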

Design Evaluation of Portable Electronic Products Using AR-Based Interaction and Simulation (증강현실 기반 상호작용과 시뮬레이션을 이용한 휴대용 전자제품의 설계품평)

  • Park, Hyung-Jun;Moon, Hee-Cheol
    • Korean Journal of Computational Design and Engineering / v.13 no.3 / pp.209-216 / 2008
  • This paper presents a novel approach to the design evaluation of portable consumer electronic (PCE) products using augmented reality (AR) based tangible interaction and functional behavior simulation. In this approach, realistic visualization is achieved by overlaying the rendered image of a PCE product on the real-world environment in real time using computer-vision-based augmented reality. For tangible user interaction in an AR environment, the user creates input events by touching specified regions of the product-type tangible object with the pointer-type tangible object. For functional behavior simulation, we adopt a state transition methodology to capture the functional behavior of the product in a markup-language-based information model, and we build a finite state machine (FSM) that controls the transitions between product states based on the information model. The FSM is combined with AR-based tangible objects, whose operation in the AR environment enables realistic visualization and functional simulation of the product and thus supports faster product design and development. Based on the proposed approach, a product design evaluation system has been developed and applied to the design evaluation of various PCE products, with highly encouraging feedback from users.
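To make the idea of an FSM driven by a markup-based information model concrete, here is a minimal sketch. The XML schema (`<product>` and `<transition>` elements), the example MP3-player states, and the `ProductFSM` class are hypothetical illustrations, not the paper's actual markup language or system. Each tangible-interaction event, such as touching a button region, is treated as an input symbol that looks up the next state.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup describing a product's states and transitions.
SPEC = """
<product name="mp3_player" initial="off">
  <transition from="off" event="power_button" to="idle"/>
  <transition from="idle" event="play_button" to="playing"/>
  <transition from="playing" event="play_button" to="paused"/>
  <transition from="paused" event="play_button" to="playing"/>
  <transition from="idle" event="power_button" to="off"/>
  <transition from="playing" event="power_button" to="off"/>
  <transition from="paused" event="power_button" to="off"/>
</product>
"""


class ProductFSM:
    """Table-driven finite state machine built from the markup above."""

    def __init__(self, spec_xml: str):
        root = ET.fromstring(spec_xml.strip())
        self.state = root.get("initial")
        # (current state, event) -> next state
        self.table = {
            (t.get("from"), t.get("event")): t.get("to")
            for t in root.iter("transition")
        }

    def handle(self, event: str) -> str:
        """Apply one input event (e.g., a touch on a tangible-object region).
        Events with no matching transition leave the state unchanged."""
        self.state = self.table.get((self.state, event), self.state)
        return self.state


fsm = ProductFSM(SPEC)
fsm.handle("power_button")  # off -> idle
fsm.handle("play_button")   # idle -> playing
```

Keeping the transitions in data rather than code means a designer can change the product's behavior by editing the markup alone. This matches the motivation the abstract gives for using an information model.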

AI-BASED Monitoring Of New Plant Growth Management System Design

  • Seung-Ho Lee;Seung-Jung Shin
    • International journal of advanced smart convergence / v.12 no.3 / pp.104-108 / 2023
  • This paper deals with research on innovative systems using Python-based artificial intelligence technology in the field of plant growth monitoring. Monitoring and analyzing the health status and growth environment of plants in real time helps improve the efficiency and quality of crop production. This paper proposes a method of processing and analyzing plant image data using computer vision and deep learning technologies. The system was implemented in Python using the major deep learning frameworks TensorFlow and PyTorch. A camera system that monitors plants in real time acquires image data and provides it as input to a deep neural network model. This model is used to determine the growth state of plants, the presence of pests, and nutritional status. The proposed system informs users of changes in plant state in real time by presenting monitoring results as visualizations or notifications. In addition, it is used to predict future growth conditions or anomalies by building data analysis and prediction models based on the collected data. This paper covers the design and implementation of a Python-based plant growth monitoring system and its data processing and analysis methods, and it is expected to contribute to research on improving plant production efficiency and reducing resource consumption.
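The abstract does not say which models are used to detect anomalies in the collected data. As a stand-in, the sketch below uses a simple rolling z-score check, a swapped-in technique chosen for illustration and not the paper's method, on a single hypothetical sensor series such as daily soil-moisture readings. A reading is flagged when it deviates from the mean of the preceding `window` readings by more than `threshold` standard deviations. Both parameters and their defaults are assumptions.

```python
from statistics import mean, pstdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations.
    If the preceding window is perfectly flat, any change counts as anomalous."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

A flagged index could then trigger the kind of user notification the abstract describes. A real system would more likely rely on a learned prediction model than on a fixed threshold.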

An Analysis of Media in Advanced Learning Activities of Middle School Special Korean Textbooks by the Information Processing Model (정보처리모형을 활용한 중학교 특수 국어 교과서 심화 학습활동 수록 매체 분석)

  • Song, Gi-Ho;Noh, Jeong-Im
    • Journal of the Korean BIBLIA Society for library and Information Science / v.31 no.3 / pp.29-50 / 2020
  • The purpose of this study is to analyze the characteristics of the media contained in textbooks for students with disabilities based on the information processing model, and to find ways to use library materials to improve their classes. To this end, the media included in the advanced learning activities of the Korean language textbooks of the 2015 revised special education basic curriculum were analyzed. The analysis found that students with disabilities receive information mainly through vision, process information through understanding, and use linguistic intelligence to produce results. Specifically, they take in learning content through illustrations and text, process the content through understanding-based activities such as reasoning and explanation, and then use linguistic intelligence, such as writing and speaking, to produce results. Based on these results, the following practical methods for using library materials in Korean language classes for students with disabilities were proposed: developing a variety of input media based on reading stages and collection mapping for students with disabilities; providing book materials through both reading and listening; teaching the methodological knowledge needed to solve advanced learning activities independently; and developing types of writing and writing strategies that support various production activities.

A Study on Utilization of Vision Transformer for CTR Prediction (CTR 예측을 위한 비전 트랜스포머 활용에 관한 연구)

  • Kim, Tae-Suk;Kim, Seokhun;Im, Kwang Hyuk
    • Knowledge Management Research / v.22 no.4 / pp.27-40 / 2021
  • Click-Through Rate (CTR) prediction is a key function in recommendation systems: it determines the ranking of candidate items so that high-ranking items can be recommended, reducing customer information overload and maximizing profit through sales promotion. The fields of natural language processing and image classification have achieved remarkable growth through the use of deep neural networks. Recently, the transformer, an attention-based model that differs from the previously mainstream models in these fields, has been proposed and has achieved state-of-the-art results. In this study, we present a method for improving the performance of a transformer model for CTR prediction. To analyze how the discrete, categorical characteristics of CTR data, which differ from those of natural language and image data, affect performance, we perform experiments on embedding regularization and transformer normalization. The experimental results confirm that the prediction performance of the transformer improved significantly when L2 regularization was applied in the embedding process for CTR input data, and when batch normalization was applied to the transformer model instead of layer normalization, which is the default normalization method.
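The two modifications mentioned above differ mainly in which axis is regularized or normalized. The following sketch illustrates that difference in plain Python. It is not the authors' implementation: the penalty weight, the `eps` value, and the omission of learned scale/shift parameters and running statistics are simplifying assumptions. Layer normalization standardizes each sample across its features, while batch normalization standardizes each feature across the samples in a batch. The L2 term penalizes the squared magnitude of the embedding vectors looked up for a batch.

```python
import math


def l2_embedding_penalty(embeddings, weight):
    """weight * sum of squared entries of the embedding rows used in a batch."""
    return weight * sum(x * x for row in embeddings for x in row)


def layer_norm(batch, eps=1e-5):
    """Standardize each sample (row) across its features."""
    out = []
    for row in batch:
        mu = sum(row) / len(row)
        var = sum((x - mu) ** 2 for x in row) / len(row)
        out.append([(x - mu) / math.sqrt(var + eps) for x in row])
    return out


def batch_norm(batch, eps=1e-5):
    """Standardize each feature (column) across the batch, using
    training-mode batch statistics only."""
    n, d = len(batch), len(batch[0])
    stats = []
    for j in range(d):
        col = [batch[i][j] for i in range(n)]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        stats.append((mu, math.sqrt(var + eps)))
    return [[(batch[i][j] - stats[j][0]) / stats[j][1] for j in range(d)]
            for i in range(n)]
```

With CTR inputs, each feature column corresponds to an embedded categorical field. Normalizing per feature across the batch, rather than per sample, is one plausible reason why the choice of normalization matters for this kind of data.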

Educational Framework for Interactive Product Prototyping

  • Nam Tek-Jin
    • Archives of design research / v.19 no.3 s.65 / pp.93-104 / 2006
  • When the design profession began, design targets were mainly static, hardware-centered products. Due to the development of network and digital technologies, new products with dynamic, interactive, software-hardware hybrid characteristics have become one of the main design targets. To accomplish these new projects, designers are required to learn new methods, tools, and theories in addition to the traditional design expertise of visual language. One of the most important tools for this change is effective and rapid prototyping. To date, there has been little research on educational frameworks for interactive product or system prototyping. This paper presents a new model of educational content and methods for interactive digital product prototyping, and its application in a design curriculum. The new course content, integrated with related topics such as physical computing and tangible user interfaces, includes microprocessor programming, digital and analog input and output, multimedia authoring and programming languages, sensors, communication with external devices, computer vision, and movement control using motors. The final project of the course was accomplished by integrating all the exercises. Our educational experience showed that design students with little engineering background could learn various interactive digital technologies and their implementation methods in a one-semester course. At the end of the course, most of the students were able to construct prototypes that illustrate interactive digital product concepts. It was found that training in logical and analytical thinking is necessary in design education. The paper highlights emerging content in design education that responds to the new design paradigm. It also suggests an alternative that reflects the new requirements of interactive product or system design projects. The tools and methods suggested can also benefit students, educators, and designers working in digital industries.

Convergent Web-based Education Program to Prevent Dementia (웹기반의 치매 예방용 융합교육 프로그램 개발)

  • Park, Kyung-Soon;Park, Jae-Seong;Ban, Keum-Ok;Kim, Kyoung-Oak
    • The Journal of the Korea Contents Association / v.13 no.11 / pp.322-331 / 2013
  • The purpose of the present study was to develop web-based convergent educational content for dementia prevention by applying modern information technology (IT). In the preparation stage, domestic and international literature related to dementia was analyzed, followed by a survey of industry demands, on the basis of which the program was designed and developed. In the subsequent enhancement stage, the program was revised extensively based on advice from experts in various fields. The development results of the present program are summarized as follows. First, a 645 intellect development model for preventing dementia was established through peer review and verification of convergent education theories by expert groups. This model was named "Garisani," meaning "cognition capable of judging objects" in Korean. Second, 'Find a way' and 'Connect a line' modules in the numeric field and 'Identify a letter (I, II)' modules in the language field were developed for a web-based left-brain training program. Third, 'Find my car' and 'Vision training' modules in the attention field and 'Object inference' and 'Compare pictures' modules in the cognition field were developed for a web-based right-brain training program. Fourth, 'Pentomino' and 'BQmaze' (Brain Quotient and maze) modules in the space perception field and a 'Visual training' module in the memory field were developed for web-based training of both the left and right brain. Fifth, all results were integrated into a 52-week Garisani convergent education program for dementia prevention.

A study on Convergence Factors Related with Academic Burnout of Students in Health Majors in Studying for TOEIC (보건계열 일부 대학생의 토익학습의 학업소진 영향과 관련된 융복합적 요인 연구)

  • Hong, Soomi;Kim, Seung-Hee;Bae, Sang-Yun
    • Journal of Digital Convergence / v.15 no.6 / pp.315-327 / 2017
  • This study was carried out to examine the converging factors related to academic burnout in students from health-related majors who are studying for the Test of English for International Communication (TOEIC). Research subjects included 291 randomly selected students from the J-region who were enrolled in TOEIC classes. Data collection took place from April 3, 2017 to April 14, 2017, using anonymous self-administered questionnaires. The results of a multiple regression analysis of female students showed higher academic burnout related to studying for the TOEIC when sleeping hours, self-worth, self-efficacy, school adaptation resilience, and study immersion were low, and when job-seeking stress and test anxiety were high. The explanatory power of this model was 65%. Based on these results, to reduce academic burnout related to studying for the TOEIC in health majors, it is first necessary to increase sleeping hours, self-worth, self-efficacy, school adaptation resilience, and study immersion, and to make efforts to manage self-competence, job-seeking stress, and test anxiety. The results of this study may be used to reduce the academic burnout caused by studying for the TOEIC among health-major students and to increase their aptitude for studying English, thereby cultivating global competencies. Future studies should analyze the moderating and mediating effects of these factors on academic burnout.

The study about historical style of animation :Focused on the individual style and USA's style & Japan's style (애니메이션의 역사적 양식에 대한 연구:개인양식과 미국의 디즈니.일본의 지브리 양식을 중심으로)

  • Kim, Jae-Woong
    • Cartoon and Animation Studies / s.16 / pp.49-65 / 2009
  • I attempt to extract typological factors, which we can call 'style,' from historically significant works of animation. The creator's individuality, the general tendencies and constraints of the period, and national characteristics all shape these factors. The individual aspect stands out in the works of Jiri Trnka, Tim Burton, and Yuri Norstein, which are distinguished not only by their specific sensibility and vision but also by their elucidation of themes and their technical handling of the medium. Walt Disney's animation, on the other hand, has such distinctive characteristics that they are readily identifiable. Disney's so-called classical model established a typically American animation form, marked by expressive visual language, exaggerated reactions, and full animation that depicts actions in detail. Hayao Miyazaki's Studio Ghibli, in turn, can be said to represent Japanese animation, drawing on traditional Japanese motifs and expressing a unique humanity.

Design of Image Extraction Hardware for Hand Gesture Vision Recognition

  • Lee, Chang-Yong;Kwon, So-Young;Kim, Young-Hyung;Lee, Yong-Hwan
    • Journal of Advanced Information Technology and Convergence / v.10 no.1 / pp.71-83 / 2020
  • In this paper, we propose a system that can detect the shape of a hand at high speed using an FPGA. Because real-time processing is important, the hand-shape detection system is designed in Verilog HDL, a hardware description language that supports parallel processing, rather than in C++, which executes sequentially. Among the several approaches to hand gesture recognition, an image processing method is used. Since the human eye is sensitive to brightness, the YCbCr color model was selected from among various color representations to obtain results that are less affected by lighting. Using constraints on the Cb and Cr components, only the pixels corresponding to skin color are extracted from the input image. To increase the speed of object recognition, a median filter is used to remove noise from the input image; this filter is designed to compare values and extract the median at the same time, reducing the amount of computation. For parallel processing, the design locates the centerline of the hand while scanning and sorting the stored data. The line with the highest count is selected as the centerline of the hand, the size of the hand is determined from that count, and the hand and arm regions are separated. The designed hardware circuit met the target operating frequency and gate count.
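The processing chain described above can be prototyped in software before committing it to hardware. The sketch below makes several assumptions and is not the paper's Verilog design. It uses the full-range BT.601 (JPEG) RGB-to-YCbCr conversion and the commonly cited skin ranges 77 ≤ Cb ≤ 127 and 133 ≤ Cr ≤ 173 (the paper's actual thresholds are not given here), a 3x3 median filter that simply sorts its window (the paper's hardware instead performs the comparisons in parallel), and a per-column pixel count that takes the column with the highest count as the hand's centerline, which assumes an upright hand. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG) RGB -> YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def skin_mask(image, cb_range=(77, 127), cr_range=(133, 173)):
    """Binary mask: 1 where both Cb and Cr fall inside the assumed skin ranges.
    Y (brightness) is ignored to reduce sensitivity to lighting."""
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            _, cb, cr = rgb_to_ycbcr(r, g, b)
            inside = (cb_range[0] <= cb <= cb_range[1]
                      and cr_range[0] <= cr <= cr_range[1])
            out.append(1 if inside else 0)
        mask.append(out)
    return mask


def median3x3(mask):
    """3x3 median filter on the mask; border pixels are copied unchanged."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(mask[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of nine values
    return out


def center_column(mask):
    """Return (column index, count) of the column with the most skin pixels;
    the count can serve as a rough measure of hand size."""
    counts = [sum(row[x] for row in mask) for x in range(len(mask[0]))]
    best = max(range(len(counts)), key=counts.__getitem__)
    return best, counts[best]
```

In hardware, each stage would run as a pipeline on streaming pixels. Here they are separate functions so that each step of the abstract's description can be checked in isolation.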