• Title/Summary/Keyword: Multimodal information analysis (멀티모달 정보분석)

Audio Generative AI Usage Pattern Analysis by the Exploratory Study on the Participatory Assessment Process

  • Hanjin Lee;Yeeun Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.47-54 / 2024
  • Cultural arts education that uses digital tools is growing in importance as a way to enhance tech literacy, support self-expression, and develop convergent capabilities. Creating with and evaluating innovative multimodal AI gives users expanded creative audio-visual experiences. In particular, creating music with AI offers new experiences at every stage, from generating musical ideas to improving lyrics, editing, and producing variations. In this study, we empirically analyzed how learners performed tasks on an audio and music generative AI platform and discussed them with fellow learners. Through voluntary participation, 12 services and 10 types of evaluation criteria were collected and classified by usage pattern and purpose. We present academic, technological, and policy implications for AI-powered liberal arts education from the learners' perspective.

Performance Analysis for Accuracy of Personality Recognition Models based on Setting of Margin Values at Face Region Extraction (얼굴 영역 추출 시 여유값의 설정에 따른 개성 인식 모델 정확도 성능 분석)

  • Qiu Xu;Gyuwon Han;Bongjae Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.141-147 / 2024
  • Recently, there has been growing interest in personalized services tailored to an individual's preferences. This has led to ongoing research aimed at recognizing and leveraging an individual's personality traits. Among the various methods for personality assessment, the OCEAN model is a prominent approach. Personality recognition based on OCEAN often employs a multimodal artificial intelligence model that incorporates linguistic, paralinguistic, and non-linguistic information. This paper examines how the margin value set when extracting facial areas from video data affects the accuracy of a personality recognition model that uses facial expressions to determine OCEAN traits. The study employed personality recognition models based on 2D Patch Partition, R2plus1D, 3D Patch Partition, and Video Swin Transformer. Setting the facial area extraction margin to 60 yielded the highest 1-MAE performance, 0.9118. These findings indicate the importance of selecting an optimal margin value to maximize the performance of personality recognition models.
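
The two quantities this abstract turns on, the extraction margin and the 1-MAE score, can be illustrated with a minimal sketch. It assumes the margin is a number of pixels added on every side of a detected face box and that 1-MAE is one minus the mean absolute error over trait scores in [0, 1]; the abstract does not state the margin's unit, and the names `expand_box` and `one_minus_mae` are hypothetical, not taken from the paper.

```python
from typing import Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def expand_box(box: Box, margin: int, frame_w: int, frame_h: int) -> Box:
    """Grow a face box by `margin` pixels per side, clipped to the frame.

    Assumption: the margin is measured in pixels. The abstract only
    reports the value 60 without naming a unit.
    """
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(frame_w, right + margin),
        min(frame_h, bottom + margin),
    )


def one_minus_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Return 1 - mean absolute error between two equal-length score lists."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
    return 1.0 - mae


# A hypothetical 100x140 face box inside a 320x240 frame.
crop = expand_box((100, 80, 200, 220), margin=60, frame_w=320, frame_h=240)
```

A larger margin keeps more context around the face (hair, shoulders, background) at the cost of a smaller face in the resized crop, which is the trade-off the margin experiments explore.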

A Study on Success Strategies for Generative AI Services in Mobile Environments: Analyzing User Experience Using LDA Topic Modeling Approach (모바일 환경에서의 생성형 AI 서비스 성공 전략 연구: LDA 토픽모델링을 활용한 사용자 경험 분석)

  • Soyon Kim;Ji Yeon Cho;Sang-Yeol Park;Bong Gyou Lee
    • Journal of Internet Computing and Services / v.25 no.4 / pp.109-119 / 2024
  • This study aims to contribute to early research on on-device AI at a time when generative AI-based services on mobile and other on-device platforms are increasing. To derive success strategies for generative AI-based chatbot services in a mobile environment, more than 200,000 user reviews collected from the Google Play Store were analyzed using LDA topic modeling. The derived topics were interpreted through the Information System Success Model (ISSM): tutoring, limitation of response, and hallucination and outdated information were linked to information quality; multimodal service, quality of response, and issues of device interoperability were linked to system quality; inter-device compatibility, utility of the service, quality of premium services, and account-related challenges were linked to service quality; and creative collaboration was linked to net benefits. Humanization of generative AI emerged as a new experience factor not explained by the existing model. By explaining specific positive and negative experience dimensions from the user's perspective on a theoretical basis, this study suggests directions for future research and offers companies strategic insights for improving and supplementing their services.
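
For readers unfamiliar with LDA topic modeling, the sketch below fits a tiny model with collapsed Gibbs sampling using only the standard library. The abstract does not say which inference method, library, or hyperparameters the authors used, so the sampler, the defaults for `alpha` and `beta`, and the function names `fit_lda` and `top_words` are all assumptions made for illustration. The reviews in the usage lines are made-up toy data, not the study's data.

```python
import random
from typing import List, Tuple


def fit_lda(
    docs: List[List[str]],
    num_topics: int,
    iterations: int = 100,
    alpha: float = 0.1,   # assumed document-topic smoothing
    beta: float = 0.01,   # assumed topic-word smoothing
    seed: int = 0,
) -> Tuple[List[str], List[List[int]], List[List[int]]]:
    """Fit LDA by collapsed Gibbs sampling; return vocab and count tables."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    ids = [[word_id[w] for w in doc] for doc in docs]

    doc_topic = [[0] * num_topics for _ in ids]
    topic_word = [[0] * len(vocab) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(ids):
        row = []
        for w in doc:
            z = rng.randrange(num_topics)
            row.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(ids):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * len(vocab))
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab: List[str], topic_word: List[List[int]], n: int = 3) -> List[List[str]]:
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Made-up, pre-tokenized toy reviews (not from the study).
reviews = [
    ["answer", "wrong", "outdated", "answer"],
    ["login", "account", "error", "login"],
    ["answer", "helpful", "tutor", "homework"],
    ["account", "payment", "premium", "error"],
]
vocab, doc_topic, topic_word = fit_lda(reviews, num_topics=2)
print(top_words(vocab, topic_word))
```

In a study like this one, the top words of each topic are what the researchers read and label by hand (for example "hallucination and outdated information") before mapping the labels onto ISSM dimensions.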

Multi-modal Image Processing for Improving Recognition Accuracy of Text Data in Images (이미지 내의 텍스트 데이터 인식 정확도 향상을 위한 멀티 모달 이미지 처리 프로세스)

  • Park, Jungeun;Joo, Gyeongdon;Kim, Chulyun
    • Database Research / v.34 no.3 / pp.148-158 / 2018
  • Optical character recognition (OCR) is a technique for extracting and recognizing text in images. It is an important preprocessing step in data analysis because most real-world text information is embedded in images. Many OCR engines achieve high recognition accuracy on images where the text is clearly separable from the background, such as black lettering on a white background. However, their accuracy is low on images where the text is not easily separable from a complex background. To address this, the input image must be transformed to make the text more prominent. In this paper, we propose a method that segments an input image into text lines so that OCR engines can recognize each line more efficiently, and that determines the final output by comparing the recognition rates of a CLAHE module and a Two-step module, both of which distinguish text from background regions using image processing techniques. Through thorough experiments against the well-known OCR engines Tesseract and Abbyy, we show that the proposed method has the best recognition accuracy on images with complex backgrounds.
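
The two steps named in this abstract, splitting an image into text lines and choosing between the outputs of two preprocessing modules, can be sketched as follows. The abstract does not describe how lines are segmented, so the horizontal projection profile used here is a common stand-in rather than the authors' method. The functions `segment_lines` and `pick_best`, and the representation of each module's result as a `(text, score)` pair, are hypothetical.

```python
from typing import Dict, List, Tuple


def segment_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized image (1 = ink, 0 = background) into text lines.

    Uses a horizontal projection profile: consecutive rows holding at
    least `min_ink` ink pixels form one line. Returns (start_row, end_row)
    pairs with end_row exclusive.
    """
    lines = []
    start = None
    for row_index, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = row_index                # a new line begins
        elif not has_ink and start is not None:
            lines.append((start, row_index))  # the current line ends
            start = None
    if start is not None:
        lines.append((start, len(binary)))    # line runs to the last row
    return lines


def pick_best(results: Dict[str, Tuple[str, float]]) -> Tuple[str, str]:
    """Return (module_name, text) for the module with the highest score.

    Assumption: each module reports its recognized text with a numeric
    recognition score, and the higher score wins.
    """
    name = max(results, key=lambda module: results[module][1])
    return name, results[name][0]


# A tiny hypothetical image with two text lines separated by blank rows.
image = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
]
line_spans = segment_lines(image)
```

Each span returned by `segment_lines` would be cropped, preprocessed by both modules, passed to an OCR engine, and then resolved with `pick_best`, so the choice between modules is made per line rather than per image.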