• Title/Summary/Keyword: text-based image generation model


Transfer Learning-based Multi-Modal Fusion Answer Selection Model for Video Question Answering System

  • Park, Gyu-Min;Park, Seung-Bae
    • Annual Conference on Human and Language Technology / 2021.10a / pp.548-553 / 2021
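  • Video question answering is a representative multi-modal problem: providing an appropriate answer to a given video and question requires processing diverse information such as text and images. To improve performance, a question answering system may use several different answering modules, in which case it needs an answer selection module that picks the most appropriate answer from the generated candidates. The answer selection module needs to take the different perspectives of the answering modules into account when selecting an answer. However, when an answering module is a black-box model, it is difficult for the answer selection module to receive knowledge through that module's parameters or predictive distribution. In addition, because the training dataset has already been used to train the answering modules, overfitting makes it hard to learn each module's perspective from it, so the selection module must be trained on a comparatively small dataset other than the training set. In this paper, we propose a transfer learning-based multi-modal fusion answer selection model to improve answer selection performance. We measured performance on the DramaQA dataset and experimentally demonstrated the superiority of the proposed model.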


Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
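
The abstract mentions ensemble retrievers and re-ranking but does not say how the retrievers' results are combined. One common option is reciprocal rank fusion (RRF), sketched below. This is only an illustration under stated assumptions: the function name, the constant `k=60`, and the document ids are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    k=60 is a conventional default, assumed here rather than taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of two retrievers (e.g. keyword-based and embedding-based).
keyword_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers rank highly rise to the top, which is the usual motivation for fusing retrievers before a re-ranking step.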

Analysis of deep learning-based deep clustering method

  • Hyun Kwon;Jun Lee
    • Convergence Security Journal / v.23 no.4 / pp.61-70 / 2023
  • Clustering is an unsupervised learning method that groups data by features such as distance metrics, using data without known labels or ground truth values. It has the advantage of being applicable to various types of data, including images, text, and audio, without the need for labeling. Traditional clustering techniques apply dimensionality reduction or extract specific features before clustering. With the advancement of deep learning, however, research has emerged on deep clustering with models such as autoencoders and generative adversarial networks, which represent input data as latent vectors. In this study, we propose a deep clustering technique based on deep learning. In this approach, we use an autoencoder to transform the input data into latent vectors, then construct a vector space according to the cluster structure and perform k-means clustering. We conducted experiments on the MNIST and Fashion-MNIST datasets, using the PyTorch machine learning library as the experimental environment. The model is a convolutional neural network-based autoencoder. The experimental results show an accuracy of 89.42% for MNIST and 56.64% for Fashion-MNIST when k is set to 10.
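
The clustering step described in this abstract, k-means applied to latent vectors, can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the autoencoder is omitted, and the 2-D `latent_vectors` are made-up stand-ins for encoder outputs. The function name and parameters are likewise assumptions.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples (the latent vectors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical latent vectors; in the paper's setting these would come
# from the encoder half of a convolutional autoencoder.
latent_vectors = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]

labels, centroids = kmeans(latent_vectors, k=2)
print(labels)
```

With k set to 10, as in the abstract, the same routine would be run on the latent vectors of the MNIST or Fashion-MNIST images. Reporting accuracy additionally requires mapping cluster indices to class labels, which is not shown here.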

An XML-based Multimedia News Management System

  • Kim Hyon Hee;Park Seung Soo
    • The KIPS Transactions:PartB / v.11B no.7 s.96 / pp.785-792 / 2004
  • With recent progress in multimedia computing technologies, it is necessary to retrieve diverse types of multimedia data based on multimedia content and their relationships. However, unlike alphanumeric data, it is difficult to provide relevant multimedia information, because multimedia content and their relationships are implicit in multimedia data. Consequently, in multimedia news service systems, a representative multimedia application, most news services provide relevant news only for text articles, while retrieval of multimedia news such as video news or image news is provided independently. In this paper, we present an XML-based multimedia news management system that provides integration, retrieval, and delivery of relevant multimedia news. Our data model, composed of media objects, relationship objects, and view objects, represents diverse types of multimedia news content and semantically related multimedia news. In addition, the proposed view mechanism makes it possible to customize multimedia news and therefore to provide multimedia news efficiently.

A review of artificial intelligence based demand forecasting techniques

  • Jeong, Hyerin;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.795-835 / 2019
  • Big data has been generated in various fields. Many companies have tried to make profits by building systems capable of analyzing big data with artificial intelligence (AI) techniques. Integrating AI technology has made analyzing and utilizing vast amounts of data increasingly valuable. In particular, demand forecasting with maximum accuracy is critical to government and business management in various fields such as finance, procurement, production, and marketing. In this case, it is important to apply an appropriate model that considers the demand pattern of each field. Traditional time series or regression models can also be used to analyze the complex patterns of real data. However, choosing the right model among the many available is difficult without prior knowledge. Many studies based on AI techniques such as machine learning and deep learning have been shown to overcome these problems. In addition, demand forecasting through the analysis of structured data and of unstructured data such as images or text has also shown high accuracy. This paper introduces important areas where demand forecasting is relatively active, together with machine learning and deep learning techniques that consider the characteristics of each field.

Boosting the Performance of the Predictive Model on the Imbalanced Dataset Using SVM Based Bagging and Out-of-Distribution Detection

  • Kim, Jong Hoon;Oh, Hayoung
    • KIPS Transactions on Software and Data Engineering / v.11 no.11 / pp.455-464 / 2022
  • Datasets from a manufacturing process have two distinctive characteristics: severe class imbalance and many Out-of-Distribution samples. Strategies such as oversampling the minority class and down-sampling the majority class are well known for handling class imbalance, and SMOTE has recently been chosen to address the issue as well. Out-of-Distribution samples, however, have been studied only with neural networks. Out-of-Distribution detection has hardly been applied to predictive models that use conventional machine learning algorithms such as SVM, Random Forest, and KNN. It is known that conventional machine learning algorithms are much better than neural networks in prediction performance, because neural networks are vulnerable to over-fitting and require much bigger datasets than conventional machine learning algorithms do. We therefore suggest a new approach that utilizes Out-of-Distribution detection based on the SVM algorithm. In addition, the bagging technique is adopted to improve the precision of the model.
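
The bagging part of this approach can be illustrated with a small sketch: train several base learners on bootstrap resamples and combine them by majority vote. To stay dependency-free, the base learner below is a nearest-centroid classifier on a single feature, a stand-in for the SVM named in the abstract. The data, function names, and parameters are all hypothetical, and the Out-of-Distribution detection step is not shown.

```python
import random
from collections import Counter


def fit_centroids(samples):
    """Stand-in base learner (NOT an SVM): per-class mean of a 1-D feature."""
    sums, counts = Counter(), Counter()
    for x, y in samples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}


def predict_centroids(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def bagging_fit(samples, n_models=15, seed=0):
    """Train one base learner per bootstrap resample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(samples) items with replacement.
        boot = [rng.choice(samples) for _ in samples]
        models.append(fit_centroids(boot))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced data: 12 "ok" samples near 1.0, 4 "defect" near 5.0.
samples = [(0.8 + 0.05 * i, "ok") for i in range(12)]
samples += [(4.8 + 0.2 * i, "defect") for i in range(4)]

models = bagging_fit(samples)
print(bagging_predict(models, 5.2), bagging_predict(models, 1.1))
```

A bootstrap resample can miss the minority class entirely, in which case that base learner always predicts the majority class. Voting across many resamples dampens the effect of such unlucky draws.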

A Study on the Conceptual Modeling and Implementation of a Semantic Search System

  • Han, Dong-Il;Kwon, Hyeong-In;Chong, Hak-Jin
    • Journal of Intelligence and Information Systems / v.14 no.1 / pp.67-84 / 2008
  • This paper proposes a design and realization of a semantic search system. The proposed model includes three architecture layers of a semantic search system, conceptually named Knowledge Acquisition, Knowledge Representation, and Knowledge Utilization. These three layers are designed to work together interactively so as to best meet users' information needs. The Knowledge Acquisition Layer covers the indexing and storage of semantic metadata from various sources of web content (e.g., text, image, multimedia, and so on). The Knowledge Representation Layer includes the ontology schema and instances, and supports semantic search through ontology-based query expansion. Finally, the Knowledge Utilization Layer lets users enter search queries intuitively and get results without knowledge of a semantic web language or ontology. As far as the design and realization of a semantic search site are concerned, the proposed semantic search system will offer useful implications to researchers and practitioners for raising the work from the research level to commercial use.
