• Title/Summary/Keyword: tabular data learning

Reinforced Generator GAN Model for Tabular Data Learning

  • Chan-sik Sung;Joon-sik Lim
    • Journal of Internet Computing and Services / v.25 no.5 / pp.121-130 / 2024
  • Tabular data is a mixture of numerical and categorical data, and conventional machine learning models have been considered more suitable than generative models for learning from such data. This is because generative models either required an excessive number of parameters or failed to find a direction for learning, owing to two characteristics of tabular data: multimodal distributions in numerical features and frequency imbalance in categorical features. However, as data increasingly becomes big data and real-time, existing machine learning models have shown limitations in their application. As a methodology for applying generative models to tabular data, this paper proposes RGGAN (Reinforced Generator GAN), a generative adversarial network with a reinforced generator that uses clustering-based sampling leveraging conjugate prior distributions and a loss function improved with the Gower coefficient and mutual information. An anomaly detector was built from the discriminator trained by RGGAN and used to detect fraudulent transactions in the IEEE-CIS Fraud Detection Dataset. Measured by AUC, it improved on existing generative models by 1-7%, demonstrating that the proposed model is effective both for learning tabular data and for detecting fraudulent transactions.
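The Gower coefficient named in the abstract measures similarity between records that mix numerical and categorical fields, which is why it suits tabular data. The sketch below shows only the standard equal-weight form of the coefficient: numeric features contribute 1 minus the range-normalized absolute difference, categorical features contribute 1 on an exact match and 0 otherwise, and the contributions are averaged. The abstract does not say how RGGAN incorporates the coefficient into its loss, and the function names and the sample transaction features (amount, card type, item count) are hypothetical, not taken from the paper or the IEEE-CIS dataset.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def feature_ranges(rows: Sequence[Sequence[Value]], is_numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric column; categorical columns get 0.0 (unused)."""
    ranges: List[float] = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[k]) for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)
    return ranges


def gower_similarity(a: Sequence[Value], b: Sequence[Value],
                     is_numeric: Sequence[bool], ranges: Sequence[float]) -> float:
    """Equal-weight Gower similarity between two mixed-type records, in [0, 1]."""
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            # Numeric feature: 1 minus the range-normalized absolute difference.
            # A constant column (r == 0) cannot distinguish records, so it counts as a match.
            total += (1.0 - abs(float(x) - float(y)) / r) if r > 0 else 1.0
        else:
            # Categorical feature: 1 on an exact match, 0 otherwise.
            total += 1.0 if x == y else 0.0
    return total / len(is_numeric)


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   is_numeric: Sequence[bool], ranges: Sequence[float]) -> float:
    """Gower distance is 1 minus the Gower similarity."""
    return 1.0 - gower_similarity(a, b, is_numeric, ranges)


if __name__ == "__main__":
    # Hypothetical transactions: (amount, card type, item count).
    rows = [[100.0, "visa", 3.0], [250.0, "mastercard", 1.0], [400.0, "visa", 5.0]]
    is_numeric = [True, False, True]
    ranges = feature_ranges(rows, is_numeric)               # [300.0, 0.0, 4.0]
    print(gower_similarity(rows[0], rows[2], is_numeric, ranges))  # 0.5
```

For the first and third sample rows, the amount term is 1 - 300/300 = 0, the card-type term is 1, and the item-count term is 1 - 2/4 = 0.5, giving a similarity of 1.5 / 3 = 0.5. Per-feature weights, which the general Gower formulation allows, are omitted here for simplicity.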

Development of Intelligent OCR Technology to Utilize Document Image Data

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.212-215 / 2022
  • In today's era of so-called digital transformation, the need to build and use big data has grown across many fields. Much data is now produced and stored in forms suited to digital devices and media, but for a long time in the past, data production and storage were dominated by printed books. Optical Character Recognition (OCR) technology is therefore needed to make use of the vast number of printed books accumulated over that time as big data. This study proposes a system that digitizes the structure and content of the document objects in a scanned book image. The proposed system consists of three main steps: 1) recognizing the areas of document objects (tables, equations, pictures, and body text) in a scanned book image; 2) performing OCR on each recognized area with the corresponding body-text, table, or formula module; and 3) collecting the processed document information and returning it in JSON format. The proposed model is based on an open-source project, with additional training and improvements. The intelligent OCR system proposed in this study achieved performance comparable to commercial OCR software in processing the four types of document objects (tables, equations, images, and body text).
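Steps 2 and 3 of the pipeline amount to dispatching each recognized area to a type-specific module and then serializing the results. The sketch below shows only that orchestration, assuming a simple page-level JSON layout; the `Region` class, the recognizer functions, and the JSON field names (`page`, `objects`, `kind`, `bbox`, `content`) are hypothetical, and the recognizers return placeholder strings rather than running OCR.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """A document object area found in step 1 (hypothetical structure)."""
    kind: str                        # "text", "table", "equation", or "picture"
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


# Placeholder recognizers standing in for the per-area OCR modules of step 2.
# A real system would crop the page image to region.bbox and run OCR here.
def recognize_text(region: Region) -> str:
    return "placeholder body text"


def recognize_table(region: Region) -> str:
    return "placeholder table"


def recognize_equation(region: Region) -> str:
    return "placeholder equation"


RECOGNIZERS: Dict[str, Callable[[Region], str]] = {
    "text": recognize_text,
    "table": recognize_table,
    "equation": recognize_equation,
}


def page_to_json(page_number: int, regions: List[Region]) -> str:
    """Step 3: gather the per-area results for one page into a JSON string."""
    objects = []
    for region in regions:
        record = asdict(region)
        recognizer = RECOGNIZERS.get(region.kind)
        # Pictures have no OCR module in this sketch, so their content stays empty.
        record["content"] = recognizer(region) if recognizer else ""
        objects.append(record)
    return json.dumps({"page": page_number, "objects": objects}, ensure_ascii=False)


if __name__ == "__main__":
    regions = [Region("text", (40, 60, 560, 300)), Region("picture", (40, 320, 560, 600))]
    print(page_to_json(1, regions))
```

Keeping the recognizers in a lookup table means a new document-object type can be supported by registering one more function, without changing the serialization step.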

Analyzing Tasks in the Statistics Area of Korean and Singaporean Textbooks from the Perspective of Mathematical Modeling: Focusing on 7th Grade

  • Kim, Somin
    • Journal of the Korean School Mathematics Society / v.24 no.3 / pp.283-308 / 2021
  • This study analyzes statistics tasks in Korean and Singaporean textbooks from a mathematical modeling perspective and compares the learning content and experiences offered to students in the two countries. I analyzed the mathematical modeling tasks in the textbooks along five aspects: (1) the mathematical modeling process, (2) the data type, (3) the expression type, (4) the context, and (5) the mathematical activity. The results show that in both Korean and Singaporean textbooks, the most common tasks were "working-with-mathematics" tasks, "matching" tasks, and "picture" tasks. The proportions of real-world contexts and mathematical activities differed between the two countries' textbooks. This study offers implications for developing textbook tasks that support future mathematical modeling activities: providing a balanced experience across the mathematical modeling processes, and presenting tasks in various forms of expression to raise students' cognitive level and broaden their opportunities to experience meaningful mathematizing. In addition, tasks set in realistic contexts are needed to foster students' interest in mathematical modeling activities and their motivation to learn.