• Title/Summary/Keyword: clustering sampling (클러스터링 샘플링)

Search Result 14, Processing Time 0.022 seconds

Reinforced Generator GAN Model for Tabular Data Learning (Tabular Data 학습을 위한 강화형 생성자 GAN Mode)

  • Chan-sik Sung;Joon-sik Lim
    • Journal of Internet Computing and Services / v.25 no.5 / pp.121-130 / 2024
  • Tabular data is a mixture of numerical and categorical data, and conventional machine learning models have generally been considered better suited than generative models for learning from it. The reason is that generative models either required an excessive number of parameters or failed to find a learning direction because of the multimodal distributions of numerical columns and the frequency imbalance of categorical columns, both of which are characteristic of tabular data. However, as data grows into big data and increasingly arrives in real time, existing machine learning models have shown limitations in their application. In this paper, as a methodology for applying generative models to tabular data, we propose RGGAN (Reinforced Generator GAN), a generative adversarial network with a reinforced generator that uses clustering sampling leveraging conjugate prior distributions and a loss function improved with Gower coefficients and mutual information. An anomaly detector was constructed from the discriminators learned by RGGAN and used to detect fraudulent transactions in the IEEE-CIS Fraud Detection Dataset. Measured by AUC, it showed a performance improvement of 1-7% over existing generative models, showing that the proposed model is effective for learning tabular data and for detecting fraudulent transactions.
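
The abstract mentions Gower coefficients without defining them. As a point of reference only, the following is a minimal sketch of the standard Gower distance between two mixed-type records (the Gower similarity is one minus this value). How RGGAN incorporates it into its loss is not described in the abstract, so this is not the authors' implementation; the function name, the example columns, and the assumed value range are all hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records with mixed column types.

    numeric_ranges maps the index of each numeric column to its
    (max - min) range over the dataset. Every other column is
    treated as categorical (0 if equal, 1 otherwise).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            rng = numeric_ranges[i]
            # Range-normalised absolute difference, in [0, 1].
            total += abs(x - y) / rng if rng > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical transaction records: (amount, card type, device).
row_a = (100.0, "visa", "mobile")
row_b = (150.0, "visa", "desktop")
ranges = {0: 200.0}  # assumed range of the amount column

d = gower_distance(row_a, row_b, ranges)
print(d)  # (50/200 + 0 + 1) / 3
```

Normalising each numeric column by its range keeps numerical and categorical columns on the same [0, 1] scale, which is why the measure is a common choice for tabular data.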

Automatic Detection of Foreign Body through Template Matching in Industrial CT Volume Data (산업용 CT 볼륨데이터에서 템플릿 매칭을 통한 이물질 자동 검출)

  • Ji, Hye-Rim;Hong, Helen
    • Journal of Korea Multimedia Society / v.16 no.12 / pp.1376-1384 / 2013
  • In this paper, we propose an automatic detection method for foreign bodies through template matching in industrial CT volume data. Our method is composed of three main steps. First, in down-sampled data, the product region is separated from the background after noise reduction, and initial foreign-body candidates are extracted using the mean and standard deviation of the product region. Foreign-body candidates are then extracted using K-means clustering. Second, foreign bodies whose intensity differs from that of the product region are detected using template matching. Here, template matching is performed by evaluating SSD or joint entropy, depending on the size of the detected foreign-body candidates. Third, to improve the detection rate of foreign bodies in the original volume data, the final foreign bodies are detected using a percolation method. For the performance evaluation of our method, industrial CT volume data and simulation data are used. Visual inspection and accuracy assessment are performed, and processing time is measured. For accuracy assessment, a density-based detection method is used as the comparative method and Dice's coefficient is measured.
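
Two of the quantities named in the abstract, SSD (sum of squared differences) for template matching and Dice's coefficient for accuracy assessment, can be sketched briefly. The example below is a simplified illustration on a tiny 2-D slice rather than a CT volume, with made-up intensities and hypothetical function names; it does not reproduce the paper's pipeline (noise reduction, K-means candidate extraction, joint entropy, percolation).

```python
def ssd(patch, template):
    """Sum of squared intensity differences between two equal-size 2-D arrays."""
    return sum(
        (p - t) ** 2
        for prow, trow in zip(patch, template)
        for p, t in zip(prow, trow)
    )


def best_match(image, template):
    """Slide the template over the image; return (lowest SSD, row, col)."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ssd(patch, template)
            if best is None or score < best[0]:
                best = (score, r, c)
    return best


def dice(detected, truth):
    """Dice's coefficient 2|A∩B| / (|A| + |B|) between two sets of voxel coordinates."""
    detected, truth = set(detected), set(truth)
    if not detected and not truth:
        return 1.0
    return 2 * len(detected & truth) / (len(detected) + len(truth))


# Made-up 4x4 slice with a bright 2x2 "foreign body" in the middle.
image = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]

match = best_match(image, template)
overlap = dice({(1, 1), (1, 2), (2, 1)}, {(1, 1), (1, 2), (2, 1), (2, 2)})
print(match, overlap)
```

A lower SSD means a closer match, so the search keeps the minimum; Dice's coefficient ranges from 0 (no overlap) to 1 (identical regions).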

Color Code Detection and Recognition Using Image Segmentation Based on k-Means Clustering Algorithm (k-평균 클러스터링 알고리즘 기반의 영상 분할을 이용한 칼라코드 검출 및 인식)

  • Kim, Tae-Woo;Yoo, Hyeon-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.6 / pp.1100-1105 / 2006
  • Severe color distortions in captured images have made it difficult for color codes to expand their applications. To reduce the effect of color distortions on reading colors, it is more desirable to statistically process as many pixels in each individual color region as possible than to rely on a few regularly sampled pixels. This process may require segmentation, which usually requires edge detection. However, edges in color codes can be disconnected due to various distortions, such as the zipper effect and reflection, making segmentation incomplete. Edge linking is also a difficult process. In this paper, a more efficient approach to reducing the effect of color distortions on reading colors, one that avoids precise edge detection for segmentation, was obtained by employing the k-means clustering algorithm. In addition, the properties of both the six safe colors and grays were utilized in detecting color codes. Experiments were conducted on 144 outdoor images of 4M pixels each. The proposed method achieved a color-code detection rate of 100% for the test images and an average color-reading accuracy of over 99% for the detected codes, while the highest accuracy that could be achieved with an approach employing Canny edge detection was 91.28%.
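
The idea of reading a color from all pixels in a region, rather than from a few sampled pixels, can be illustrated with a plain k-means over RGB values. This is a minimal sketch assuming Lloyd's algorithm with a deterministic initialisation (the first k distinct pixels); the pixel values are invented, and the paper's actual detection steps, including its use of safe colors and grays, are not reproduced here.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on tuples of numbers.

    Initial centers are the first k distinct points (a simplification
    chosen here so the result is deterministic).
    """
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(ch) / len(c) for ch in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, clusters


# Invented, slightly distorted pixels from a red cell and a blue cell.
pixels = [
    (250, 10, 5), (10, 12, 240), (245, 5, 12),
    (5, 8, 250), (255, 0, 10), (12, 10, 235),
]
centers, clusters = kmeans(pixels, k=2)
print(centers)
```

Each resulting center is the mean color of every pixel assigned to it, so isolated distorted pixels have less influence than they would under sparse sampling.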

Information Visualization Process for Spatial Big Data (공간빅데이터를 위한 정보 시각화 방법)

  • Seo, Yang Mo;Kim, Won Kyun
    • Spatial Information Research / v.23 no.6 / pp.109-116 / 2015
  • This study defines the concept and distinguishing features of spatial big data and examines information visualization methodologies for increasing insight into the data. It also presents problems and solutions in the visualization process. Spatial big data is defined as the result of the quantitative expansion of spatial information and the qualitative expansion of big data. Its characteristics are defined as 6V (Volume, Variety, Velocity, Value, Veracity, Visualization). As the utilization and service aspects of spatial big data have come to the fore, its visualization has received attention as a way to provide insight into the data and improve its value. Methods of information visualization have been organized in a variety of ways by Matthias, Ben, information design textbooks, and others, but because of the vast amount of raw data, visualizing spatial big data requires a process of organizing the target data and extracting the information to be delivered to the user. The extracted information is then given an efficient visual representation suited to its characteristics. Because visually representing large amounts of data as-is cannot provide accurate information to the user, data reduction methods such as filtering, sampling, data binning, and clustering are needed.
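
Two of the data reduction methods listed at the end of the abstract, data binning and sampling, can be sketched as follows. This is an illustrative example only: the grid cell size, the coordinates, and the function names are assumptions and do not come from the study.

```python
from collections import Counter


def bin_points(points, cell_size):
    """Data binning: count points per square grid cell of side cell_size."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


def systematic_sample(points, step):
    """Sampling: keep every step-th point, starting with the first."""
    return points[::step]


# Invented 2-D coordinates standing in for spatial records.
points = [
    (0.5, 0.5), (0.7, 0.2), (1.5, 0.1),
    (3.2, 3.9), (3.8, 3.1), (3.5, 3.5),
]

cells = bin_points(points, cell_size=1.0)
sample = systematic_sample(points, step=2)
print(dict(cells), sample)
```

Binning replaces many overlapping markers with one count per cell, which can then be drawn as a heat map or graduated symbol; sampling keeps individual records but draws fewer of them.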