• Title/Summary/Keyword: data set

Search Results: 10,939

A Study on the Improvement of Data Set Management in Government Information Systems: A Comparison with Public Data (행정정보 데이터세트 관리 개선방안 연구: 공공데이터와의 비교를 중심으로)

  • Seo, Jiin
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.20 no.4
    • /
    • pp.41-58
    • /
    • 2020
  • Although numerous studies have noted the importance of data sets in government information systems, practical data set management has yet to be established. Against this backdrop, the National Archives of Korea designated data set management as a major project in 2020, initiating full-scale management work. Despite these efforts, the records centers that will carry out this management have expressed great concern about the new project. This study therefore identifies problems in managing data sets and explores possible improvements through a comparison with the existing public data projects of public institutions. In particular, the following materials were analyzed: laws, notices, guidelines, and publications issued by the relevant ministries. Based on the results, several measures were proposed as part of an improvement plan for data set management: (1) the utilization of the government functional classification as a reference, (2) the reorganization of the table, and (3) data linkage with related systems.

The diagnosis of Plasma Through RGB Data Using Rough Set Theory

  • Lim, Woo-Yup;Park, Soo-Kyong;Hong, Sang-Jeen
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2010.02a
    • /
    • pp.413-413
    • /
    • 2010
  • In the semiconductor manufacturing field, all equipment carries various sensors to diagnose the state of its processes, and hundreds of sensors are employed to increase diagnostic accuracy. Because these sensors produce millions of data points, diagnosing the process directly from them is impractical; moreover, in some cases, data collected under the same conditions yield different results. We want to extract information, such as knowledge and patterns, from these data. Fault detection and classification (FDC) has attracted attention as a way to increase yield, and clear faults and non-faults can be separated by various FDC tools. Uncertainty in semiconductor manufacturing, that is, non-faulty cases classified as faulty and faulty cases classified as non-faulty, causes productivity to decrease. Under such uncertainty, rough set theory is a viable approach for extracting meaningful knowledge and making predictions: its reduction of data sets, discovery of hidden data patterns, and generation of decision rules contrast with other approaches such as regression analysis and neural networks. In this research, an RGB sensor was used to diagnose plasma instead of optical emission spectroscopy (OES). RGB data has just three variables (red, green, and blue), while OES data has thousands of variables. RGB data, however, is difficult to analyze by eye: identical values of one variable can correspond to different outcomes. In other words, RGB data contains uncertainty. In this research, decision rules were generated by rough set theory, and hidden data patterns could be found within that uncertainty. With rough set theory, the RGB sensor can diagnose changes in plasma condition with over 90% accuracy. Although we present only a preliminary result in this paper, we will continue to develop an uncertainty-resolving data mining algorithm for application to semiconductor process diagnosis.

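
The core rough-set step the abstract describes, generating certain decision rules only from objects whose condition attributes determine a unique outcome, can be sketched on a toy decision table. The discretized RGB values and plasma states below are invented for illustration; they are not the paper's data.

```python
# Toy rough-set analysis of a discretized RGB decision table.
from collections import defaultdict

# Each row: (R, G, B) condition attributes -> plasma state decision.
table = [
    (("low",  "mid",  "high"), "normal"),
    (("low",  "mid",  "high"), "fault"),   # same conditions, different outcome
    (("high", "mid",  "low"),  "fault"),
    (("high", "high", "low"),  "fault"),
    (("mid",  "mid",  "mid"),  "normal"),
]

# Indiscernibility classes: group objects with identical condition attributes.
classes = defaultdict(list)
for i, (cond, dec) in enumerate(table):
    classes[cond].append(i)

# Lower approximation of each decision: classes whose members all share it.
# Objects 0 and 1 disagree, so they fall in the boundary region and yield
# no certain rule -- this is the "uncertainty" the abstract refers to.
lower = defaultdict(list)
for cond, members in classes.items():
    decisions = {table[i][1] for i in members}
    if len(decisions) == 1:
        lower[decisions.pop()].extend(members)

# Certain decision rules come from the lower approximations only.
for dec, members in sorted(lower.items()):
    for i in members:
        r, g, b = table[i][0]
        print(f"IF R={r} AND G={g} AND B={b} THEN state={dec}")
```

Rules are emitted only for the unambiguous rows; the conflicting pair stays in the boundary region instead of producing a contradictory rule.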

Design of the Integrated Incomplete Information Processing System based on Rough Set

  • Jeong, Gu-Beom;Chung, Hwan-Mook;Kim, Guk-Boh;Park, Kyung-Ok
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.5
    • /
    • pp.441-447
    • /
    • 2001
  • In general, rough set theory is used for the classification, inference, and decision analysis of incomplete data by applying approximation-space concepts to an information system. An information system can include quantitative attribute values with interval characteristics, or incomplete data such as multiple or unknown (missing) values. Such incomplete data cause inconsistency in the information system and decrease the classification ability of systems using rough sets. In this paper, we present various types of incomplete data that may occur in an information system and propose the INcomplete information Processing System (INiPS), which converts an incomplete information system into a complete one using rough sets.

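
One standard rough-set treatment of unknown values, a tolerance relation in which an unknown is assumed compatible with any value, can be sketched briefly. This is an illustrative sketch of the general idea, with an invented table; it is not the INiPS design itself.

```python
# Tolerance-relation treatment of unknown values in an information table:
# an unknown (None) is taken as compatible with any value, so objects are
# grouped by possible indiscernibility rather than exact equality.
table = [
    {"temp": "high", "pressure": None,  "flow": "low"},
    {"temp": "high", "pressure": "low", "flow": "low"},
    {"temp": "low",  "pressure": "low", "flow": None},
    {"temp": "low",  "pressure": "mid", "flow": "high"},
]

def tolerant(a, b):
    """Two objects are tolerant if every attribute matches or is unknown."""
    return all(a[k] is None or b[k] is None or a[k] == b[k] for k in a)

# Tolerance class of each object: the set of objects it may be equal to.
tolerance = [
    {j for j, other in enumerate(table) if tolerant(obj, other)}
    for obj in table
]
```

Here objects 0 and 1 become indistinguishable because the unknown pressure could equal "low", while objects 2 and 3 remain separated by their known attributes.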

Nearest neighbor and validity-based clustering

  • Son, Seo H.;Seo, Suk T.;Kwon, Soon H.
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.3
    • /
    • pp.337-340
    • /
    • 2004
  • The clustering problem can be formulated as finding the number of clusters and a partition matrix from a given data set using iterative or non-iterative algorithms. The authors propose a nearest-neighbor and validity-based clustering algorithm in which each data point is linked with its nearest neighbor to form initial clusters, and then each cluster is linked with its nearest neighboring cluster to form a new cluster. This linking between clusters continues until no more linking is possible, and an optimal set of clusters is identified using a conventional cluster validity index. Experimental results on well-known data sets show the effectiveness of the proposed clustering algorithm.
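
The initial nearest-neighbor linking step the abstract describes can be sketched with union-find: join every point to its nearest neighbor and read off the connected groups. The points are invented, and the subsequent cluster-to-cluster merging and validity-index selection from the paper are not reproduced here.

```python
# Nearest-neighbor linking: each point is joined to its nearest neighbor,
# and the resulting connected groups form the initial clusters.
import math

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),      # tight group A
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]      # tight group B

parent = list(range(len(points)))

def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path halving
        i = parent[i]
    return i

def union(i, j):
    parent[find(i)] = find(j)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Link every point to its nearest neighbor.
for i, p in enumerate(points):
    j = min((k for k in range(len(points)) if k != i),
            key=lambda k: dist(p, points[k]))
    union(i, j)

clusters = {}
for i in range(len(points)):
    clusters.setdefault(find(i), []).append(i)
print(sorted(clusters.values()))   # -> [[0, 1, 2], [3, 4, 5]]
```

Further merging would then treat whole clusters as units and link each to its nearest cluster until the validity criterion stops the process.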

Regional Geological Mapping by Principal Component Analysis of the Landsat TM Data in a Heavily Vegetated Area (식생이 무성한 지역에서의 Principal Component Analysis 에 의한 Landsat TM 자료의 광역지질도 작성)

  • 朴鍾南;徐延熙
    • Korean Journal of Remote Sensing
    • /
    • v.4 no.1
    • /
    • pp.49-60
    • /
    • 1988
  • Principal Component Analysis (PCA) was applied for regional geological mapping to a multivariate set of Landsat TM data over the heavily vegetated and topographically rugged Chungju area. The variables were selected by statistical analysis based on the magnitude of the regression sum of squares in multiple regression, and the set includes R1/2/R3/4, R2/3, R5/7/R4/3, R1/2, R3/4, R4/3, and R4/5. As a result of applying PCA, some of the later principal components (here PC 3 and PC 5) proved geologically more significant than the earlier major components, PC 1 and PC 2. The two earlier components, which comprise 96% of the total information in the data set, mainly represent vegetation reflectance and topographic effects. Although the remaining components carry only 3% of the total information, which is statistically unstable, the geological significance of PC 3 and PC 5 in this study implies that applying the technique in more favorable areas should lead to much better results.
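
The PCA mechanics behind the abstract, where a dominant component absorbs most of the variance and later components carry small residual signals, can be sketched in pure Python via power iteration on a covariance matrix. The three-variable data below are invented band-ratio-like values, not the Landsat TM data.

```python
# Minimal PCA sketch: the first principal component of a small data set,
# found by power iteration on the sample covariance matrix.
data = [
    [2.0, 1.9, 0.5],
    [2.2, 2.1, 0.4],
    [1.8, 1.7, 0.6],
    [2.4, 2.3, 0.5],
    [2.0, 2.0, 0.5],
]

n, m = len(data), len(data[0])
means = [sum(row[j] for row in data) / n for j in range(m)]
X = [[row[j] - means[j] for j in range(m)] for row in data]   # center

# Sample covariance matrix.
cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
        for b in range(m)] for a in range(m)]

# Power iteration converges to the dominant eigenvector (PC 1).
v = [1.0] * m
for _ in range(200):
    w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]

eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
total_var = sum(cov[a][a] for a in range(m))
print(f"PC1 explains {eigval / total_var:.0%} of the variance")
```

Because the first two variables are strongly correlated, PC 1 dominates the variance here, mirroring how vegetation and topography dominated PC 1 and PC 2 in the study while the geologic signal hid in later components.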

Refinement of Ground Truth Data for X-ray Coronary Artery Angiography (CAG) using Active Contour Model

  • Dongjin Han;Youngjoon Park
    • International journal of advanced smart convergence
    • /
    • v.12 no.4
    • /
    • pp.134-141
    • /
    • 2023
  • We present a novel method for refining ground truth data through regularization and modification, applicable in particular when working with an original ground truth set. The performance of deep neural networks is enhanced by applying regularization techniques to the existing ground truth data. Many machine learning tasks require pixel-level segmentation sets in which objects are accurately delineated, but this proves challenging for thin and elongated objects such as blood vessels in X-ray coronary angiography, often resulting in inconsistently generated ground truth data. Our method analyzes the quality of training set pairs (images and their ground truth data) to automatically regularize and modify the boundaries of the ground truth segmentation. Employing the active contour model with a recursive ground truth generation approach yields stable and precisely defined boundary contours. After regularization and adjustment of the ground truth set, there is a substantial improvement in the performance of deep neural networks.
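
The internal (smoothness) term of an active contour, which pulls each boundary point toward the midpoint of its neighbors, can be sketched on a jagged closed contour. This shows only the regularization half of a snake; the external image-gradient term and the paper's recursive refinement loop are omitted, and the contour is invented.

```python
# Smoothness regularization of a closed contour: each point moves partway
# toward its neighbors' midpoint, reducing the contour's roughness.
contour = [(float(i), 1.0 if i % 2 else 0.0) for i in range(10)]  # jagged

def roughness(c):
    """Sum of squared distances from each point to its neighbors' midpoint."""
    n = len(c)
    total = 0.0
    for i in range(n):
        px, py = c[i - 1]
        nx, ny = c[(i + 1) % n]
        total += (c[i][0] - (px + nx) / 2) ** 2 \
               + (c[i][1] - (py + ny) / 2) ** 2
    return total

before = roughness(contour)
for _ in range(20):                       # a few regularization sweeps
    n = len(contour)
    new = []
    for i in range(n):
        px, py = contour[i - 1]
        nx, ny = contour[(i + 1) % n]
        x, y = contour[i]
        new.append((0.5 * x + 0.25 * (px + nx),
                    0.5 * y + 0.25 * (py + ny)))
    contour = new
after = roughness(contour)
```

A full snake would balance this term against an image-attraction term so the smoothed boundary still hugs the vessel edge rather than shrinking freely.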

Process modeling using artificial neural network in the presence of outliers

  • 고영철;박화규;봉복준;손주찬;왕지남
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 1997.10a
    • /
    • pp.177-180
    • /
    • 1997
  • Outliers, unexpected extraordinary observations that appear discordant with most observations in a data set, are commonplace in various kinds of data analysis. Since the effect of outliers on model identification can be serious, the aim of this paper is to present ways of handling outliers in a given data set and of specifying a model in their presence. A neural-network-based procedure is proposed that identifies outliers, removes their effects, and specifies a model for the underlying process. In contrast with traditional parametric methods, which require estimating the model's structure and parameters before detecting outliers, the proposed procedure is nonparametric: it handles outliers without first estimating the model's structure and parameters, and can therefore be applied to real problems in the presence of outliers. The proposed methodology proceeds as follows. First, outliers are detected by an outlier-detection neural network, and each detected outlier is replaced by its predicted value. The neural network is then retrained on the data set from which the effect of outliers has been removed, improving modeling precision. Experimental results show that the proposed method is suitable for modeling data sets in the presence of outliers.

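
The detect-replace-retrain loop the abstract outlines can be sketched with a plain least-squares line standing in for the paper's neural network: fit, flag large residuals as outliers, replace them with fitted values, then refit on the cleaned data. The data, the 2-sigma threshold, and the linear model are all illustrative assumptions.

```python
# Detect outliers by residual size, replace them with predictions, refit.
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]    # underlying process: y = 2x + 1
ys[4] = 40.0                        # inject one outlier

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sigma = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
outliers = [i for i, r in enumerate(residuals) if abs(r) > 2 * sigma]

# Replace detected outliers by their predicted values and refit.
for i in outliers:
    ys[i] = a + b * xs[i]
a2, b2 = fit_line(xs, ys)           # refit now recovers roughly y = 2x + 1
```

The refit coefficients land close to the true process because the outlier's leverage on the fit has been removed, which is the effect the paper pursues with its retrained network.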

Investigations on aerosols transport over micro- and macro-scale settings of West Africa

  • Emetere, Moses Eterigho
    • Environmental Engineering Research
    • /
    • v.22 no.1
    • /
    • pp.75-86
    • /
    • 2017
  • The aerosol content dynamics in a virtual system were investigated, and the outcome was extended to monitoring the mean concentration diffusion of aerosols at predefined macro and micro scales. The data sets used were wind data from an automatic weather station, satellite data from the Total Ozone Mapping Spectrometer aerosol index and the Multi-angle Imaging SpectroRadiometer, and ground data from the Aerosol Robotic Network. The maximum wind speed at the macro scale (West Africa) was less than 4.4 m/s; this low speed allows pollutants to reach a maximum range of about 15 km. The heterogeneous nature of the aerosol layers in the West African atmosphere creates a strange transport pattern caused by multiple refraction, and it is believed that this multiple refraction inhibits aerosol optical depth data retrieval. It was also found that the build-up of this transport pattern over time has enormous potential to drive greater climatic change in the long term. Even when the African Easterly Jet drives the aerosol layers at about 10 m/s, the interacting aerosol layers reduce the speed to about 4.2 m/s at the macro scale and boost it to 30 m/s at the micro scale. The mean concentration diffusion of aerosols was higher at the micro scale than at the macro scale. The minimum aerosol content dynamics for non-decaying, logarithmically decaying, and exponentially decaying particulate dispersion are 4, 1.4, and 0, respectively.

Imputation Methods for Nonresponse and Their Effect (무응답 대체 방법과 대체 효과)

• Kim, Kyu-Seong
    • Proceedings of the Korean Association for Survey Research Conference
    • /
    • 2000.06a
    • /
    • pp.1-14
    • /
    • 2000
  • We consider statistical methods for the nonresponse problem in social and economic sample surveys. To create a complete data set that contains no item nonresponse, imputation methods are generally used. In this paper, we introduce several imputation methods and compare them with one another. We also consider problems that arise when an imputed data set is treated as a fully observed one. Because of the imputed values, the true variance of the estimator after imputation is increased by the imputation variance; however, since the usual naive variance estimator constructed from the imputed data set does not capture the imputation variance, the true variance of the post-imputation estimator tends to be underestimated. The theoretical reason is investigated, and the seriousness of the consequences is illustrated through a simulation study. Finally, some adjusted variance estimation methods that compensate for the underestimation are presented and discussed.

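
The mechanism behind the underestimation can be shown numerically: mean-imputed values add no spread, so the completed data set looks less variable than the respondents actually are, and any naive variance estimate built from it is too small. The numbers below are invented; this illustrates only the spread shrinkage, not the paper's adjusted estimators.

```python
# Mean imputation makes the completed data set look too homogeneous.
respondents = [4.0, 7.0, 5.0, 9.0, 6.0, 5.0, 8.0, 4.0]   # observed values
n_missing = 4                                             # nonrespondents

mean_resp = sum(respondents) / len(respondents)
imputed = respondents + [mean_resp] * n_missing           # mean imputation

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

naive_var = sample_var(imputed)      # treats imputed values as real data
resp_var = sample_var(respondents)   # spread among actual responses
```

Every imputed value sits exactly at the mean, so `naive_var` is necessarily below `resp_var`; a proper estimator must add back an imputation-variance component.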

Imputation Methods for Nonresponse and Their Effect (무응답 대체 방법과 대체 효과)

  • Kim, Kyu-Seong
    • Survey Research
    • /
    • v.1 no.2
    • /
    • pp.1-14
    • /
    • 2000
  • We consider statistical methods for the nonresponse problem in social and economic sample surveys. To create a complete data set that contains no item nonresponse, imputation methods are generally used. In this paper, we introduce several imputation methods and compare them with one another. We also consider problems that arise when an imputed data set is treated as a fully observed one. Because of the imputed values, the true variance of the estimator after imputation is increased by the imputation variance; however, since the usual naive variance estimator constructed from the imputed data set does not capture the imputation variance, the true variance of the post-imputation estimator tends to be underestimated. The theoretical reason is investigated, and the seriousness of the consequences is illustrated through a simulation study. Finally, some adjusted variance estimation methods that compensate for the underestimation are presented and discussed.
