• Title/Summary/Keyword: Data Quality Management Algorithm


Automatic Algorithm for Cleaning Asset Data of Overhead Transmission Line (가공송전 전선 자산데이터의 정제 자동화 알고리즘 개발 연구)

  • Mun, Sung-Duk;Kim, Tae-Joon;Kim, Kang-Sik;Hwang, Jae-Sang
    • KEPCO Journal on Electric Power and Energy / v.7 no.1 / pp.73-77 / 2021
  • As big data analysis technologies have developed worldwide, the importance of data-driven asset management for electric power facilities is increasing. Securing data quality is essential because it determines the performance of the risk evaluation algorithm for asset management. To improve the reliability of asset management, asset data must be preprocessed. In particular, dirty data must be cleaned, and an algorithm that reduces processing time and improves accuracy is urgently needed. This paper presents the development of an automatic cleaning algorithm specialized for overhead transmission asset data. The algorithm cleans data by analyzing the quality and overall pattern of the raw data.

Development of Healthcare Data Quality Control Algorithm Using Interactive Decision Tree: Focusing on Hypertension in Diabetes Mellitus Patients (대화식 의사결정나무를 이용한 보건의료 데이터 질 관리 알고리즘 개발: 당뇨환자의 고혈압 동반을 중심으로)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Seong-Ok;Park, Jung-Sun;Kwak, Mi-Sook;Lee, Ye-Jin;Lim, Chae-Hyeok;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • The Korean Journal of Health Service Management / v.10 no.3 / pp.63-74 / 2016
  • Objectives: A data quality management algorithm is needed to improve the quality of healthcare data through a data quality management system. In this study, we developed a data quality control algorithm for hypertension accompanying diabetes mellitus. Methods: To build the data quality algorithm, we extracted data on diabetes mellitus patients from the 2011 and 2012 discharge damage survey. Derived variables were created from the primary diagnosis, diagnostic unit, primary surgery and treatment, and minor surgery and treatment items. Results: Significant factors in diabetes mellitus patients with hypertension were sex, age, ischemic heart disease, and diagnostic ultrasound of the heart. Based on the decision tree results, we found four groups with extreme values among diabetes patients with accompanying hypertension. Conclusions: The actual data contained in the outlier (extreme value) groups need to be checked to improve the quality of the data.

Developing data quality management algorithm for Hypertension Patients accompanied with Diabetes Mellitus By Data Mining (데이터 마이닝을 이용한 고혈압환자의 당뇨질환 동반에 관한 데이터 질 관리 알고리즘 개발)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Sung-Ok;Park, Jong-Son;Kwak, Mi-Sook;Lee, Ye-Jin;Im, Chae-Hyuk;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • Journal of Digital Convergence / v.14 no.7 / pp.309-319 / 2016
  • A data quality management algorithm is needed to improve the quality of health care data. In this study, we developed a data quality control algorithm for diabetes accompanying hypertension. To build the data quality algorithm, we extracted hypertension patients from the 2011 and 2012 discharge damage survey data. The resulting algorithm identified the following significant factors in hypertension patients with diabetes: gender, age, glomerular disorders in diabetes mellitus, diabetic retinopathy, diabetic polyneuropathy, and closed [percutaneous] [needle] biopsy of kidney. Based on the decision tree results, we defined an outlier as a group in which the estimated probability of a hypertension patient also having diabetes was 80% or more, or 20% or less, and found six such groups with extreme values. The actual data contained in the outlier (extreme value) groups therefore need to be checked to improve the quality of the data.
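
The outlier rule in the abstract above (probability of 80% or more, or 20% or less) can be sketched as a simple filter over decision tree leaf groups. This is a minimal illustration, not the authors' implementation: the `LeafGroup` type, the group names, and the probabilities in the usage note are hypothetical, and only the two thresholds come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class LeafGroup:
    """One terminal group of a decision tree (hypothetical structure)."""
    name: str
    probability: float  # estimated probability of comorbid diabetes in this group


def flag_outlier_groups(groups, high=0.80, low=0.20):
    """Return groups whose probability is >= high or <= low.

    The 0.80 / 0.20 cut-offs follow the abstract; everything else is assumed.
    """
    return [g for g in groups if g.probability >= high or g.probability <= low]
```

For example, with made-up groups `[LeafGroup("A", 0.85), LeafGroup("B", 0.50), LeafGroup("C", 0.20)]`, the function returns groups A and C, whose records would then be reviewed by hand.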

Data Quality Management: Operators and a Matching Algorithm with a CRM Example (데이터 품질 관리 : CRM을 사례로 연산자와 매칭기법 중심)

  • 심준호
    • The Journal of Society for e-Business Studies / v.8 no.3 / pp.117-130 / 2003
  • It is not unusual to find a great amount of redundant or inconsistent data even within an e-business system such as a CRM (Customer Relationship Management) system. The problem is aggravated when a system is built from information gathered from different sources. Data quality management is needed to avoid redundant or inconsistent data in such information systems. A data quality process generally consists of three phases: data cleaning (scrubbing), matching, and integration. In this paper, we introduce and categorize data quality operators for each phase. We then describe the distance function used in the matching phase and present a matching algorithm, PRIMAL (a PRactical Matching ALgorithm). Finally, we present related work and future research.


A Study for Efficient EM Algorithms for Estimation of the Proportion of a Mixed Distribution (분포 혼합비율의 모수추정을 위한 효율적인 알고리즘에 관한 연구)

  • 황강진;박경탁;유희경
    • Journal of Korean Society for Quality Management / v.30 no.4 / pp.68-77 / 2002
  • The EM algorithm has a good convergence rate for numerical procedures that converge in very small steps. However, when estimating the proportion of a mixed distribution with a very large amount of incomplete data, or when new data are continuously added, the EM algorithm depends heavily on the initial value and converges slowly. Many studies have sought to improve the convergence rate of the EM algorithm in estimating the proportion parameter of mixed data. Among them, the dynamic EM algorithm by Murray Jorgensen and the Titterington algorithm by D. M. Titterington have been shown to converge faster than the standard EM algorithm when new data are continuously added. In this paper we apply the dynamic EM algorithm and the Titterington algorithm to the estimation of a mixed Poisson distribution and compare their convergence rates by simulation.
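
For orientation, the standard EM baseline mentioned in the abstract above can be sketched for a two-component Poisson mixture. This is a minimal sketch under stated assumptions, not the paper's code: the component rates `lam1` and `lam2` are assumed known so that only the mixing proportion is estimated, and the function names, starting value, and tolerance are illustrative. The dynamic EM and Titterington variants are not shown.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass function P(X = k) for rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def em_mixing_proportion(counts, lam1, lam2, p0=0.5, tol=1e-8, max_iter=1000):
    """Estimate p in the mixture p * Poisson(lam1) + (1 - p) * Poisson(lam2).

    Assumes both rates are known; p0 is the initial value the abstract
    says standard EM is sensitive to.
    """
    p = p0
    for _ in range(max_iter):
        # E-step: posterior probability that each count came from component 1.
        resp = []
        for k in counts:
            a = p * poisson_pmf(k, lam1)
            b = (1.0 - p) * poisson_pmf(k, lam2)
            resp.append(a / (a + b))
        # M-step: the new proportion is the mean responsibility.
        p_new = sum(resp) / len(resp)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p
```

Each iteration touches every observation, which is one reason batch EM becomes costly when data keep arriving, the setting the abstract's recursive alternatives target.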

Adaptive Buffer Management Method for Quality of Service of Internet Telephony (인터넷폰의 QoS를 위한 적응적인 버퍼관리 방식)

  • 류태욱;이정훈;강성호;엄기환
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.3 / pp.386-392 / 2002
  • Internet telephony is an application that transmits voice data for conversation, so it must provide high sound quality. However, while audio packets travel through the network they are affected by delay variation and jitter, which can result in poor sound quality if the receiving end does not have an appropriate jitter buffer to compensate for network factors. This paper introduces a buffer management algorithm that can provide better sound quality for Internet phone terminals. The algorithm actively responds both to the compression algorithms used by the terminals and to the received data in order to improve sound quality. To verify the effectiveness of the proposed algorithm, we experimented in various network settings. The results show that the proposed algorithm outperforms the conventional buffer management algorithm.
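
The abstract above does not give the algorithm itself, so the following is only a generic sketch of how a jitter buffer can adapt to received data, not the paper's method. It uses the running interarrival jitter estimate from RFC 3550 (gain 1/16) and sizes the buffer in whole codec frames. The function names, the safety factor `k`, and the 20 ms frame length are assumptions made for illustration.

```python
import math


def update_jitter(jitter, transit_prev, transit_now):
    """RFC 3550 style running jitter estimate: J += (|D| - J) / 16."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


def target_buffer_ms(transits_ms, frame_ms=20.0, k=4.0, min_frames=1):
    """Suggest a buffer depth in ms from observed packet transit times.

    frame_ms stands in for the codec frame length (assumed value);
    k is an assumed safety multiplier on the jitter estimate.
    """
    jitter = 0.0
    for prev, now in zip(transits_ms, transits_ms[1:]):
        jitter = update_jitter(jitter, prev, now)
    # Round up to whole frames, never below the minimum depth.
    frames = max(min_frames, math.ceil(k * jitter / frame_ms))
    return frames * frame_ms
```

With constant transit times the estimate stays at zero and the buffer stays at its minimum depth; with strongly varying transit times the suggested depth grows.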

A symbiotic evolutionary algorithm for the clustering problems with an unknown number of clusters (클러스터 수가 주어지지 않는 클러스터링 문제를 위한 공생 진화알고리즘)

  • Shin, Kyoung-Seok;Kim, Jae-Yun
    • Journal of Korean Society for Quality Management / v.39 no.1 / pp.98-108 / 2011
  • Clustering is a useful method for classifying objects into subsets that are meaningful in the context of a particular problem, and it has been applied in a variety of fields such as customer relationship management, data mining, pattern recognition, and biotechnology. This paper addresses the clustering problem with an unknown number of clusters (unknown K) and presents a new approach based on a coevolutionary algorithm to solve it. Coevolutionary algorithms are known to be more efficient than classical algorithms at solving integrated optimization problems with a high degree of complexity. The problem considered in this paper can be divided into two sub-problems: finding the number of clusters and classifying the data into these clusters. To apply a coevolutionary algorithm, an algorithm framework and genetic elements suited to the sub-problems are proposed. In addition, a neighborhood-based evolutionary strategy is employed to maintain population diversity. To analyze the proposed algorithm, experiments are performed on various test-bed problems grouped into several classes. The experimental results confirm the effectiveness of the proposed algorithm.

Laplace-Metropolis Algorithm for Variable Selection in Multinomial Logit Model (Laplace-Metropolis알고리즘에 의한 다항로짓모형의 변수선택에 관한 연구)

  • 김혜중;이애경
    • Journal of Korean Society for Quality Management / v.29 no.1 / pp.11-23 / 2001
  • This paper proposes a Bayesian method for variable selection in the multinomial logit model. It is based on an optimal rule, derived from Bayes' rule, that minimizes the risk induced by selecting the multinomial logit model. The rule is to find the subset of variables that maximizes the marginal likelihood of the model. We also propose a Laplace-Metropolis algorithm as a simple method for estimating the marginal likelihood of the model. The Bayesian method is illustrated and its efficiency examined using two examples, one with artificial data and one with empirical data.
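
For reference, the Laplace-Metropolis estimator of the marginal likelihood is commonly written as below. The abstract does not state the formula, so this is the usual textbook form and the paper's exact version may differ. The symbols are assumptions of this note: $d$ is the number of parameters, $\hat{\theta}$ is a posterior mode estimate taken from the Metropolis draws, and $\hat{\Sigma}$ is the sample covariance matrix of those draws.

```latex
\log \hat{m}(D) \;=\; \frac{d}{2}\log(2\pi)
  \;+\; \frac{1}{2}\log\bigl|\hat{\Sigma}\bigr|
  \;+\; \log p\bigl(D \mid \hat{\theta}\bigr)
  \;+\; \log p\bigl(\hat{\theta}\bigr)
```

Under the selection rule in the abstract, this quantity would be computed for each candidate subset of variables and the subset with the largest value chosen.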


Adaptive Buffer Management Method for QoS of Internet Telephony (인터넷폰의 QoS를 위한 적응적인 버퍼관리 방식)

  • 류태욱;이현관;이용구;김주웅;엄기환
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.05a / pp.384-387 / 2002
  • Internet telephony is an application that transmits voice data for conversation, so it must provide high sound quality. However, while audio packets travel through the network they are affected by delay variation and jitter, which can result in poor sound quality if the receiving end does not have an appropriate jitter buffer to compensate for network factors. This paper introduces a buffer management algorithm that can provide better sound quality for Internet phone terminals. The algorithm actively responds both to the compression algorithms used by the terminals and to the received data in order to improve sound quality. To confirm the validity of the suggested algorithm, its performance was compared with that of existing buffer management algorithms in various network settings.


A Prototyping Framework of the Documentation Retrieval System for Enhancing Software Development Quality

  • Chang, Wen-Kui;Wang, Tzu-Po
    • International Journal of Quality Innovation / v.2 no.2 / pp.93-100 / 2001
  • This paper illustrates a prototyping framework for a documentation-standards retrieval system that uses a data mining approach to enhance software development quality. We first present an approach for designing a retrieval algorithm based on data mining, applying its three basic technologies (machine learning, statistics, and database management) to the system to shorten search time and increase the fitness. This approach derives from the observation that data mining can discover unsuspected relationships among elements in large databases. This observation suggests that data mining can be used to elicit new knowledge about the design of a subject system and that it can be applied efficiently to large legacy systems. Finally, software development quality improves as project managers retrieve the documentation standards.
