• Title/Abstract/Keywords: data mining processes

141 search results (processing time: 0.029 seconds)

Overview of Fuzzy Associations Mining

  • Chen, Guoqing;Wei, Qiang;Kerre, Etienne;Wets, Geert
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회 2003년도 ISIS 2003 / pp.1-6 / 2003
  • Associations, as specific forms of knowledge, reflect relationships among items in databases, and have been widely studied in the fields of knowledge discovery and data mining. Recent years have witnessed many efforts on discovering fuzzy associations, aimed at coping with fuzziness in knowledge representation and decision support processes. This paper focuses on associations of three kinds, namely, association rules, functional dependencies and pattern associations, and overviews major fuzzy logic extensions accordingly.
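
As a rough illustration of what a fuzzy association rule measures, the sketch below computes fuzzy support and confidence. It assumes the minimum t-norm for combining membership degrees, which is one common choice and not necessarily the one the paper surveys. The item names and membership values are hypothetical.

```python
# Hypothetical transactions: each maps a fuzzy item to a membership degree in [0, 1].
transactions = [
    {"age=young": 0.8, "income=high": 0.6},
    {"age=young": 0.3, "income=high": 0.9},
    {"age=young": 1.0, "income=high": 0.2},
    {"age=young": 0.0, "income=high": 0.7},
]


def fuzzy_support(itemset, data):
    """Average degree to which a transaction contains every item (min t-norm, assumed)."""
    total = sum(min(t.get(item, 0.0) for item in itemset) for t in data)
    return total / len(data)


def fuzzy_confidence(antecedent, consequent, data):
    """Support of the whole rule divided by the support of its antecedent."""
    denominator = fuzzy_support(antecedent, data)
    if denominator == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return fuzzy_support(antecedent + consequent, data) / denominator


support = fuzzy_support(["age=young", "income=high"], transactions)
confidence = fuzzy_confidence(["age=young"], ["income=high"], transactions)
print(f"support={support:.3f} confidence={confidence:.3f}")
```

With crisp memberships (only 0 or 1) these definitions reduce to the usual support and confidence of classical association rules.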

An Analysis of Nursing Needs for Hospitalized Cancer Patients Using Data Mining Techniques (데이터 마이닝을 이용한 입원 암 환자 간호 중증도 예측모델 구축)

  • 박선아
    • 종양간호연구 / Vol. 5, No. 1 / pp.3-10 / 2005
  • Background: Nurses now make up one third of all hospital human resources, so efficient management of nursing manpower is becoming more important. Although nursing workload requirement analysis and patient severity classification clearly must come first for the efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purpose: This study aimed to build a prediction model for patient classification according to nursing need, and to find an easier and faster method of classifying patients that can support efficient management of nursing manpower. Methods: The nursing patient classification data of cancer patients hospitalized in one of the biggest cancer centers in Korea from 2003.1.1 to 2003.12.31 were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified by three different data mining techniques (logistic regression, decision tree, and neural network), and the results were assessed. Results: The data set was created from 165,073 records for 2,228 patients from the patient classification database. The main explanatory variables for the three techniques were as follows. 1) Logistic regression: age, month, and section. 2) Decision tree: section, month, age, and tumor. 3) Neural network: section, diagnosis, age, sex, metastasis, hospital days, and month. Among the three techniques, the neural network showed the best predictive power in ROC curve verification. The patient classification prediction model developed with the neural network, based on nursing needs, achieved a prediction accuracy of 84.06%. Conclusion: The patient classification prediction model was developed and tested using real patient data. The results can be used for more accurate calculation of required nursing staff and more effective use of the labor force.
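
The ROC-based comparison of models mentioned in the abstract can be sketched as follows. This minimal example computes the area under the ROC curve (AUC) as the fraction of positive/negative pairs that a model ranks correctly. The labels, scores, and model names are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = high nursing need) and scores from two hypothetical models.
labels = [1, 0, 1, 1, 0, 0]
model_a = [0.9, 0.4, 0.7, 0.3, 0.2, 0.5]
model_b = [0.6, 0.7, 0.5, 0.4, 0.3, 0.8]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"AUC model A: {auc_a:.3f}, AUC model B: {auc_b:.3f}")
```

The model with the larger AUC separates the two classes better across all decision thresholds, which is the sense in which one technique can show "the best predictive power" on an ROC curve.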

Evaluation of Water Quality Prediction Models at Intake Station by Data Mining Techniques (데이터마이닝 기법을 적용한 취수원 수질예측모형 평가)

  • 김주환;채수권;김병식
    • 환경영향평가 / Vol. 20, No. 5 / pp.705-716 / 2011
  • For the efficient discovery of knowledge and information from observed systems, data mining techniques can be a useful tool for predicting water quality at intake stations in rivers. Water quality at an intake station can deteriorate in the dry season because of insufficient flow. This demands additional outflow from the dam, since some of the deterioration can be attenuated by operating the dam reservoir to control outflow in light of the predicted water quality. A seasonal occurrence of high ammonia nitrogen ($NH_3$-N) concentrations has hampered the chemical treatment processes of a water plant on the Geum river. Monthly flow allocation from the upstream dam is important for downstream $NH_3$-N control. In this study, prediction models of water quality based on multiple regression (MR), artificial neural network, and data mining methods were developed to understand water quality variation and to support dam operations by providing predicted $NH_3$-N concentrations at the intake station. The models were calibrated with eight years of monthly data and verified with another two years of independent data. In these models, the $NH_3$-N concentration for the next time step depends on dam outflow and on river water quality at the previous time step, such as alkalinity, temperature, and $NH_3$-N. Model performance is compared and evaluated by error analysis and by statistical characteristics such as the correlation and determination coefficients between the observed and predicted water quality. These data mining techniques are expected to offer more efficient data-driven tools at the modelling stage, and the models were found to be well suited to predicting water quality in river systems.
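
The evaluation statistics named in the abstract (error analysis, correlation, and determination coefficients) can be computed as in the sketch below. The concentrations are hypothetical values chosen for illustration, and the function names are not taken from the paper.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, a common summary for error analysis."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical monthly NH3-N concentrations (mg/L): observed versus model output.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print(f"r={pearson_r(observed, predicted):.3f}")
print(f"R^2={r_squared(observed, predicted):.3f}")
print(f"RMSE={rmse(observed, predicted):.4f}")
```

Computing these on the held-out verification years, rather than on the calibration years, is what makes the comparison between model types meaningful.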

Learning Graphical Models for DNA Chip Data Mining

  • Zhang, Byoung-Tak
    • 한국생물정보학회:학술대회논문집 / 한국생물정보시스템생물학회 2000년도 International Symposium on Bioinformatics / pp.59-60 / 2000
  • The past few years have seen a dramatic increase in gene expression data on the basis of DNA microarrays or DNA chips. Going beyond a generic view on the genome, microarray data are able to distinguish between gene populations in different tissues of the same organism and in different states of cells belonging to the same tissue. This affords a cell-wide view of the metabolic and regulatory processes under different conditions, building an effective basis for new diagnoses and therapies of diseases. In this talk we present machine learning techniques for effective mining of DNA microarray data. A brief introduction to the research field of machine learning from the computer science and artificial intelligence point of view is followed by a review of recently-developed learning algorithms applied to the analysis of DNA chip gene expression data. Emphasis is put on graphical models, such as Bayesian networks, latent variable models, and generative topographic mapping. Finally, we report on our own results of applying these learning methods to two important problems: the identification of cell cycle-regulated genes and the discovery of cancer classes by gene expression monitoring. The data sets are provided by the competition CAMDA-2000, the Critical Assessment of Techniques for Microarray Data Mining.

Mining Social Networks from business process log (비즈니스 프로세스 수행자들의 Social Network Mining에 대한 연구)

  • 송민석;최인준
    • 한국경영과학회:학술대회논문집 / 대한산업공학회/한국경영과학회 2004년도 춘계공동학술대회 논문집 / pp.544-547 / 2004
  • Information systems increasingly log historic information in a systematic way. Not only workflow management systems, but also ERP, CRM, SCM, and B2B systems often provide a so-called 'event log'. Unfortunately, the information in these event logs is rarely used to analyze the underlying processes. Process mining aims to address this problem by providing techniques and tools for discovering process, control, data, organizational, and social structures from event logs. This paper focuses on mining social networks. This is possible because event logs typically record information about the users executing the activities recorded in the log. To do this, we combine concepts from workflow management and social network analysis. This paper introduces the approach and presents a tool to mine social networks from event logs.
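
To make the idea concrete, the sketch below derives a simple "handover of work" network from an event log: an edge from one performer to another is counted each time the second performer executes the next activity in the same case. The abstract does not say which metrics the tool uses, so this metric, the log layout, and all names are assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, performer), in time order within each case.
event_log = [
    ("case1", "register", "Kim"),
    ("case1", "review", "Lee"),
    ("case1", "approve", "Park"),
    ("case2", "register", "Kim"),
    ("case2", "review", "Lee"),
    ("case2", "approve", "Lee"),
    ("case3", "register", "Choi"),
    ("case3", "review", "Lee"),
]


def handover_network(log):
    """Count how often work passes directly from one performer to another within a case."""
    performers_by_case = defaultdict(list)
    for case_id, _activity, performer in log:
        performers_by_case[case_id].append(performer)

    edges = Counter()
    for performers in performers_by_case.values():
        for giver, receiver in zip(performers, performers[1:]):
            if giver != receiver:  # ignore a performer handing work to themselves
                edges[(giver, receiver)] += 1
    return edges


network = handover_network(event_log)
for (giver, receiver), weight in sorted(network.items()):
    print(f"{giver} -> {receiver}: {weight}")
```

The resulting weighted edges form a directed graph that standard social network analysis measures, such as centrality, could then be applied to.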

Knowledge Extractions, Visualizations, and Inference from the big Data in Healthcare and Medical

  • Kim, Jin Sung
    • 한국지능시스템학회논문지 / Vol. 23, No. 5 / pp.400-405 / 2013
  • The purpose of this study is to develop a composite platform for knowledge extraction, visualization, and inference. Big data sets are frequently used in the healthcare and medical fields. To help knowledge managers and users working in these fields, this study focuses on knowledge management (KM) based on Data Mining (DM), Knowledge Distribution Map (KDM), Decision Tree (DT), RDBMS, and SQL-inference. The proposed mechanism is composed of five key processes. First, in Knowledge Parsing, it extracts logical rules from a big data set by using DM technology and then transforms the rules into RDB tables. Second, through Knowledge Maintenance, it refines and manages the knowledge so that it is ready for computing knowledge distributions. Third, in the Knowledge Distribution process, the knowledge distributions can be viewed by using the DT mechanism. Fourth, in Knowledge Hierarchy, the platform shows the hierarchy of the knowledge. Finally, in Inference, it deduces conclusions from the given facts and data. This approach offers the advantages of diversity in knowledge representation and inference to improve the quality of computer-based medical diagnosis.

An Adaptive Business Process Mining Algorithm based on Modified FP-Tree (변형된 FP-트리 기반의 적응형 비즈니스 프로세스 마이닝 알고리즘)

  • 김건우;이승훈;김재형;서혜명;손진현
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 16, No. 3 / pp.301-315 / 2010
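  • As competition among companies intensifies and the need to create new business value grows, companies are paying close attention to business process management technology. However, because of differences in understanding and disagreements between business analysts and system developers, processes may not run as intended, or inefficient processes may be designed. To address these problems, business process mining, which can serve as a basis for business process redesign, is recognized as an important concept. Existing process mining research, however, has taken a monotonous form that extracts workflow-based process models from completed process logs, so it has been limited in representing diverse forms of business processes. In addition, log information must be rescanned whenever new process logs are added, which slows process discovery and log search. In this paper, we present a modified FP-tree-based process mining algorithm that adapts the FP-tree, used for association analysis in data mining, to business processes, so that process models at the level the user wants can be discovered from large volumes of added process log information without rescanning.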
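
The benefit of avoiding rescans can be illustrated with a plain prefix tree of activity sequences that keeps a count on every node, in the spirit of an FP-tree. This is a simplified sketch, not the modified FP-tree proposed in the paper. The class names, the traces, and the `min_count` threshold are hypothetical.

```python
class Node:
    """One activity in the prefix tree, with the number of traces passing through it."""

    def __init__(self, activity):
        self.activity = activity
        self.count = 0
        self.children = {}


class TraceTree:
    """Prefix tree over process traces that can be updated one trace at a time."""

    def __init__(self):
        self.root = Node(None)

    def add_trace(self, trace):
        """Insert a single trace; earlier log entries are never read again."""
        node = self.root
        for activity in trace:
            if activity not in node.children:
                node.children[activity] = Node(activity)
            node = node.children[activity]
            node.count += 1

    def frequent_paths(self, min_count):
        """Return {path: count} for every prefix path seen at least min_count times."""
        results = {}
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for child in node.children.values():
                # Counts never increase going down, so infrequent branches can be pruned.
                if child.count >= min_count:
                    path = prefix + (child.activity,)
                    results[path] = child.count
                    stack.append((child, path))
        return results


tree = TraceTree()
for trace in [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]:
    tree.add_trace(trace)
before = tree.frequent_paths(min_count=2)

tree.add_trace(["A", "C"])  # a new log entry arrives; only this trace is processed
after = tree.frequent_paths(min_count=2)
print(before)
print(after)
```

Raising or lowering `min_count` changes how detailed the extracted paths are, which loosely mirrors the idea of discovering a process model at the level the user wants.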

Workflow Process-Aware Data Cubes and Analysis (워크플로우 프로세스 기반 데이터 큐브 및 분석)

  • 진민혁;김광훈
    • 인터넷정보학회논문지 / Vol. 19, No. 6 / pp.83-89 / 2018
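  • Workflow process mining and analysis are becoming increasingly important in workflow process intelligence and systems. To improve the quality of workflow process intelligence, an efficient and effective data center that stores workflow execution event logs is essential when performing workflow process mining and analysis. In this paper, we propose a three-dimensional process-aware data cube for efficiently organizing a workflow event log data center and effectively storing workflow process execution event logs in XES format. As a validation step, we present a process mining example to show how well the process-aware data cube supports discovering analytical knowledge, such as execution proportions and work handover relationships, from workflow process patterns and the corresponding workflow process execution event histories. As a result, through the implementation of the process-aware data cube and a process mining system that uses it, we confirmed that the basic control-flow patterns of workflow processes can be successfully discovered.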

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / Vol. 10, No. 1 / pp.239-246 / 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the modeling type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so the measures cannot be used directly to compare different types of models, which is often done in data mining processes. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which begins by perturbing the values of input variables and monitors the change in output. The research scope is limited to models for continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
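
A model-agnostic sensitivity analysis of the kind described can be sketched as follows: perturb one input at a time and record the average absolute change in the model's output. The perturbation size (one standard deviation of each variable), the averaging scheme, and the toy model are assumptions made for this illustration. The abstract does not specify the paper's exact procedure.

```python
import statistics


def sensitivity_importance(model, rows, scale=1.0):
    """Mean absolute output change when each input is shifted by scale * its std dev."""
    n_vars = len(rows[0])
    baseline = [model(row) for row in rows]
    importances = []
    for j in range(n_vars):
        step = scale * statistics.pstdev([row[j] for row in rows])
        changes = []
        for row, base in zip(rows, baseline):
            perturbed = list(row)
            perturbed[j] += step  # perturb only variable j, keep the others fixed
            changes.append(abs(model(perturbed) - base))
        importances.append(sum(changes) / len(changes))
    return importances


def toy_model(x):
    """Hypothetical fitted model with continuous output; any callable would work."""
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


# Hypothetical input data: one row per observation, one column per input variable.
rows = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 3.0],
    [3.0, 30.0, 8.0],
    [4.0, 40.0, 1.0],
]

importances = sensitivity_importance(toy_model, rows)
for index, value in enumerate(importances):
    print(f"x{index}: {value:.3f}")
```

Because the function only calls `model` as a black box, the same measure can be applied to a regression, a neural network, or a tree, which is what allows importance to be compared across model types.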

Corporate Social Responsibility Regulation in the Indonesian Mining Companies

  • NUSWANTARA, Dian Anita;PRAMESTI, Dhea Ayu
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 10 / pp.161-169 / 2020
  • Mining companies exploit natural resources in their business processes, which motivates this research's emphasis on social and environmental issues. After twelve years of government regulation on CSR practices, this study investigates the factors that influence mining companies in disclosing information about corporate social responsibility, based on legitimacy, stakeholder, and agency theory. The independent variables are foreign ownership, company size, leverage, and the board of commissioners. The dependent variable is corporate social reporting disclosure, measured using GRI indexing. For sampling, we used thirty-four Indonesian mining companies listed on the IDX during 2014-2018, out of which only fifty-two companies meet the sample criteria. All data should pass the classical assumption tests to get the best estimator. Multiple linear regression is used to test the hypotheses, and the results show that the model is good and can explain 60% of the variation in the dependent variable. Based on the F-test, all four variables affect CSR practices simultaneously. The findings suggest that foreign ownership and firm size influence CSR disclosure in a positive direction. However, this study did not support the hypotheses that leverage negatively affects CSR disclosure and that board size positively affects CSR disclosure.