• Title/Summary/Keyword: 퍼지 마이닝

Search Result 34, Processing Time 0.016 seconds

lustering of Categorical Data using Rough Entropy (러프 엔트로피를 이용한 범주형 데이터의 클러스터링)

  • Park, Inkyoo
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.5
    • /
    • pp.183-188
    • /
    • 2013
  • A variety of cluster analysis techniques prerequisite to cluster objects having similar characteristics in data mining. But the clustering of those algorithms have lots of difficulties in dealing with categorical data within the databases. The imprecise handling of uncertainty within categorical data in the clustering process stems from the only algebraic logic of rough set, resulting in the degradation of stability and effectiveness. This paper proposes a information-theoretic rough entropy(RE) by taking into account the dependency of attributes and proposes a technique called min-mean-mean roughness(MMMR) for selecting clustering attribute. We analyze and compare the performance of the proposed technique with K-means, fuzzy techniques and other standard deviation roughness methods based on ZOO dataset. The results verify the better performance of the proposed approach.

A Study on the Effective Selection of Tunnel Reinforcement Methods using Decision Tree Technique (의사결정트리 기법을 이용한 터널 보조공법 선정방안 연구)

  • Kim, Jong-Gyu;Sagong, Myung;Lee, Jun S.;Lee, Yong-Joo
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.26 no.4C
    • /
    • pp.255-264
    • /
    • 2006
  • The auxiliary reinforcement method is normally applied to prevent a possible collapse of the tunnel face where the ground condition is not favorable or geologic information is not sufficient. Recently, several engineering approaches have been made to choose the effective reinforcement methods using expert system such as neural network and fuzzy theory field, among others. Even if the expert system has offered many decision aid tools to properly select the reinforcement method, the quantitative assessment items are not easy to estimate and this is why the data mining technique, widely used in the field of social science, medical treatment, banking and agriculture, is introduced in this study. Using decision tree together with PDA, the decision aids for reinforcement method based on field construction data are created to derive the field rules and future study will be concentrated on the application of the proposed methods in a variety of underground development cases.

Health Risk Management using Feature Extraction and Cluster Analysis considering Time Flow (시간흐름을 고려한 특징 추출과 군집 분석을 이용한 헬스 리스크 관리)

  • Kang, Ji-Soo;Chung, Kyungyong;Jung, Hoill
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.1
    • /
    • pp.99-104
    • /
    • 2021
  • In this paper, we propose health risk management using feature extraction and cluster analysis considering time flow. The proposed method proceeds in three steps. The first is the pre-processing and feature extraction step. It collects user's lifelog using a wearable device, removes incomplete data, errors, noise, and contradictory data, and processes missing values. Then, for feature extraction, important variables are selected through principal component analysis, and data similar to the relationship between the data are classified through correlation coefficient and covariance. In order to analyze the features extracted from the lifelog, dynamic clustering is performed through the K-means algorithm in consideration of the passage of time. The new data is clustered through the similarity distance measurement method based on the increment of the sum of squared errors. Next is to extract information about the cluster by considering the passage of time. Therefore, using the health decision-making system through feature clusters, risks able to managed through factors such as physical characteristics, lifestyle habits, disease status, health care event occurrence risk, and predictability. The performance evaluation compares the proposed method using Precision, Recall, and F-measure with the fuzzy and kernel-based clustering. As a result of the evaluation, the proposed method is excellently evaluated. Therefore, through the proposed method, it is possible to accurately predict and appropriately manage the user's potential health risk by using the similarity with the patient.

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.119-142
    • /
    • 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on a recommender system that provides information catered to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only improves the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. Such tradeoff is attributed to reduced coverage, which is a type of sparsity issues. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to high costs involved. Cumulative changes in the domain environment that have failed to be reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with CBCF to propose predictive clustering-based CF (PCCF) that solves the issues of reduced coverage and of unstable performance. The method improves performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users. Furthermore, the issue of reduced coverage also improves by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing the robustness of performance instability and scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems each enabled by IBCF, CBCF, ICFEC and PCCF under an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in the system performance. The test results revealed that the suggested method produced insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation that indicates the degree of data fluctuation. Notwithstanding, it resulted in marked improvement over the existing techniques in terms of range that indicates the level of performance fluctuation. The level of performance fluctuation before and after the model generation improved by 51.31% in the initial test. Then in the following test, there has been 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability compared to the existing techniques. Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.