• Title/Summary/Keyword: Tree mining

An Incremental Web Document Clustering Based on the Transitive Closure Tree (이행적 폐쇄트리를 기반으로 한 점증적 웹 문서 클러스터링)

  • Youn Sung-Dae;Ko Suc-Bum
    • Journal of Korea Multimedia Society / v.9 no.1 / pp.1-10 / 2006
  • The k-means algorithm and Hierarchical Agglomerative Clustering (HAC) are often used for document clustering. The k-means algorithm has the advantage of fast processing, and HAC has the advantage of precise classification. Each method also has a drawback: low classification quality for k-means and slow processing for HAC. In addition, both methods must recompute document similarities whenever a new document is inserted into a cluster, which is a serious problem because web resources accumulate information through the frequent addition of new documents. We therefore propose a transitive closure tree method, based on HAC, that improves the processing time of document clustering, together with a superior incremental clustering method for inserting a new document and deleting a document contained in a cluster. The proposed method is compared with the existing algorithms in terms of precision, recall, F-measure, and processing time, and we present the experimental results.
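
The evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration of the standard definitions of precision, recall, and F-measure over raw counts; the abstract does not say how the authors computed them for clusters, so the function name `f_measure` and the count-based interface are assumptions.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F) from raw counts.

    tp: documents correctly placed in the cluster (true positives)
    fp: documents wrongly placed in the cluster (false positives)
    fn: documents that belong in the cluster but were missed (false negatives)
    beta: weight of recall relative to precision (1.0 gives the usual F1)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts, chosen only to exercise the function.
precision, recall, f1 = f_measure(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```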

Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI (LSI를 이용한 차원 축소 클러스터 기반 키워드 연관망 자동 구축 기법)

  • Yoo, Han-mook;Kim, Han-joon;Chang, Jae-young
    • Journal of KIISE / v.44 no.11 / pp.1236-1243 / 2017
  • In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant keywords from a set of clusters with a mutual information metric, and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, decomposes documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant keywords are identified by evaluating their mutual information within clusters. Then, the method calculates the similarities between the extracted keywords using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the performance of the proposed method, we used travel-related blog data and showed that the proposed method outperforms the existing TextRank algorithm by about 14% in terms of accuracy.
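
One step of the pipeline above, expressing the words of a cluster as a maximal spanning tree, can be sketched as follows. This is a generic Kruskal-style construction, not the authors' implementation: the function name `maximum_spanning_tree`, the `(weight, word_a, word_b)` edge format, and the example words and weights are all hypothetical.

```python
def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm, taking the heaviest edges first.

    nodes: iterable of hashable node labels (here, words)
    weighted_edges: iterable of (weight, node_a, node_b) tuples
    Returns a list of (node_a, node_b, weight) edges forming the tree.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the set representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, a, b in sorted(weighted_edges, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so keep it
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical word-association weights inside one cluster.
words = ["hotel", "beach", "flight", "ticket"]
edges = [
    (0.9, "hotel", "beach"),
    (0.2, "hotel", "flight"),
    (0.8, "flight", "ticket"),
    (0.5, "beach", "flight"),
    (0.1, "hotel", "ticket"),
]
for a, b, w in maximum_spanning_tree(words, edges):
    print(a, b, w)
```

Sorting the edges in descending order is the only change from the usual minimum spanning tree version; the union-find check still guarantees that no cycle is created.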

Determining Factors of Intention to Actual Use of Charged Long-term Care Services for the Aged (유료노인장기요양보호서비스 이용의사 결정요인)

  • Yoo, Jin-Yeong;Chun, Jin-Ho
    • Journal of Preventive Medicine and Public Health / v.38 no.1 / pp.16-24 / 2005
  • Objectives: To help develop strategies for coping with the changes arising from rapid population aging by predicting the factors that determine the intention to actually use charged long-term care services for the elderly, as perceived by the middle-aged, who play the major role in supporting them. Methods: The subjects were the parents (177 men, 507 women) in their 40s of students selected from a university in Busan. A questionnaire survey was conducted for 4 weeks in October 2003 on knowledge of long-term care services, the intention to actually use them, and preferences regarding the type of service supplier. Data were analyzed with frequency analysis, the chi-square test, and the t-test using SPSS (ver. 10.0K), along with data mining using the decision tree of Enterprise Miner V8.2 by SAS. Results: About half of the subjects (53.7%) had actual experience of supporting the elderly. Intention to use the charged services was relatively high for the home-visit nursing care service (40.1%) and the long-term care facility service (40.4%), and was influenced by previous knowledge of the services. The intention was stronger among women, those with higher education, and those with higher income. Actual support of the elderly was mostly (80%) provided by women, and the perceived burden of support was greater among women and those of lower socioeconomic level. Desired charges were about 10,000 won for the bath service, 20,000 won for the other services per day, and about 500,000 won for the long-term care facility service per month. In the decision tree analysis, job professionalism was the most important determining factor of the intention to actually use the services, with validation of 63-71%. Facilities combining health and welfare services were preferred, and the most important consideration was the level of professionalism. Conclusions: The intention to actually use the charged services was largely determined by aspects of time and cost. Policies to increase the number of service suppliers and to decrease the burden perceived by actual supporters are strongly recommended.

Research of Semantic Considered Tree Mining Method for an Intelligent Knowledge-Services Platform

  • Paik, Juryon
    • Journal of the Korea Society of Computer and Information / v.25 no.5 / pp.27-36 / 2020
  • In this paper, we propose a method to derive valuable but hidden information from data, which is the core foundation of the 4th Industrial Revolution in its pursuit of knowledge-based service fusion. Hyper-connected societies characterized by IoT inevitably produce big data, and to derive optimal services for problem situations, the data must first be processed to discover valuable information. A data-centric IoT platform collects, stores, manages, and integrates data from various devices, and is in effect a type of middleware platform. Its purpose is to provide suitable solutions to challenging problems after processing and analyzing the data, which depends on efficient and accurate data analysis algorithms. To this end, we propose specially designed structures that store IoT data without losing its semantics, and provide algorithms for discovering useful information, with several definitions and proofs to show their soundness.

Security tendency analysis techniques through machine learning algorithms applications in big data environments (빅데이터 환경에서 기계학습 알고리즘 응용을 통한 보안 성향 분석 기법)

  • Choi, Do-Hyeon;Park, Jung-Oh
    • Journal of Digital Convergence / v.13 no.9 / pp.269-276 / 2015
  • Recently, with the growth of industries related to big data, global security companies have expanded their scope from structured to unstructured data for intelligent security threat monitoring and prevention, and they increasingly use user tendency analysis for security. This is because the scope of information that can be deduced from analysis of existing structured data (quantified, already available data) is limited. This study applies machine learning algorithms (Naïve Bayes, Decision Tree, K-nearest neighbor, Apriori) in a big data environment to analyze security tendency (classification of items by purpose, positive/negative judgment, and analysis of keyword relevance). The performance analysis confirmed that security items and specific indexes for determining security tendency could be extracted from structured and unstructured data.

A Study on Sensor Data Analysis and Product Defect Improvement for Smart Factory (스마트 팩토리를 위한 센서 데이터 분석과 제품 불량 개선 연구)

  • Hwang, Sewong;Kim, Jonghyuk;Hwangbo, Hyunwoo
    • The Journal of Bigdata / v.3 no.1 / pp.95-103 / 2018
  • In recent years, with the development of ICT, many in the manufacturing field have been working to increase efficiency by analyzing the data generated in manufacturing processes. In this study, we propose a data mining based manufacturing process using a decision tree algorithm (CHAID) as part of a smart factory. We used 432 items of sensor data collected from an actual manufacturing plant over about 5 months to find the variables that differ significantly between the stable process period, with a low defect rate, and the unstable process period, with a high defect rate. We set a stable value range for each selected variable to determine whether it actually affects the defect rate. In addition, we measured the improvement in the defect rate during a 14-day process in which the process set-point was adjusted so that the sensor did not deviate from the stable value range. Through this, we expect to provide empirical guidelines for improving the defect rate by using and analyzing the process sensor data generated in the manufacturing industry.

Pattern Classification Model Design and Performance Comparison for Data Mining of Time Series Data (시계열 자료의 데이터마이닝을 위한 패턴분류 모델설계 및 성능비교)

  • Lee, Soo-Yong;Lee, Kyoung-Joung
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.730-736 / 2011
  • In this paper, we designed pattern classification models that can reflect the latest trend in time series. It has been shown that fusion models based on statistical and AI methods are superior to traditional ones as pattern classification models supporting decision making. In particular, the hit rates of pattern classification models combined with fuzzy theory are relatively high. Statistical SVM models combined with a fuzzy membership function, and models combining a neural network and FCM, have shown good performance. BPN, PNN, FNN, FCM, SVM, FSVM, Decision Tree, Time Series Analysis, and Regression Analysis were used as pattern classification models in the experiments of this paper. The data sets were an economic index DB with time series properties from the financial market (Korea, KOSPI200 DB) and an electrocardiogram DB of arrhythmia patients in hospital emergencies (USA, MIT-BIH DB).

Inflow and outflow analysis of double majors using social network analysis (사회 연결망 분석을 이용한 복수전공 유입 및 유출 분석)

  • Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.693-701 / 2012
  • Recently, the number of students taking double majors has tended to increase in many universities. As a result, many problems occur because an excessive inflow of double-major students is concentrated in specific popular departments. In this paper, we study the characteristics of the inflow and outflow of double majors using social network analysis and decision tree analysis. According to the results, SAT score affected the inflow of double majors the most, followed in order by department category, course evaluation score, and employment rate. On the other hand, department category affected the outflow of double majors the most, followed in order by SAT score, employment rate, and course evaluation score.

Analysis for Changes of Mode Choice Behavior from Providing Real-time Schedule for Public Transportation by Smartphone Application (스마트폰 애플리케이션을 이용한 대중교통 운행정보 제공에 따른 통행자 수단선택 행태변화 분석)

  • Choi, Sung-Taek;Rho, Jeong-Hyun
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.6 / pp.60-69 / 2012
  • Public transport information services delivered through smartphone apps have received attention as a way to reduce transport problems. Smartphones can offer real-time information because of LBS (Location Based Service). This study tries to find out which factors affect the mode choice ratio of public transport, with a focus on smartphone apps. The results show that rising oil prices, traffic congestion, public information service via smartphone apps, and BIS (Bus Information System) scored 0.39, 0.27, 0.18, and 0.16, respectively, in paired comparison. Younger respondents and students prefer the smartphone public information service. The decision tree shows that the most important decision factor is the smartphone information service.

A Study on the Big Data Analysis and Predictive Models for Quality Issues in Defense C5ISR (국방 C5ISR 분야 품질문제의 빅데이터 분석 및 예측 모델에 대한 연구)

  • Hyoung Jo Huh;Sujin Ko;Seung Hyun Baek
    • Journal of Korean Society for Quality Management / v.51 no.4 / pp.551-571 / 2023
  • Purpose: The purpose of this study is to offer useful suggestions by analyzing the cause-and-effect relationship between the quality failure rate and the process variables in the C5ISR domain of the defense industry. Methods: Data collected through in-house systems were analyzed using big data analysis. Quality data and A/S history data were analyzed following the CRISP-DM (Cross-Industry Standard Process for Data Mining) process. Results: After evaluating the performance of candidate models for the influence of inspection data and A/S history data, logistic regression was selected as the final model because it performed relatively well compared to the decision tree, with an accuracy of 82%/67% and an AUC of 0.66/0.57. Based on this model, we estimated the coefficients using R, a data analysis tool, and found that a specific variable (continuous maximum discharge current time) had a statistically significant effect on the A/S quality failure rate and that 82% of the failure rate could be predicted. Conclusion: As the first case of applying big data analysis to quality issues in the defense industry, this study confirms that it is possible to improve the market failure rates of defense products by focusing on the measured values of the main causes of failures derived through the big data analysis process, and identifies improvements, such as the number of data samples and data collection limitations, to be addressed in subsequent studies for a more reliable analysis model.