• Title/Summary/Keyword: Tree mining


Development of Semantic-Based XML Mining for Intelligent Knowledge Services (지능형 지식서비스를 위한 의미기반 XML 마이닝 시스템 연구)

  • Paik, Juryon;Kim, Jinyeong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.59-62 / 2018
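  • Research on XML has grown steadily over the past five to six years, but most studies have been based on statistical models of the elements that make up XML documents. In this approach, the semantics carried by the text, sentences, and sentence components in XML's inherent tree structure are not explicitly analyzed, represented, and used; instead, data occurrences are only counted statistically to answer a user's query, that is, to return the relevant information and knowledge. To suit an environment that provides intelligent knowledge services, information extraction should analyze the components of text and sentences and represent document content in a form that carries richer meaning than a simple set of words, so that more refined knowledge and information can be extracted. This study investigates methods that capture even the meaning of user requests in order to extract accurate and diverse knowledge from the flood of XML data. Its ultimate goal is to provide a variety of user-centered services by advancing efficient mining techniques that can extract meaning from tree-structured, rather than record-structured, data.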
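The abstract contrasts semantic mining with the common statistical treatment of XML, which counts element occurrences in the tree. As a minimal sketch of that statistical baseline only (not the authors' semantic method), the Python below counts root-to-node tag paths across documents with the standard library's ElementTree; the function name and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_frequencies(xml_docs):
    """Count every root-to-node tag path across a collection of XML strings.

    This is a purely statistical, element-level view of the trees: it records
    how often each structural path occurs, not what the text means.
    """
    counts = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        stack = [(root, root.tag)]  # depth-first traversal carrying the path so far
        while stack:
            node, path = stack.pop()
            counts[path] += 1
            for child in node:
                stack.append((child, f"{path}/{child.tag}"))
    return counts


# Hypothetical sample documents
docs = [
    "<paper><title>XML mining</title><author>Kim</author></paper>",
    "<paper><title>Tree mining</title><author>Paik</author><author>Lee</author></paper>",
]
freq = path_frequencies(docs)
print(freq["paper/author"])  # 3
```

Such counts see the tree's shape but nothing of what "XML mining" or "Tree mining" means, which is the gap the study targets.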


Sequential Pattern Mining for Customer Retention in Insurance Industry (보험 고객의 유지를 위한 순차 패턴 마이닝)

  • Lee, Jae-Sik;Jo, Yu-Jeong
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.274-282 / 2005
  • Customer retention is one of the major issues in the life insurance industry, where competition is increasingly fierce. There are many ways to retain customers, and one of them is to stay continuously in touch with every customer. The objective of this study is to design a contact scheduling system (CSS) that supports the support-planners who must contact customers without having first-hand knowledge of them. Support-planners lack the information they would need to build a close relationship with these customers. The CSS developed in this study generates a contact schedule for each customer by taking the existing contact history into account. CSS has a two-stage process. In the first stage, it segments customers according to their demographics and contract status data. In the second stage, it finds the typical contact pattern of each segment and combines it with business rules. We expect that CSS will help support-planners make the experience of previously uncontacted customers positive.
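To make the pattern-finding stage concrete, here is a minimal sketch of support counting for sequential patterns, assuming each customer's contact history is an ordered list of contact types. The contact labels, function names, and the 0.5 threshold are hypothetical, and only length-2 patterns are enumerated; a full miner such as GSP or PrefixSpan would grow longer patterns from these.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the shared iterator, so order is enforced
    return all(item in it for item in pattern)


def frequent_pairs(sequences, min_support):
    """Return every length-2 sequential pattern whose support meets min_support."""
    items = sorted({item for seq in sequences for item in seq})
    frequent = {}
    for a, b in product(items, repeat=2):
        support = sum(is_subsequence((a, b), s) for s in sequences) / len(sequences)
        if support >= min_support:
            frequent[(a, b)] = support
    return frequent


# Hypothetical contact histories, one ordered list per customer
histories = [
    ["call", "sms", "visit"],
    ["call", "visit"],
    ["sms", "call", "visit"],
    ["call", "sms"],
]
patterns = frequent_pairs(histories, min_support=0.5)
print(patterns)  # {('call', 'sms'): 0.5, ('call', 'visit'): 0.75, ('sms', 'visit'): 0.5}
```

Support here is the fraction of customers whose history contains the pattern in order, which is what makes a pattern "typical" for a segment.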


Decision-Tree Model of Long-term Abstention from Smoking: Focused on Coping Styles (장기적 금연 지속기간 예측 모형: 스트레스 대처를 중심으로)

  • Suh, Kyung-Hyun;You, Jae-Min
    • Korean Journal of Health Education and Promotion / v.22 no.4 / pp.73-90 / 2005
  • Objectives: It has frequently been reported that smokers who failed to quit attributed the failure mainly to life stress. A stress-vulnerability model of smoking cessation has been proposed, and most contemporary smoking cessation programs help smokers develop coping strategies for stressful situations. This study investigates which coping styles support abstention from smoking. Examining the relationship between coping styles and abstention after a smoking cessation program should provide useful information both for people who want to stop smoking and for the health practitioners who help them. Methods: Participants were 69 smokers (62 males, 7 females; mean age 44.89, SD=9.61) who took part in an inpatient smoking cessation program. Participants underwent medical tests and completed questionnaires and psychological tests, including the Fagerstrom Test for Nicotine Dependence and the Multidimensional Coping Scale. Researchers followed participants for 2 years to track abstention. To confirm abstention and encourage participants to keep abstaining, researchers telephoned them once a week for the first 3 months, every other week until 6 months had passed since they left the program, and once a month for the remaining 18 months. Researchers also contacted participants' families to confirm abstention. A data-mining decision tree was built from 37 variables (13 coping-style variables and 24 smoking-related variables) using AnswerTree 3.0. Results: Of the 69 participants, 44 (63.8%) abstained for 2 weeks, 34 (49.3%) for 6 months, 25 (36.2%) for 1 year, and 22 (31.9%) for 2 years. On average, participants abstained from smoking for 286.77 days. The variables included in the decision tree model were positive interpretation, emotional expression, self-criticism, restraint, and emotional social support seeking.
The decision tree model showed that participants (n=9) who scored low on positive interpretation (<=7.5) and high on self-criticism (>6.5) abstained for only 23 days, whereas participants (n=9) who scored high on positive interpretation (>7.5), emotional expression (>6.5), and active social support seeking (>11.5) abstained for 730 days, that is, until the last day of the investigation. Conclusion: Coping styles such as positive interpretation, emotional expression, self-criticism, restraint, and emotional social support seeking were important factors in long-term abstention from smoking. These findings reiterate the role of stress in abstention and suggest a coping-style model for successful abstention. Despite its limitations, this study may help smokers who want to quit and the health practitioners who help them.
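Each reported leaf of the tree reads as a conjunction of threshold rules on coping-scale scores. The sketch below encodes only the two paths stated in the abstract; the function name is hypothetical, and every other branch of the fitted model is unknown, so it returns None there rather than guessing.

```python
from typing import Optional


def reported_abstention_days(positive_interpretation: float,
                             self_criticism: float,
                             emotional_expression: float,
                             social_support_seeking: float) -> Optional[int]:
    """Return the abstention days reported for the two tree paths given in the abstract.

    Inputs are coping subscale scores. Branches the abstract does not
    describe return None.
    """
    # Path reported for n=9: low positive interpretation, high self-criticism
    if positive_interpretation <= 7.5 and self_criticism > 6.5:
        return 23
    # Path reported for n=9: high positive interpretation, emotional expression,
    # and social support seeking; these participants abstained to the end of follow-up
    if (positive_interpretation > 7.5 and emotional_expression > 6.5
            and social_support_seeking > 11.5):
        return 730
    return None


print(reported_abstention_days(8.0, 3.0, 7.0, 12.0))  # 730
```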

Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.3 / pp.107-117 / 2019
  • In the era of SNS, many people rely on social networking services to express their emotions about various products and services. Therefore, for companies eager to learn how their products and services are perceived in the market, emotion mining on SNS datasets has become more important than ever. Emotion mining is a branch of sentiment analysis and is typically based on BOW (bag-of-words) and TF-IDF representations. However, few studies on emotion mining adopt feature selection (FS) methods to find an optimal feature set that yields better results. This study therefore proposes FS methods that make emotion mining more effective. It uses the Twitter and SemEval2007 datasets for the emotion mining experiments. We applied three FS methods: CFS (Correlation-based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.
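Of the three FS methods, information gain is the simplest to state: IG(Y; X) = H(Y) - Σ_v P(X=v) H(Y | X=v), the drop in class entropy after conditioning on a feature. Below is a minimal Python sketch that ranks binary word-presence features by IG and keeps the top k; the toy documents, feature names, and the `select_top_k` helper are hypothetical, not the study's setup.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(Y) minus the expected entropy of Y after splitting on the feature's values."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(features, labels, k):
    """Rank features (name -> value per document) by IG and keep the k best."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical word-presence features over four labelled posts
labels = ["joy", "joy", "anger", "anger"]
features = {"love": [1, 1, 0, 0], "the": [1, 0, 1, 0], "hate": [0, 0, 1, 1]}
print(select_top_k(features, labels, 2))  # ['love', 'hate']
```

A stop word such as "the" scores zero because knowing it tells nothing about the emotion, which is why IG-based selection prunes it.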

Design of Contact Scheduling System(CSS) for Customer Retention (고객유지를 위한 접촉스케줄링시스템의 설계)

  • Lee, Jee-Sik;Cho, You-Jung
    • Journal of Intelligence and Information Systems / v.11 no.3 / pp.83-101 / 2005
  • Customer retention is one of the major issues in the life insurance industry, where competition is increasingly fierce. Life insurers can do many things to retain customers, and one of them is to make sure they keep in touch with every customer. When an insurance planner resigns, his or her customers must be taken care of by planner-assistants. This article outlines the design of a Contact Scheduling System (CSS) that supports planner-assistants in contacting these customers. Planner-assistants cannot draw on the resigned insurance planner's experience and knowledge of customer relationship management. The CSS, built with both the Classification And Regression Tree (CART) technique and the Sequential Pattern Mining (SPM) technique, has a two-stage process. In the first stage, it segments the customers into eight groups with a CART model. In the second stage, it generates contact scheduling information, consisting of contact purpose, contact interval, and contact channel, according to each segment's typical contact pattern. A contact purpose is schedule-driven, event-driven, or business-rule-driven; schedule-driven contacts are determined by the SPM model. Operated in a realistic setting, the CSS proved practical in helping planner-assistants keep in touch with customers efficiently and effectively.
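CART builds the first-stage segmentation by repeatedly choosing the split that most reduces impurity. A minimal sketch of one such step, assuming Gini impurity and a single numeric feature; the customer ages and retention labels are made up for illustration and are not the study's eight segments.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, labels):
    """Return (threshold, weighted Gini) of the best binary split x <= t on one feature."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate thresholds are midpoints between distinct values
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical customers: age and whether the contract was kept
ages = [25, 32, 41, 55, 63]
status = ["lapsed", "lapsed", "retained", "retained", "retained"]
print(best_split(ages, status))  # (36.5, 0.0)
```

A full CART model applies this search to every feature, recurses on each child, and then prunes; the eight segments in the study are the leaves of such a tree.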


Intelligent Production Management System with the Enhanced PathTree (개선된 패스트리를 이용한 지능형 생산관리 시스템)

  • Kwon, Kyung-Lag;Ryu, Jae-Hwan;Sohn, Jong-Soo;Chung, In-Jeong
    • The KIPS Transactions:PartD / v.16D no.4 / pp.621-630 / 2009
  • In recent years, there have been many attempts to connect the latest RFID (Radio Frequency Identification) technology with an EIS (Enterprise Information System). In most cases, however, the focus has been only on RFID's ability to read many tags simultaneously, while the management of the massive data generated by the readers has been neglected. As a result, it is difficult to obtain time-related information for process control, such as flow prediction and analysis. In this paper, we propose a new method called the 'procedure tree', an enhanced and complementary version of PathTree, one of the RFID data mining techniques, to manage massive RFID data sets effectively and to perform real-time process control efficiently. We evaluate the efficiency of the proposed method by applying it to a real-time process management system connected to an RFID-based EIS. With the proposed method, tasks such as predicting and tracking process flow for real-time process control and inventory management can be performed efficiently, which existing RFID-based production systems could not do.
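The abstract does not describe how the procedure tree is built. As a generic illustration of the underlying idea only, the sketch below merges hypothetical RFID reader paths that share a prefix into a tree with pass-through counts and predicts the most frequent next process step. The class names `FlowNode` and `FlowPrefixTree` and the step labels are hypothetical, and the toy ignores timing, which the abstract identifies as essential for real process control.

```python
from collections import defaultdict


class FlowNode:
    """One process step in the prefix tree, with how many items passed through it."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(FlowNode)


class FlowPrefixTree:
    """Hypothetical prefix tree that merges item paths sharing the same prefix."""

    def __init__(self):
        self.root = FlowNode()

    def insert(self, path):
        node = self.root
        node.count += 1
        for location in path:
            node = node.children[location]  # creates the child on first visit
            node.count += 1

    def predict_next(self, prefix):
        """Most frequent next location after prefix, or None if unseen or terminal."""
        node = self.root
        for location in prefix:
            if location not in node.children:  # avoid creating nodes while reading
                return None
            node = node.children[location]
        if not node.children:
            return None
        return max(node.children.items(), key=lambda kv: kv[1].count)[0]


# Hypothetical reader paths for three tagged items
tree = FlowPrefixTree()
for path in (["receiving", "cutting", "welding", "packing"],
             ["receiving", "cutting", "painting", "packing"],
             ["receiving", "cutting", "welding", "shipping"]):
    tree.insert(path)
print(tree.predict_next(["receiving", "cutting"]))  # welding
```

Sharing prefixes is what keeps such a structure compact when millions of tag reads follow the same few routes.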

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications have given consumers and companies very user-friendly means of communicating with each other. Users routinely publish content expressing their opinions and interests on social media such as blogs, forums, chat rooms, and discussion boards, and this content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as a source of information for business analytics and insight, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, techniques to extract, classify, understand, and assess the opinions implicit in text, are frequently applied to social media content because they emphasize determining sentiment polarity and extracting authors' opinions. These researchers have presented a number of frameworks, methods, techniques, and tools. However, we found that their methods are often technically complicated and not user-friendly enough to support business decisions and planning. In this study, we formulate a more comprehensive and practical approach to opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining on social media content, from the initial data-gathering stage to the final presentation. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media. Each target medium requires a different means of access, such as an open API, search tools, a DB2DB interface, or purchasing content. The second phase is pre-processing, which generates useful material for meaningful analysis.
If garbage data are not removed, social media analysis will not yield meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, in which the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user ID, content ID, hit count, reviews or replies, and favorites. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually used for market trend analysis, while sentiment analysis is used for reputation analysis. There are also various other applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is the visualization and presentation of analysis results. Its main purpose is to explain the results of the analysis and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company, NS Food, which holds a 66.5% market share and has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the social media content, we used natural language processing to build language resources specific to the instant noodle business for data manipulation and analysis. In addition, we classified the content into more detailed categories such as marketing features, environment, and reputation.
In these phases, we used free software packages for the R project such as TM, KoNLP, ggplot2, and plyr. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other images, to provide vivid, full-color examples produced with open-source packages for the R project. Business users can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. A heat map shows how sentiment or volume moves across a category-by-time matrix, using color density for each time period. A valence tree map, one of the most comprehensive and holistic visualization models, should help analysts and decision makers quickly grasp the "big picture" of the business situation through a hierarchical structure, because a tree map can present buzz volume and sentiment for a given period in a single visual result. This case study offers real-world business insights from market sensing and demonstrates to practical-minded business users how such results can support timely decisions in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not only in the food industry but in other industries as well.
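The analyzing and visualizing phases can be illustrated with a lexicon-based sketch: score each post against a domain lexicon, then aggregate buzz volume and net sentiment per (period, category) cell, which is the matrix a heat map colors. The study worked in R; this Python version, its four-word lexicon, and the sample posts are hypothetical, and real Korean posts would need morphological analysis (the study used KoNLP) instead of the simple English tokenizer here.

```python
import re
from collections import defaultdict

# Hypothetical domain lexicon: +1 positive, -1 negative
LEXICON = {"delicious": 1, "love": 1, "salty": -1, "expensive": -1}


def sentiment_score(text, lexicon=LEXICON):
    """Net polarity of a post: sum of lexicon values over its lowercase word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def volume_and_sentiment(posts, lexicon=LEXICON):
    """Aggregate (post count, net sentiment) per (period, category) heat-map cell."""
    cells = defaultdict(lambda: [0, 0])
    for period, category, text in posts:
        cell = cells[(period, category)]
        cell[0] += 1
        cell[1] += sentiment_score(text, lexicon)
    return {key: tuple(value) for key, value in cells.items()}


posts = [
    ("2014-01", "taste", "Delicious and spicy, love it"),
    ("2014-01", "taste", "Too salty"),
    ("2014-01", "price", "Expensive"),
    ("2014-02", "taste", "delicious"),
]
print(volume_and_sentiment(posts))
```

Each cell's count drives buzz volume and its net score drives valence, so the same table can feed both a heat map and a tree map.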

A Study of Combined Splitting Rules in Regression Trees

  • Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.13 no.1 / pp.97-104 / 2002
  • Regression trees, a data mining technique, are constructed by a splitting function, that is, an independent variable and its threshold. Lee (2002) considered one-sided purity (OSP) and one-sided extreme (OSE) splitting criteria for finding an interesting node as early as possible. However, these criteria cannot be mixed within the same tree: one must commit to either OSP or OSE in advance. In this paper, a new splitting method that combines and extends OSP and OSE is proposed. With these combined criteria, nodes can be selected by considering both purity and extremeness in the same tree. These criteria are not a generalization of the previous criteria but an additional option, to be chosen depending on the circumstances.
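For readers unfamiliar with the two criteria, the sketch below implements them in the form commonly given for regression trees, which is an assumption here rather than the paper's exact definition: OSP picks the threshold whose purer child has the smallest variance, and OSE (high-mean version) picks the threshold whose higher-mean child has the largest mean. A minimum node size keeps single observations from trivially winning. The combined criterion proposed in the paper is not reproduced, and the data are hypothetical.

```python
from statistics import mean, pvariance


def candidate_splits(xs, ys, min_size):
    """Yield (threshold, left_ys, right_ys) for splits leaving >= min_size points per side."""
    pairs = sorted(zip(xs, ys))
    for i in range(min_size, len(pairs) - min_size + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        yield threshold, [y for _, y in pairs[:i]], [y for _, y in pairs[i:]]


def one_sided_purity_split(xs, ys, min_size=3):
    """Threshold whose purer child (smaller variance) is as pure as possible."""
    best = min(candidate_splits(xs, ys, min_size),
               key=lambda s: min(pvariance(s[1]), pvariance(s[2])))
    return best[0]


def one_sided_extreme_split(xs, ys, min_size=3):
    """Threshold whose higher-mean child has the largest mean."""
    best = max(candidate_splits(xs, ys, min_size),
               key=lambda s: max(mean(s[1]), mean(s[2])))
    return best[0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [4, 4, 4, 1, 7, 9, 9, 8]
print(one_sided_purity_split(xs, ys))   # 3.5
print(one_sided_extreme_split(xs, ys))  # 5.5
```

On this toy data the two criteria disagree: OSP isolates the perfectly pure group of 4s, while OSE isolates the high-valued tail. That divergence is why a tree committed to one criterion can miss nodes the other would find.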


A Combinatorial Optimization for Influential Factor Analysis: a Case Study of Political Preference in Korea

  • Yun, Sung Bum;Yoon, Sanghyun;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.5 / pp.415-422 / 2017
  • Finding influential factors from a given clustering result is a typical data science problem. A Genetic Algorithm based method is proposed to derive influential factors, and its performance is compared, using Dunn's index, with two conventional methods: Classification and Regression Tree (CART) and Chi-Squared Automatic Interaction Detection (CHAID). To extract the factors influencing preference for political parties in South Korea, the results of the 18th presidential election and 'Demographic', 'Health and Welfare', 'Economic', and 'Business' related data were used. Based on the analysis, reverse engineering was implemented. A reverse-engineering-based approach to influential factor analysis can provide a new set of influential variables, which can offer new insight to the data mining field.
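Dunn's index, the comparison measure named above, divides the smallest distance between points in different clusters by the largest distance between two points in the same cluster, so higher values indicate compact, well-separated clusters. A minimal Euclidean sketch (Python 3.8+ for `math.dist`), with hypothetical two-dimensional points:

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn's index for clusters given as lists of coordinate tuples (Euclidean distance).

    Numerator: smallest distance between points in different clusters.
    Denominator: largest distance between two points in the same cluster.
    """
    max_diameter = max(
        (math.dist(a, b) for cluster in clusters for a, b in combinations(cluster, 2)),
        default=0.0,
    )
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; the index is undefined")
    return min_separation / max_diameter


clusters = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
print(dunn_index(clusters))  # 5.0
```

Because the index is a worst-case ratio, a single stray point that widens one cluster lowers the score for the whole clustering.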

Evaluation of Ultrasound for Prediction of Carcass Meat Yield and Meat Quality in Korean Native Cattle (Hanwoo)

  • Song, Y.H.;Kim, S.J.;Lee, S.K.
    • Asian-Australasian Journal of Animal Sciences / v.15 no.4 / pp.591-595 / 2002
  • Three hundred thirty-five progeny-test steers of Korean beef cattle were evaluated ultrasonically for back fat thickness (BFT), longissimus muscle area (LMA), and intramuscular fat (IF) before slaughter. Class measurements associated with the Korean yield grade and quality grade were also obtained. The residual standard deviations between ultrasonic estimates and carcass measurements of BFT and LMA were 1.49 mm and 0.96 cm², respectively. The linear correlation coefficients (p<0.01) between ultrasonic estimates and carcass measurements of BFT, LMA, and IF were 0.75, 0.57, and 0.67, respectively. The yield grade prediction accuracies of four methods, the Korean yield grade index equation, fat depth alone, regression, and the decision tree method, were 75.4%, 79.6%, 64.3%, and 81.4%, respectively. We conclude that the decision tree method predicts yield grade easily and is also useful for increasing prediction accuracy.
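The linear correlation coefficients reported above are Pearson's r between paired ultrasonic and carcass measurements. A minimal sketch of that computation; the paired values in the usage are hypothetical, not the study's measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical paired back fat readings (mm): ultrasound estimate vs carcass measurement
ultrasound = [8.0, 10.5, 12.0, 9.0, 14.5]
carcass = [9.0, 11.0, 13.5, 8.5, 15.0]
print(round(pearson_r(ultrasound, carcass), 2))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.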