• Title/Summary/Keyword: Tree mining

Exploring the Feature Selection Method for Effective Opinion Mining: Emphasis on Particle Swarm Optimization Algorithms

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.25 no.11 / pp.41-50 / 2020
  • Sentiment analysis begins with the search for words that determine the sentiment inherent in data. Managers can understand market sentiment by analyzing the sentiment words that consumers tend to use. In this study, we explore the performance of feature selection methods embedded with Particle Swarm Optimization Multi-Objective Evolutionary Algorithms. The feature selection methods were benchmarked with machine learning classifiers such as Decision Tree, Naive Bayesian Network, Support Vector Machine, Random Forest, Bagging, Random Subspace, and Rotation Forest. Our empirical results on opinion mining revealed that the number of features was significantly reduced without hurting performance. Specifically, the Support Vector Machine showed the highest accuracy, and Random Subspace produced the best AUC results.
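
To make the idea of swarm-based feature selection concrete, here is a minimal sketch of a binary particle swarm search over feature masks. It is not the authors' implementation: the inertia and acceleration constants, the sigmoid transfer rule, and the toy fitness function (`toy_fitness`, `INFORMATIVE`) are all assumptions chosen for illustration. In practice the fitness would be something like a classifier's cross-validated accuracy on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=12, n_iters=30, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` (illustrative sketch)."""
    rng = random.Random(seed)
    # Positions are 0/1 masks; velocities are real-valued.
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    gbest, gbest_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                # Assumed constants: inertia 0.7, acceleration 1.5 for both terms.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid transfer: velocity becomes the probability of selecting the feature.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > gbest_fit:
                    gbest_fit, gbest = f, pos[i][:]
    return gbest, gbest_fit


# Hypothetical toy problem: features 0 and 3 are useful, every selected feature costs 0.05.
INFORMATIVE = {0, 3}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits / len(INFORMATIVE) - 0.05 * sum(mask)


best_mask, best_score = binary_pso(toy_fitness, n_features=6)
```

The per-feature penalty in the toy fitness mirrors the trade-off described in the abstract: fewer features are preferred as long as the quality term does not drop.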

Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

  • Dorjmaa, Tserendulam;Shin, Taeksoo
    • Journal of Information Technology Services / v.16 no.3 / pp.167-183 / 2017
  • The rapid growth of information technology and mobile service platforms, such as the internet, Google, and Facebook, has led to an abundance of data. As a result, the world is now facing a revolution in the way data is searched, collected, stored, and shared. This abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting knowledge from databases have become more popular in e-commerce service fields, in particular movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a classification model based on a support vector machine (SVM) for predicting movie popularity from movies' genre data and social network data. Social network analysis (SNA) is used to improve classification accuracy. This study builds a movie network (a one-mode network) from initial data that form a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. These four centrality values and the movies' genre data were used to classify movie popularity. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree were also used as benchmark models for comparison with the proposed model. To assess classification accuracy, this study used MovieLens data as an open database. Our empirical results indicate that the proposed model with movie genre and centrality data has approximately 0% higher accuracy than other classification models with only movie genre data. The results imply that the proposed model can be used to improve movie popularity classification accuracy.
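
The network construction step described above can be sketched as follows: a two-mode user-to-movie edge list is projected onto a one-mode movie network in which two movies are linked when at least one user is connected to both, and degree centrality is then computed on that projection. The function names and the tiny edge list are hypothetical, and the sketch covers only degree centrality, not the betweenness, closeness, or eigenvector measures the study also used.

```python
from itertools import combinations


def project_to_movies(user_movie_pairs):
    """Project a two-mode (user, movie) edge list onto a one-mode movie network."""
    movies_by_user = {}
    for user, movie in user_movie_pairs:
        movies_by_user.setdefault(user, set()).add(movie)

    adjacency = {}
    for movies in movies_by_user.values():
        for movie in movies:
            adjacency.setdefault(movie, set())  # keep isolated movies as nodes
        # Link every pair of movies that share this user.
        for a, b in combinations(sorted(movies), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


# Hypothetical edge list: u1 is linked to A and B, u2 to B and C, u3 to C only.
pairs = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C"), ("u3", "C")]
movie_network = project_to_movies(pairs)
centrality = degree_centrality(movie_network)
```

Here movie B is linked to both A and C, so it receives the highest degree centrality; values like these would then be fed to the classifier alongside genre features.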

A Control Path Analysis Mechanism for Workflow Mining (워크플로우 마이닝을 위한 제어 경로 분석 메커니즘)

  • Min Jun-Ki;Kim Kwang-Hoon;Chung Jung-Su
    • Journal of Internet Computing and Services / v.7 no.1 / pp.91-99 / 2006
  • This paper proposes a control path analysis mechanism for use in a workflow mining framework. The mechanism maximizes workflow traceability and rediscoverability by analyzing the complete set of control path sequences of a workflow model and by rediscovering their runtime enactment history from the workflow log information. The mechanism has two components. One generates the complete set of control path sequences from a workflow model by transforming it into a control path decision tree, and the other rediscovers the runtime enactment history of each control path from the corresponding workflow's execution logs. Together, the rediscovered knowledge and execution history of a workflow model make up a control-path-oriented intelligence of the workflow model, which ought to be an essential ingredient for maintaining and reengineering the quality of the workflow model. Based upon this workflow intelligence, the workflow model can be gradually refined, and its quality eventually maximized, by repeated redesign and reengineering over its entire lifetime.
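
The two components can be illustrated with a minimal sketch, assuming the workflow model is an acyclic graph of activities given as an adjacency mapping. The function names, the example workflow, and the log format are hypothetical, and the paper's control path decision tree is replaced here by a plain depth-first enumeration.

```python
from collections import Counter


def control_paths(workflow, start, end):
    """Enumerate every activity sequence from start to end (assumes no loops)."""
    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        if node == end:
            paths.append(path)
            return
        for successor in workflow.get(node, []):
            walk(successor, path)

    walk(start, [])
    return paths


def enactment_counts(paths, log_traces):
    """Count how often each enumerated control path appears in the execution log."""
    index = {tuple(path): i for i, path in enumerate(paths)}
    return Counter(index[tuple(trace)] for trace in log_traces if tuple(trace) in index)


# Hypothetical workflow with one decision point after "review".
workflow = {
    "start": ["review"],
    "review": ["approve", "reject"],
    "approve": ["end"],
    "reject": ["end"],
}
paths = control_paths(workflow, "start", "end")

# Hypothetical execution log: two approvals and one rejection.
log = [
    ["start", "review", "approve", "end"],
    ["start", "review", "approve", "end"],
    ["start", "review", "reject", "end"],
]
counts = enactment_counts(paths, log)
```

Paths that never show up in the log, or that dominate it, are the kind of evidence that could guide redesign of the workflow model.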

Global Unmanned Aerial Vehicle Utilization Research Trends

  • Moon, Ho-Gyeong;Kim, Han;Choi, Nak-Hyun;Kim, Dong-Pil
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.1 no.1 / pp.31-40 / 2020
  • The rapid development of unmanned aerial vehicle (UAV) technologies has led to their use in various areas. UAVs are mainly used for commercial purposes, but their use is increasingly important in other areas because their operating cost is lower than that of satellite and aerial imaging. The use of UAVs in the environment/ecology area is relatively new. Therefore, identifying trends in UAV-related spatial information is significant as basic research for UAV utilization. This study quantitatively identified domestic and international research trends related to UAV utilization and analyzed research areas. An attempt was also made to identify upcoming UAV-related topics in the environment/ecology research field, using text mining to analyze the bibliographic information of global research literature. Domestic UAV-related studies were classified into seven clusters, among which basic research on "UAV technology/industry trends" was abundant, and studies on data collection and analysis through UAV remote sensing technology have increased since 2015. Eight clusters were identified for international studies, among which the most active research area was "remote sensing technology/data analysis". In addition, Canopy, Classification, Forest, Leaf Area Index, Normalized Difference Vegetation Index, Temperature, Tree, and Atmosphere appeared as the main keywords related to environment and ecology. Their appearance frequencies and association strengths were high because advances in UAV optical sensor technology and the rapid development of image processing technology enabled the acquisition of data that could not be obtained from existing spatial information. These keywords are recognized as future research topics, as related domestic studies have begun to follow international research.

Changes in Forest Disturbance Patterns from 1976 to 2005 in South Korea

  • Park, Pil Sun;Lee, Kyu Hwa;Jung, Mun Ho;Shin, Hanna;Jang, Woongsoon;Bae, Kikang;Lee, Jongkoo;Lee, Don Koo
    • Journal of Korean Society of Forest Science / v.98 no.5 / pp.593-601 / 2009
  • Forest disturbances including forest fires, insect pests and diseases, landslides, and forest conversion from 1976 to 2005 were investigated to trace changes in the major forest disturbance agents and their characteristics over time, in accordance with changes in the natural and social environment in South Korea. While the area damaged by insect pests and diseases decreased continuously over the past 30 years, the areas damaged by forest fires and landslides fluctuated from year to year. The interval between large forest fires has become shorter as tree volume has increased. Precipitation between January and April was significantly correlated with the occurrence of large fires (Pearson's correlation coefficient -0.400, P=0.029). The composition of the major insect pests and diseases damaging Korean forests has changed continuously and become more diversified. While damage by pine caterpillar (Dendrolimus spectabilis) and pine needle gall midge (Thecodiplosis japonensis) has decreased, damage by introduced pests has recently become more serious. A change in precipitation patterns that brought more localized heavy rain or powerful typhoons resulted in the recent increase in landslide areas. The major land uses driving forest conversion have changed, reflecting changes in the industrial structure of South Korea: agriculture and mining in the 1970s, mining and golf ranges classified as pasture in the 1980s, and road and housing construction in the 1990s and 2000s. Changes in forest disturbance patterns in South Korea show that a country's industrial development works jointly with global warming on forest stand dynamics. Changes in energy structure and land use patterns induced by industrial development increase forest volume and reshape microenvironments on the forest floor and, interacting with climate change, lead to shorter intervals between large forest fires and to changes in the species composition of major forest insect pests and diseases.

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD / v.18D no.5 / pp.329-338 / 2011
  • Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been intensively studied in a limited range of domains, especially the biomedical domain. Previous approaches were difficult to apply to general domains because their resources were domain specific, so we propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. Compared with the related approach, C-value, which has been widely used and is based on local domain frequencies, the proposed approach achieved an F-score of 80.8, a 6.5% improvement. In a second experiment with various combinations of unithood features, the method combined with NGD (Normalized Google Distance) showed the best performance, an F-score of 81.8. We applied three machine learning methods, logistic regression, C4.5, and SVMs, and obtained the best score from the decision tree method, C4.5.
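
For readers unfamiliar with the NGD feature mentioned above, the following sketch computes the standard Normalized Google Distance from search hit counts. How the paper turns NGD into a unithood feature is not stated in the abstract, so the function name, its arguments, and the example counts are assumptions for illustration only.

```python
import math


def normalized_google_distance(hits_x, hits_y, hits_xy, total_pages):
    """Standard NGD from hit counts; smaller values mean the two words co-occur more.

    hits_x, hits_y: pages containing each word; hits_xy: pages containing both;
    total_pages: assumed index size, which must exceed hits_x and hits_y.
    """
    if hits_x <= 0 or hits_y <= 0 or hits_xy <= 0:
        return math.inf  # the words never co-occur, or one word is unseen
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total_pages) - min(log_x, log_y))


# Hypothetical hit counts for the two words of a candidate term.
always_together = normalized_google_distance(100, 100, 100, 10_000)
rarely_together = normalized_google_distance(1_000, 1_000, 10, 1_000_000)
```

A candidate term whose component words have a low NGD behaves like a single unit, which is the intuition behind using it as a unithood signal.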

A Study on the Application of Data-Mining Techniques into Effective CRM (Customer Relationship Management) for Internet Businesses (인터넷 비즈니스에서 효과적인 소비자 관계관리(Customer Relationship Management)를 위한 데이터 마이닝 기법의 응용에 대한 연구)

  • Kim, Choong-Young;Chang, Nam-Sik;Kim, Sang-Uk
    • Korean Business Review / v.15 / pp.79-97 / 2002
  • In this study, an analytical CRM for customer segmentation is carried out by integrating and analyzing customer profile data and access data for a particular web site. We believe that effective customer segmentation is possible when it is based on an understanding of both customer characteristics and customer behavior on the web. Two critical tasks in web data mining are how to collect data from the web efficiently and how to integrate the data (mostly of various types) effectively for analysis. This study proposes a panel approach as an efficient method of data collection on the web. For the customer data analysis, OLAP and a tree-structured algorithm are applied. The results of the analysis with both techniques are compared, confirming previous work showing that the two techniques are complementary.

Utilizing Purely Symmetric J Measure for Association Rules (연관성 규칙의 탐색을 위한 순수 대칭적 J 측도의 활용)

  • Park, Hee-Chang
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2865-2872 / 2018
  • Data mining includes various methods such as association rules, cluster analysis, decision trees, and neural networks. Among them, association rules are defined using various association evaluation criteria such as support, confidence, and lift. Agrawal et al. (1993) first proposed association rules, and since then many scholars have conducted research on them. Recently, studies related to crossover entropy have been published (Park, 2016b). In this paper, we propose a purely symmetric J measure that adds directionality and purity to the previously published J measure, and we examine its usefulness through examples. As a result, the purely symmetric J measure is found to change more clearly than the conventional J measure, the symmetric J measure, and the pure crossover entropy measure as the frequency of coincidence increases. The variation of the purely symmetric J measure was also larger depending on the magnitude of the inconsistency, so the presence or absence of an association could be identified more clearly.
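
The three baseline criteria named above (support, confidence, and lift) can be computed directly from transaction data. The sketch below shows only these standard measures, not the purely symmetric J measure proposed in the paper; the function name and the toy transactions are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))         # transactions containing A
    n_c = sum(1 for t in transactions if c <= set(t))         # transactions containing C
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # transactions containing both

    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical market-basket data.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
]
support, confidence, lift = rule_metrics(transactions, {"milk"}, {"bread"})
```

In this toy data the lift of milk -> bread is below 1, meaning the two items appear together slightly less often than independence would predict.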

Feature Extraction and Evaluation for Classification Models of Injurious Falls Based on Surface Electromyography

  • Lim, Kitaek;Choi, Woochol Joseph
    • Physical Therapy Korea / v.28 no.2 / pp.123-131 / 2021
  • Background: Only 2% of falls in older adults result in serious injuries (e.g., hip fracture). Therefore, differentiating injurious from non-injurious falls is critical for developing effective interventions for injury prevention. Objectives: The purpose of this study was to (a) extract the best features of surface electromyography (sEMG) for classification of injurious falls, and (b) find the best model provided by data mining techniques using the extracted features. Methods: Twenty young adults self-initiated falls and landed sideways. Falling trials consisted of three initial fall directions (forward, sideways, or backward) and three knee positions at the time of hip impact: the impacting-side knee contacted the other knee ("knee together"), contacted the mat ("knee on mat"), or contacted neither ("free knee"). Falls involving "backward initial fall direction" or "free knee" were defined as "injurious falls", as suggested by previous studies. Nine features were extracted from sEMG signals of four hip muscles during a fall, including integral of absolute value (IAV), Wilson amplitude (WAMP), zero crossing (ZC), number of turns (NT), mean of amplitude (MA), root mean square (RMS), average amplitude change (AAC), and difference absolute standard deviation value (DASDV). A decision tree and a support vector machine (SVM) were used to classify the injurious falls. Results: For the initial fall direction, the accuracy of the best model (SVM with DASDV) was 48%. For the knee position, the accuracy of the best model (SVM with AAC) was 49%. Furthermore, no model had sensitivity and specificity of 80% or greater. Conclusion: Our results suggest that classification models built upon the sEMG features of the four hip muscles are not effective for classifying injurious falls. Future studies should consider other data mining techniques with different muscles.
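
Several of the time-domain features listed above have simple closed forms. The sketch below uses definitions that are common in the sEMG literature; the abstract does not give the authors' exact formulas, thresholds, or windowing, so the function name, the `threshold` parameter, and the sample values are assumptions.

```python
import math


def semg_features(signal, threshold=0.0):
    """Compute a subset of time-domain sEMG features for one window of samples."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    diffs = [signal[i + 1] - signal[i] for i in range(n - 1)]
    return {
        # Integral of absolute value: sum of rectified samples.
        "IAV": sum(abs(v) for v in signal),
        # Root mean square amplitude.
        "RMS": math.sqrt(sum(v * v for v in signal) / n),
        # Zero crossings: sign changes whose step is at least the threshold.
        "ZC": sum(1 for i in range(n - 1)
                  if signal[i] * signal[i + 1] < 0 and abs(diffs[i]) >= threshold),
        # Wilson amplitude: consecutive-sample differences exceeding the threshold.
        "WAMP": sum(1 for d in diffs if abs(d) > threshold),
        # Average amplitude change: summed absolute differences divided by n.
        "AAC": sum(abs(d) for d in diffs) / n,
        # Difference absolute standard deviation value.
        "DASDV": math.sqrt(sum(d * d for d in diffs) / (n - 1)),
    }


# Hypothetical four-sample window, for illustration only.
features = semg_features([1.0, -1.0, 2.0, -2.0], threshold=2.5)
```

Each muscle and each trial would yield one such feature dictionary, and a single feature (for example DASDV) or a combination could then be passed to a classifier.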

Forecasting the Daily Container Volumes Using Data Mining with CART Approach (Datamining 기법을 활용한 단기 항만 물동량 예측)

  • Ha, Jun-Su;Lim, Chae Hwan;Cho, Kwang-Hee;Ha, Hun-Koo
    • Journal of Korea Port Economic Association / v.37 no.3 / pp.1-17 / 2021
  • Forecasting daily container volume is important in many aspects of port operation. In this article, we used a machine learning algorithm based on decision trees to predict the future container throughput of Busan port. Accurate volume forecasting improves operational efficiency and service levels by reducing costs and shipowner latency. We showed that our method can accurately and reliably predict container throughput in the short term (days). Forecasting accuracy improved by more than 22% over a time series method (ARIMA). We also demonstrated that the method is assumption-free and not prone to human bias. We expect that such a method could be useful in a broad range of fields.
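
The core step of a CART regression tree is choosing the split that minimizes the squared error of the two resulting groups. The sketch below shows that single step on one predictor; it is not the authors' model, and the function name, the day-of-week predictor, and the volume figures are hypothetical.

```python
def best_split(xs, ys):
    """Find the threshold on xs that minimizes the summed squared error of ys.

    Returns (threshold, left_mean, right_mean), where left means x <= threshold.
    """
    def mean(values):
        return sum(values) / len(values)

    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    candidates = sorted(set(xs))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct predictor values")

    best = None
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2  # midpoint between neighboring values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    return best[1:]


# Hypothetical data: day of week (0 = Monday) and daily container volume.
days = [0, 1, 2, 3, 4, 5, 6]
volumes = [100, 110, 105, 108, 102, 40, 35]
threshold, weekday_mean, weekend_mean = best_split(days, volumes)
```

A full tree repeats this search recursively on each side of the split and over every available predictor, and the leaf means serve as the forecasts.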