• Title/Summary/Keyword: Tree Algorithm


A prediction model for adolescents' skipping breakfast using the CART algorithm for decision trees: 7th (2016-2018) Korea National Health and Nutrition Examination Survey (의사결정나무 CART 알고리즘을 이용한 청소년 아침결식 예측 모형: 제7기 (2016-2018년) 국민건강영양조사 자료분석)

  • Sun A Choi;Sung Suk Chung;Jeong Ok Rho
    • Journal of Nutrition and Health
    • /
    • v.56 no.3
    • /
    • pp.300-314
    • /
    • 2023
  • Purpose: This study sought to identify predictors of breakfast skipping among adolescents aged 13-18 years using the 7th Korea National Health and Nutrition Examination Survey (KNHANES). Methods: The participants included 1,024 adolescents. The data were analyzed using a complex-sample t-test, the Rao-Scott χ2-test, and the classification and regression tree (CART) algorithm for decision tree analysis with SPSS v. 27.0. The participants were divided into two groups, one regularly eating breakfast and the other skipping it. Results: A total of 579 and 445 participants were identified as breakfast consumers and breakfast skippers, respectively. Breakfast consumers were significantly younger than those who skipped breakfast. In addition, breakfast consumers ate dinner significantly more often, were more likely to have received nutrition education, and ate out less often. The breakfast skippers did so mainly to lose weight. Adolescents who skipped breakfast consumed less energy, carbohydrates, proteins, fats, fiber, cholesterol, vitamin C, vitamin A, calcium, vitamin B1, vitamin B2, phosphorus, sodium, iron, potassium, and niacin than those who consumed breakfast. The strongest predictor of skipping breakfast was the practice of controlling weight by not eating meals. Participants who had low or middle-low household incomes, ate dinner 3-4 times a week, were more than 14.5 years old, and ate out once a day also showed a higher frequency of skipping breakfast. Conclusion: Based on these results, nutrition education on healthy weight control that emphasizes the importance of breakfast, especially for adolescents, is required. Moreover, nutrition educators should consider designing and implementing specific action plans that encourage adolescents to improve their breakfast habits by also eating dinner regularly and reducing eating out.
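The following is a minimal sketch of how a CART-style decision tree could be fitted to a binary breakfast-skipping outcome with scikit-learn; the column names and values are hypothetical stand-ins, not the KNHANES variables, and the study's complex-sample weighting is not reproduced.

```python
# A minimal sketch of a CART-style decision tree for a binary "skips breakfast"
# outcome, using hypothetical survey features (not the study's KNHANES data).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical predictor columns loosely inspired by the abstract.
data = pd.DataFrame({
    "age": [13, 14, 15, 16, 17, 18, 13, 15, 16, 17],
    "dinner_per_week": [7, 5, 3, 4, 7, 2, 6, 3, 5, 2],
    "eat_out_per_day": [0, 1, 1, 2, 0, 2, 0, 1, 1, 2],
    "weight_control": [0, 0, 1, 1, 0, 1, 0, 1, 0, 1],  # 1 = skips meals to lose weight
    "skips_breakfast": [0, 0, 1, 1, 0, 1, 0, 1, 0, 1],
})
X = data.drop(columns="skips_breakfast")
y = data["skips_breakfast"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# scikit-learn's DecisionTreeClassifier implements an optimized CART algorithm.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=1, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=list(X.columns)))
print("test accuracy:", tree.score(X_test, y_test))
```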

Development of High-Resolution Fog Detection Algorithm for Daytime by Fusing GK2A/AMI and GK2B/GOCI-II Data (GK2A/AMI와 GK2B/GOCI-II 자료를 융합 활용한 주간 고해상도 안개 탐지 알고리즘 개발)

  • Ha-Yeong Yu;Myoung-Seok Suh
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_3
    • /
    • pp.1779-1790
    • /
    • 2023
  • Satellite-based fog detection algorithms are being developed to detect fog in real time over a wide area, with a focus on the Korean Peninsula (KorPen). The GEO-KOMPSAT-2A/Advanced Meteorological Imager (GK2A/AMI, GK2A) satellite offers excellent temporal resolution (10 min) and a spatial resolution of 500 m, while GEO-KOMPSAT-2B/Geostationary Ocean Color Imager-II (GK2B/GOCI-II, GK2B) provides excellent spatial resolution (250 m) but poor temporal resolution (1 h) and only visible channels. To enhance the fog detection level (10 min, 250 m), we developed a fog detection algorithm (GK2AB FDA) that fuses GK2A and GK2B. The GK2AB FDA comprises three main steps. First, the Korea Meteorological Satellite Center's GK2A daytime fog detection algorithm is used to detect fog, considering various optical and physical characteristics. In the second step, GK2B data are extrapolated to 10-min intervals by matching GK2A pixels based on the closest time and location when GK2B observes the KorPen. For reflectance, GK2B normalized visible (NVIS) is corrected using GK2A NVIS of the same time, considering the difference in wavelength range and observation geometry, and is then extrapolated at 10-min intervals using the 10-min changes in GK2A NVIS. In the final step, the extrapolated GK2B NVIS, the solar zenith angle, and the outputs of the GK2A FDA are used as input data for machine learning (a decision tree) to develop the GK2AB FDA, which detects fog at a 250 m resolution and a 10-min interval based on geographical location. Six and four cases were used for the training and validation of the GK2AB FDA, respectively. Quantitative verification of the GK2AB FDA used ground observations of visibility, wind speed, and relative humidity. Compared to the GK2A FDA, the GK2AB FDA exhibits a fourfold increase in spatial resolution, resulting in more detailed discrimination between fog and non-fog pixels. In general, irrespective of the validation method, the probability of detection (POD) and the Hanssen-Kuiper skill score (KSS) are higher than or similar to those of the GK2A FDA, indicating that it detects previously undetected fog pixels. However, the GK2AB FDA tends to over-detect fog, with a higher false alarm ratio and bias than the GK2A FDA.
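As a rough illustration of the final fusion step described above, the sketch below trains a decision tree on per-pixel features (extrapolated NVIS, solar zenith angle, a GK2A FDA output); all values are synthetic placeholders rather than real GK2A/GK2B data.

```python
# Rough sketch: train a decision tree on per-pixel features to label fog vs.
# non-fog. Features and labels are synthetic placeholders, not satellite data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
nvis = rng.uniform(0.0, 1.0, n)           # extrapolated GK2B normalized visible reflectance
sza = rng.uniform(20.0, 80.0, n)          # solar zenith angle (degrees)
gk2a_fog_prob = rng.uniform(0.0, 1.0, n)  # placeholder for a GK2A FDA output
# Synthetic labels: call a pixel "fog" when reflectance and the GK2A output are both high.
label = ((nvis > 0.4) & (gk2a_fog_prob > 0.5)).astype(int)

X = np.column_stack([nvis, sza, gk2a_fog_prob])
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, label)
print("training POD (fog recall):", clf.score(X[label == 1], label[label == 1]))
```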

Adaptive Row Major Order: a Performance Optimization Method of the Transform-space View Join (적응형 행 기준 순서: 변환공간 뷰 조인의 성능 최적화 방법)

  • Lee Min-Jae;Han Wook-Shin;Whang Kyu-Young
    • Journal of KIISE:Databases
    • /
    • v.32 no.4
    • /
    • pp.345-361
    • /
    • 2005
  • A transform-space index indexes objects represented as points in the transform space. An advantage of a transform-space index is that optimization of join algorithms using these indexes becomes relatively simple. However, the disadvantage is that these algorithms cannot be applied to original-space indexes such as the R-tree. As a way of overcoming this disadvantage, the authors earlier proposed the transform-space view join algorithm, which joins two original-space indexes in the transform space through the notion of the transform-space view. A transform-space view is a virtual transform-space index that allows us to perform the join in the transform space using original-space indexes. In a transform-space view join algorithm, the order of accessing disk pages (for which various space-filling curves could be used) makes a significant impact on the performance of joins. In this paper, we propose a new space-filling curve called the adaptive row major order (ARM order). The ARM order adaptively controls the order of accessing pages and significantly reduces the one-pass buffer size (the minimum buffer size required to guarantee one disk access per page) and the number of disk accesses for a given buffer size. Through analysis and experiments, we verify the effectiveness of the ARM order when used with the transform-space view join. The transform-space view join with the ARM order always outperforms existing ones in terms of both measures used: the one-pass buffer size and the number of disk accesses for a given buffer size. Compared to other conventional space-filling curves used with the transform-space view join, it reduces the one-pass buffer size by up to 21.3 times and the number of disk accesses by up to 74.6%. In addition, compared to existing spatial join algorithms that use R-trees in the original space, it reduces the one-pass buffer size by up to 15.7 times and the number of disk accesses by up to 65.3%.
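For context, the baseline row major order simply enumerates the pages of a two-dimensional grid row by row; the sketch below shows only this baseline, since the ARM order itself (the paper's contribution) adaptively reorders page accesses and is not reproduced here.

```python
# Baseline only: plain row major enumeration of pages in a 2-D transform-space
# grid. The paper's ARM order adapts this ordering; that logic is not shown.
def row_major_order(rows: int, cols: int):
    """Yield (row, col) page coordinates in row major order."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)

print(list(row_major_order(2, 4)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3)]
```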

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.29-45
    • /
    • 2015
  • Response modeling is a well-known research issue for those seeking better performance in predicting customers' responses to marketing promotions. A response model reduces marketing cost by identifying prospective customers from a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy results in unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as case-based reasoning (CBR), neural networks, and support vector machines. CBR is one of the major tools in business because it is simple and robust to apply to response modeling, and it remains attractive for business data mining applications even though it has not shown high performance compared to other machine learning techniques. Thus, many studies have tried to improve CBR for business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and the Analytic Hierarchy Process (AHP). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and tried to optimize the number of neighbors k for the k-nearest-neighbor search with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach combining CBR with logit, neural networks, and support vector machines (SVM) predicted customers' responses to marketing promotion better than each individual model. This paper presents an approach to predicting customers' responses to marketing promotion with case-based reasoning. The proposed model was developed by applying different weights to each feature. We fitted a logit model on a database containing the promotion and purchasing data of bath soap, and the resulting coefficients were used as the feature weights for CBR. We empirically compared the performance of the proposed weighted CBR model with neural networks and a pure CBR model, and found that the proposed weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building classification models on real data, for example in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number of instances in the other classes. A classification model such as a response model has trouble learning patterns from such data because it tends to ignore the minority class while classifying the majority class correctly. Sampling, which can be categorized into under-sampling and over-sampling, is one of the most representative approaches to the problem caused by an imbalanced data distribution. CBR, however, is not sensitive to the data distribution because, unlike other machine learning algorithms, it does not learn a model from the data.
In this study, we investigated the robustness of the proposed model while changing the ratio of response to nonresponse customers, because customers who respond to a promotion are always a small fraction of those who do not in the real world. We simulated the proposed model 100 times to validate its robustness with different ratios of response to nonresponse customers under an imbalanced data distribution. We found that the proposed CBR-based model outperformed the compared models on the imbalanced data sets. Our study is expected to improve the performance of response models for promotion programs with CBR under the imbalanced data distributions found in the real world.
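A minimal sketch of the weighted-CBR idea, under synthetic data: fit a logit model, take the absolute coefficients as feature weights, and run k-nearest-neighbor retrieval on the weighted feature space. The data, weights, and dimensionality are assumptions, not the paper's bath-soap dataset.

```python
# Sketch of weighted CBR: logit coefficients become feature weights, and case
# retrieval is done with k-NN on the weighted features. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                 # hypothetical customer features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_std = StandardScaler().fit_transform(X)
logit = LogisticRegression().fit(X_std, y)
weights = np.abs(logit.coef_[0])              # coefficient magnitudes as feature weights

# Scaling each feature by its weight turns Euclidean distance into a weighted
# distance, biasing case retrieval toward the more important features.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_std * weights, y)
print("weighted CBR (k-NN) accuracy on training cases:", knn.score(X_std * weights, y))
```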

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.191-204
    • /
    • 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of software (SW) developers in South Korea was 25% in the 2012 fiscal year, and for highly qualified SW developers it reached almost 80%. This phenomenon is intensified in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea increasingly avoid becoming SW developers, and even current SW developers want to change careers, which hinders the national development of the IT industry. The Korean government has recently recognized the problem and implemented policies to foster young SW developers. Owing to this effort, it has become easier to find beginning-level SW developers, but it is still hard for many IT companies to recruit highly qualified ones, because becoming a SW development expert requires long-term experience. Thus, improving the job continuity intentions of current SW developers is more important than fostering new ones. Therefore, this study surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. We carried out a survey from September 2014 to October 2014 targeting 130 SW developers working in the IT industry in South Korea. We gathered the respondents' demographic information and characteristics, the work environment of the SW industry, and the social position of SW developers. Afterward, a regression analysis and a decision tree method were performed to analyze the data; these two widely used data mining techniques have explanatory power and are mutually complementary. We first performed a linear regression to find the important factors associated with the job continuity intention of SW developers. The result showed that the 'expected age' up to which one expects to work as a SW developer was the most significant factor associated with the job continuity intention. We suppose that the major cause of this phenomenon is a structural problem of the IT industry in South Korea, which requires SW developers to move from development to management as they are promoted. The 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer were also highly important factors associated with the job continuity intention. Next, the decision tree method was performed to extract the characteristics of highly motivated developers and less motivated ones, using the well-known C4.5 algorithm. The results showed that 'motivation', 'personality', and 'expected age' were again important factors influencing the job continuity intention, similar to the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules for job continuity; in other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospects' of the SW industry were minor factors influencing job continuity intentions. On the other hand, the 'type of employment (regular/non-regular position)' and the 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method.
In this research, we examined the job continuity intentions of SW developers actually working at IT companies in South Korea and analyzed the factors associated with them. These results can be used for human resource management in IT companies when recruiting or fostering highly qualified SW experts. They can also help in building SW developer fostering policies and in solving the problem of unfilled SW developer recruitment in South Korea.
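A small sketch of the decision-tree step follows; scikit-learn does not implement C4.5, so an entropy-based CART tree is used as a stand-in, and the survey columns and values are hypothetical rather than the study's questionnaire items.

```python
# Stand-in for the C4.5 step: an entropy-based decision tree on hypothetical
# survey fields predicting a binary job continuity intention.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

survey = pd.DataFrame({
    "motivation": [4, 2, 5, 1, 3, 5, 2, 4],            # 1-5 scale
    "introversion": [2, 4, 1, 5, 3, 2, 4, 1],          # 1-5 scale
    "expected_age": [45, 35, 50, 33, 40, 52, 36, 48],  # expected age to keep developing
    "learning_ability": [4, 2, 5, 2, 3, 5, 1, 4],      # 1-5 scale
    "high_continuity": [1, 0, 1, 0, 1, 1, 0, 1],       # target: job continuity intention
})
X = survey.drop(columns="high_continuity")
y = survey["high_continuity"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```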

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.47-67
    • /
    • 2017
  • Steel plate faults are one of the important factors affecting the quality and price of steel plates. So far, many steelmakers have generally used a visual inspection method based on an inspector's intuition or experience: the inspector checks for faults by looking at the surface of the steel plates. However, the accuracy of this method is so low that it can lead to judgment errors above 30%. Therefore, an accurate steel plate faults diagnosis system has been continuously demanded by the industry. To meet this need, this study proposed a new steel plate faults diagnosis system using Simultaneous MTS (S-MTS), an advanced Mahalanobis-Taguchi System (MTS) algorithm, to classify various surface defects of steel plates. MTS has generally been used to solve binary classification problems in various fields, but it has not been used for multi-class classification due to its low accuracy, the reason being that only one Mahalanobis space is established in MTS. In contrast, S-MTS is suitable for multi-class classification: it establishes an individual Mahalanobis space for each class, and 'simultaneous' refers to comparing the Mahalanobis distances at the same time. The proposed steel plate faults diagnosis system was developed in four main stages. In the first stage, after the reference groups and related variables are defined, data on the steel plate faults are collected and used to establish an individual Mahalanobis space for each reference group and to construct the full measurement scale. In the second stage, the Mahalanobis distances of the test groups are calculated based on the established Mahalanobis spaces of the reference groups, and the appropriateness of the spaces is verified by examining the separability of the Mahalanobis distances. In the third stage, orthogonal arrays and the dynamic-type signal-to-noise (SN) ratio are applied for variable optimization, and the overall SN ratio gain is derived from the SN ratio and SN ratio gain. If the derived overall SN ratio gain is negative, the variable should be removed; a variable with a positive gain may be considered worth keeping. Finally, in the fourth stage, the measurement scale composed of the selected useful variables is reconstructed, and an experimental test is implemented to verify the multi-class classification ability and obtain the classification accuracy. If the accuracy is acceptable, the diagnosis system can be used in future applications. This study also compared the accuracy of the proposed steel plate faults diagnosis system with that of other popular classification algorithms, including decision tree, Multilayer Perceptron Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger random forest, Grid Search (GS), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). The steel plate faults dataset used in the study was taken from the University of California, Irvine (UCI) machine learning repository. As a result, the proposed S-MTS-based diagnosis system showed a classification accuracy of 90.79%, which is 6-27% higher than MLPNN, LR, GS, GA, and PSO. Given that the accuracy of commercial systems is only about 75-80%, the proposed system has sufficient classification performance to be applied in the industry.
In addition, the proposed system can reduce the number of measurement sensors installed in the field thanks to the variable optimization process. These results show that the proposed system not only performs well at steel plate faults diagnosis but can also reduce operation and maintenance costs. For future work, the system will be applied in the field to validate its actual effectiveness, and we plan to improve the accuracy based on the results.
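The core S-MTS idea can be sketched as building one Mahalanobis space (mean and inverse covariance) per fault class and assigning a sample to the class with the smallest Mahalanobis distance; variable selection with orthogonal arrays and SN ratios is omitted, and the data below are synthetic, not the UCI steel plate set.

```python
# Sketch of the per-class Mahalanobis-space idea behind S-MTS, on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
classes = {
    "scratch": rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    "stain":   rng.normal(loc=[3.0, 1.0], scale=0.5, size=(50, 2)),
}

# One Mahalanobis space (mean, inverse covariance) per reference group.
spaces = {}
for name, samples in classes.items():
    mean = samples.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(samples, rowvar=False))
    spaces[name] = (mean, inv_cov)

def classify(x):
    """Return the class whose Mahalanobis distance to x is smallest."""
    def mahalanobis_sq(name):
        mean, inv_cov = spaces[name]
        d = x - mean
        return float(d @ inv_cov @ d)
    return min(spaces, key=mahalanobis_sq)

print(classify(np.array([2.8, 1.1])))  # expected: 'stain'
```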

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.1-8
    • /
    • 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining is one of the significant data mining techniques: it discovers useful patterns from such databases. Frequent pattern mining, one branch of pattern mining, extracts patterns whose frequencies exceed a minimum support threshold; such patterns are called frequent patterns. Traditional frequent pattern mining is based on a single minimum support threshold for the whole database. This single-support model implicitly assumes that all items in the database have the same nature. In real-world applications, however, each item in a database can have its own characteristics, so an appropriate pattern mining technique that reflects these characteristics is required. In the frequent pattern mining framework, where the natures of items are not considered, the single minimum support threshold has to be set very low to mine patterns containing rare items, which leads to too many patterns including meaningless items; conversely, no such pattern can be mined if the threshold is set too high. This dilemma is called the rare item problem. To solve it, initial research proposed approximate approaches that split the data into several groups according to item frequencies or group related rare items. However, these methods cannot find all frequent patterns, including rare frequent patterns, because they are approximate techniques. Hence, a pattern mining model with multiple minimum supports was proposed to solve the rare item problem. In this model, each item has a corresponding minimum support threshold, called MIS (Minimum Item Support), calculated from the item frequencies in the database. By applying the MIS, the multiple-minimum-supports model finds all rare frequent patterns without generating meaningless patterns or losing significant ones. Meanwhile, candidate patterns are extracted during the mining process, and in the single-minimum-support model only the single threshold is compared with the frequencies of the candidate patterns, so the characteristics of the items that constitute a candidate pattern are not reflected and the rare item problem occurs. To address this in the multiple-minimum-supports model, the minimum MIS value among the items of a candidate pattern is used as the support threshold for that pattern, thereby reflecting its characteristics. To efficiently mine frequent patterns, including rare frequent patterns, under this scheme, tree-based algorithms of the multiple-minimum-supports model sort items in the tree in MIS-descending order, in contrast to those of the single-minimum-support model, where items are ordered by descending frequency. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and conduct a performance evaluation against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the multiple-minimum-supports-based algorithm outperforms the single-minimum-support-based one but demands more memory for the MIS information; both compared algorithms show good scalability.
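A small sketch of the multiple-minimum-supports check: each item receives its own MIS, and a candidate pattern is tested against the minimum MIS among its items. The MIS formula (one common choice from the literature) and the transactions below are illustrative only.

```python
# Sketch of the multiple minimum supports (MIS) check on toy transactions.
from itertools import combinations

transactions = [
    {"bread", "milk"}, {"bread", "diamond"}, {"milk", "bread", "butter"},
    {"milk"}, {"bread", "milk", "diamond"},
]
n = len(transactions)

def support(pattern):
    """Fraction of transactions containing every item of the pattern."""
    return sum(pattern <= t for t in transactions) / n

# One common MIS assignment: MIS(item) = max(beta * freq(item), lowest_support).
beta, lowest_support = 0.8, 0.2
items = set().union(*transactions)
mis = {i: max(beta * support({i}), lowest_support) for i in items}

frequent = []
for size in (1, 2):
    for pattern in map(set, combinations(sorted(items), size)):
        # The pattern's threshold is the minimum MIS over its items.
        if support(pattern) >= min(mis[i] for i in pattern):
            frequent.append(pattern)
print(frequent)  # rare items such as "diamond" survive thanks to their lower MIS
```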

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.23-46
    • /
    • 2021
  • Collaborative filtering, which is often used for personalized recommendations, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products, because similarities are calculated from direct connections and common features among customers. For this reason, hybrid techniques were designed that also use content-based filtering. In parallel, efforts have been made to solve these problems by applying the structural characteristics of social networks, that is, by calculating similarities indirectly through similar customers placed between the two customers of interest. This means creating a customer network based on purchase data and calculating the similarity between two customers from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer will accept a recommendation, and the centrality metrics of the network can be used to calculate it. Different centrality metrics matter in that they may have different effects on recommendation performance, and this study further examines whether the effect of these centrality metrics varies across recommender algorithms. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also for the entire customer or product base. By considering a customer's purchase of an item as a link created between the customer and the item on the network, predicting user acceptance of a recommendation becomes predicting whether a new link will be created between them. Because classification models fit this binary link-creation problem, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. The performance evaluation used order data collected from an online shopping mall over four years and two months; the first three years and eight months were organized into the social network used in the experiment, and the records of the following four months were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates of the centrality metrics differ across algorithms at a meaningful level. In this work, we analyzed four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality recorded the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality showed similar performance across all models. Degree centrality ranked in the middle across the models, while betweenness centrality always ranked higher than degree centrality. Finally, closeness centrality was characterized by distinct performance differences according to the model.
It ranked first, with numerically high performance, in logistic regression, artificial neural network, and decision tree, but recorded very low rankings with low performance in the support vector machine and k-nearest neighbors. As the experimental results reveal, in a classification model, network centrality metrics over the subnetwork that connects two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the classification model type, which implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and introducing closeness centrality could be considered to obtain higher performance for certain models.
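As a rough illustration, the sketch below computes the four centrality metrics with networkx and feeds them to a classifier as node features; it uses a toy graph and a toy target, not the paper's purchase network or its full link-prediction setup.

```python
# Toy illustration: centrality metrics as classifier features on a small graph.
import networkx as nx
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()  # stand-in for a customer-item purchase network

centralities = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

# Feature vector per node: its four centrality scores.
X = [[centralities[m][v] for m in centralities] for v in G.nodes]
y = [1 if G.nodes[v]["club"] == "Mr. Hi" else 0 for v in G.nodes]  # toy target label

clf = LogisticRegression().fit(X, y)
print("training accuracy using centrality features:", clf.score(X, y))
```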

Cost-Effectiveness Analysis of Different Management Strategies for Detection CIN2+ of Women with Atypical Squamous Cells of Undetermined Significance (ASC-US) Pap Smear in Thailand

  • Tantitamit, Tanitra;Termrungruanglert, Wichai;Oranratanaphan, Shina;Niruthisard, Somchai;Tanbirojn, Patuou;Havanond, Piyalamporn
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.16
    • /
    • pp.6857-6862
    • /
    • 2015
  • Background: To identify the optimal cost-effective strategy for the management of women with ASC-US who attended King Chulalongkorn Memorial Hospital (KCMH). Design: An economic analysis based on a retrospective study. Subjects: Women who were referred to the gynecology department due to a screening result of ASC-US at King Chulalongkorn Memorial Hospital, a general and tertiary referral center in Bangkok, Thailand, from January 2008 to December 2012. Materials and Methods: A decision tree-based model was constructed to evaluate the cost-effectiveness of three follow-up strategies for the management of ASC-US results: repeat cytology, triage with HPV testing, and immediate colposcopy. Each woman with ASC-US chose one of the strategies after receiving full details of the algorithm and the advantages and disadvantages of each strategy from a doctor. The model compared the incremental cost per case of high-grade cervical intraepithelial neoplasia (CIN2+) detected, measured by the incremental cost-effectiveness ratio (ICER). Results: From the provider's perspective, immediate colposcopy is the least costly strategy and also the most effective option among the three follow-up strategies. Repeat cytology triage is less costly than HPV triage, whereas the latter is more effective, at an ICER of 56,048 Baht per additional case of CIN2+ detected. From the patient's perspective, repeat cytology triage is the least costly and least effective option. Repeat colposcopy has an ICER of 2,500 Baht per additional case of CIN2+ detected when compared to colposcopy. In the sensitivity analysis, immediate colposcopy triage is no longer cost-effective when the cost exceeds 2,250 Baht or the cost of cytology is less than 50 Baht (1 USD = 31.58 THB). Conclusions: In women with ASC-US cytology, colposcopy is more cost-effective than repeat cytology or triage with HPV testing, from both the provider and patient perspectives.
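The ICER comparison above reduces to a simple ratio, incremental cost divided by incremental effect; the sketch below shows the computation with placeholder numbers rather than the study's Thai-baht estimates.

```python
# Minimal ICER computation for comparing two management strategies.
def icer(cost_a: float, effect_a: float, cost_b: float, effect_b: float) -> float:
    """Incremental cost per additional case of CIN2+ detected (strategy A vs. B)."""
    return (cost_a - cost_b) / (effect_a - effect_b)

# Hypothetical: strategy A costs more but detects more CIN2+ cases per 1,000 women.
print(icer(cost_a=900_000, effect_a=30, cost_b=620_000, effect_b=25))  # Baht per extra case
```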

Phylogenetic Relationship of Ligularia Species Based on RAPD and ITS Sequences Analyses (RAPD 및 ITS 염기서열 분석을 이용한 곰취 속(Ligularia) 식물의 유연관계 분석)

  • Ahn, Soon-Young;Cho, Kwang-Soo;Yoo, Ki-Oug;Suh, Jong-Taek
    • Horticultural Science & Technology
    • /
    • v.28 no.4
    • /
    • pp.638-647
    • /
    • 2010
  • The genetic relationships among five species of Ligularia were investigated using RAPD (Randomly Amplified Polymorphic DNA) and ITS (Internal Transcribed Spacer) sequence analyses. In the RAPD analysis, sixty-three of 196 arbitrary primers showed polymorphism, and the amplified fragments ranged from 0.2 to 1.6 kb in size. A dendrogram was constructed with the UPGMA clustering algorithm based on the genetic similarity of the RAPD markers, and a total of 16 accessions were classified into 5 major groups corresponding to each species at a similarity coefficient of 0.77. In the ITS sequence analysis, the size of ITS 1 varied from 248 to 256 bp, while ITS 2 varied from 220 to 222 bp; the 5.8S coding region was 164 bp in length. Forty-nine sites (10.2%) of the 478 nucleotides were variable, and the G+C content of the ITS region ranged from 49.4 to 53.5%. In the ITS tree, the five species of Ligularia were monophyletic, and L. taquetii was the first branching lineage within the clade. Ligularia intermedia formed a clade with L. fischeri var. spiciformis (BS = 79), and L. stenocephala and L. fischeri also formed a clade. The two data sets were congruent, except for the position of L. fischeri var. spiciformis.
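A brief sketch of UPGMA clustering from a similarity matrix: average linkage on (1 - similarity) distances is UPGMA. The accession names and similarity values below are invented, not the study's RAPD coefficients.

```python
# UPGMA (average linkage) dendrogram built from an invented similarity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

accessions = ["L. fischeri", "L. stenocephala", "L. taquetii", "L. intermedia"]
similarity = np.array([
    [1.00, 0.80, 0.60, 0.75],
    [0.80, 1.00, 0.62, 0.70],
    [0.60, 0.62, 1.00, 0.58],
    [0.75, 0.70, 0.58, 1.00],
])

distance = 1.0 - similarity          # convert similarity to distance
np.fill_diagonal(distance, 0.0)
tree = linkage(squareform(distance), method="average")  # "average" linkage = UPGMA
print(dendrogram(tree, labels=accessions, no_plot=True)["ivl"])  # leaf order of the dendrogram
```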