• Title/Summary/Keyword: tree based learning


Exploring Machine Learning Classifiers for Breast Cancer Classification

  • Inayatul Haq;Tehseen Mazhar;Hinna Hafeez;Najib Ullah;Fatma Mallek;Habib Hamam
    • KSII Transactions on Internet and Information Systems (TIIS), v.18 no.4, pp.860-880, 2024
  • Breast cancer is a major health concern affecting women and men globally. Early detection and accurate classification of breast cancer are vital for effective treatment and patient survival. This study addresses the challenge of accurately classifying breast tumors using machine learning classifiers such as MLP, AdaBoostM1, LogitBoost, Bayes Net, and the J48 decision tree. The research uses a publicly available dataset from GitHub to assess how well the classifiers distinguish between the occurrence and non-occurrence of breast cancer. The study compares the effectiveness of 10-fold and 5-fold cross-validation, showing that 10-fold cross-validation provides superior results. It also examines the impact of varying split percentages, with a 66% split yielding the best performance. These findings show the importance of selecting appropriate validation techniques for machine learning-based breast tumor classification. The results also indicate that the J48 decision tree is the most accurate classifier, providing valuable insights for developing predictive models for cancer diagnosis and advancing computational medical research.
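The abstract compares 10-fold with 5-fold cross-validation and several train/test split percentages. As a minimal sketch of how these schemes partition the data (not the study's code; the function names, the fixed seed, and the 100-row example are illustrative assumptions), the snippet below builds k-fold train/test index sets and a single 66% holdout split in plain Python.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin so fold sizes stay balanced.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def percentage_split(n, train_fraction=0.66, seed=0):
    """Return (train, test) index lists for a single holdout split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

if __name__ == "__main__":
    # Hypothetical dataset of 100 rows, not the study's data.
    for k in (5, 10):
        sizes = [len(test) for _, test in k_fold_indices(100, k)]
        print(k, "folds, test sizes:", sizes)
    train, test = percentage_split(100)
    print("66% split:", len(train), "train /", len(test), "test")
```

In k-fold cross-validation every row is tested exactly once across the k folds, whereas a percentage split evaluates on a single held-out subset, which is one reason the two kinds of estimates can differ.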

Utilizing Data Mining Techniques to Predict Students Performance using Data Log from MOODLE

  • Noora Shawareb;Ahmed Ewais;Fisnik Dalipi
    • KSII Transactions on Internet and Information Systems (TIIS), v.18 no.9, pp.2564-2588, 2024
  • Due to the COVID-19 pandemic, most educational institutions and schools shifted from traditional teaching to online teaching and learning using well-known Learning Management Systems (LMS) such as Moodle, Canvas, and Blackboard. Accordingly, LMSs began to generate large amounts of data on students' characteristics and achievements, along with other course-related information. This makes it difficult for teachers to monitor students' behaviour and performance. Teachers therefore need a tool that alerts them to students who might be at risk, based on the students' recorded activities and achievements in the LMS adopted by the school. This paper focuses on the benefits of using data recorded in LMS platforms, specifically Moodle, to predict students' performance by analysing their behavioural data and engagement activities with data mining techniques. As part of the overall process, this study extracted and selected data features relevant for predicting performance, designed the framework, and chose appropriate machine learning techniques. The collected data underwent pre-processing to remove random partitions, empty values, and duplicates, and to encode the data. Several machine learning techniques, including k-NN, TREE, Ensembled Tree, SVM, and MLPNNs, were applied to the processed data. The results showed that the MLPNNs technique outperformed the other classification techniques, achieving a classification accuracy of 93%, while SVM and k-NN achieved 90% and 87%, respectively. This suggests that future research could investigate other neural network methods for categorizing students using LMS data.
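The pre-processing step (removing empty values and duplicates, then encoding the data) can be sketched as below. This is a minimal illustration under assumed conditions: records are Python dicts, the column names `views`, `quiz_avg`, and `result` are hypothetical, and categorical values are integer-coded in sorted order. The abstract does not give the study's actual Moodle log fields or encoding scheme, and the "random partitions" step is not modeled here.

```python
def preprocess(rows, categorical):
    """Drop incomplete and duplicate rows, then integer-code categorical columns.

    rows: list of dicts (one per student record); categorical: column names to encode.
    Returns (clean_rows, codes) where codes maps column -> {value: integer code}.
    """
    seen = set()
    clean = []
    for row in rows:
        # Skip records that contain any empty value.
        if any(v is None or v == "" for v in row.values()):
            continue
        key = tuple(sorted(row.items()))
        # Skip exact duplicates of a record already kept.
        if key in seen:
            continue
        seen.add(key)
        clean.append(dict(row))
    # Give each distinct category value a stable integer code (sorted order).
    codes = {
        c: {v: i for i, v in enumerate(sorted({r[c] for r in clean}))}
        for c in categorical
    }
    for r in clean:
        for c in categorical:
            r[c] = codes[c][r[c]]
    return clean, codes

# Hypothetical student records (not from the study).
records = [
    {"views": 120, "quiz_avg": 8.5, "result": "pass"},
    {"views": 120, "quiz_avg": 8.5, "result": "pass"},  # exact duplicate
    {"views": 15, "quiz_avg": "", "result": "fail"},    # empty value
    {"views": 40, "quiz_avg": 4.0, "result": "fail"},
]
clean, codes = preprocess(records, ["result"])
print(clean)
print(codes)
```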

Machine Learning-Based Rapid Prediction Method of Failure Mode for Reinforced Concrete Column (기계학습 기반 철근콘크리트 기둥에 대한 신속 파괴유형 예측 모델 개발 연구)

  • Kim, Subin;Oh, Keunyeong;Shin, Jiuk
    • Journal of the Earthquake Engineering Society of Korea, v.28 no.2, pp.113-119, 2024
  • In existing reinforced concrete buildings with seismically deficient column details, the overall behavior depends on the failure mode of the columns. This study aims to develop and validate a machine learning-based prediction model for column failure modes (shear, flexure-shear, and flexure). For this purpose, artificial neural network (ANN), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) models were built using previously collected experimental data. Using these four machine learning methods, we developed a classification model that predicts the column failure mode from six input variables: concrete compressive strength, steel yield strength, axial load ratio, height-to-depth aspect ratio, longitudinal reinforcement ratio, and transverse reinforcement ratio. The performance of each model was compared and verified using accuracy, precision, recall, F1-score, and ROC. Among the considered methods, the RF model shows the highest average value across these performance measures, and it predicts the shear failure mode conservatively. Thus, the RF model can rapidly predict column failure modes from simple column details.
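The classification metrics named in the abstract can be computed per failure mode and then macro-averaged, as in the sketch below. The labels and predictions are invented for illustration and are not the study's data; ROC analysis is omitted. One way to check a claim that a model predicts shear failure conservatively is to look at per-class recall for the shear mode.

```python
def classification_report(y_true, y_pred, labels):
    """Return accuracy, per-class (precision, recall, F1), and macro averages."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    # Macro average: unweighted mean of each metric over the classes.
    macro = tuple(sum(m[i] for m in per_class.values()) / len(labels) for i in range(3))
    return accuracy, per_class, macro

# Hypothetical failure-mode labels and model predictions.
labels = ["shear", "flexure-shear", "flexure"]
y_true = ["shear", "shear", "flexure-shear", "flexure", "flexure", "flexure-shear"]
y_pred = ["shear", "flexure-shear", "flexure-shear", "flexure", "shear", "flexure-shear"]

acc, per_class, macro = classification_report(y_true, y_pred, labels)
print("accuracy:", acc)
for c, (p, r, f) in per_class.items():
    print(f"{c}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
print("macro (P, R, F1):", macro)
```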

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Korea Multimedia Society, v.21 no.5, pp.617-625, 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing works by fraudulently deceiving users in order to obtain their sensitive information, such as bank account information, credit card details, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. The area under the ROC curve (AUC) is employed as the performance metric, while statistical tests are used to assess whether differences among classifiers are significant. The paper contributes to the existing literature by benchmarking classifier ensembles for web phishing detection.
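AUC, the metric used in this comparison, equals the probability that a randomly chosen positive example (here, a phishing site) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition with pairwise comparisons, counting ties as one half. The labels and scores are hypothetical classifier outputs, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for positive (phishing), 0 for negative (legitimate).
    Ties count as half a win (the Mann-Whitney U formulation of ROC AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores from some classifier (higher = more likely phishing).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.6]
print(roc_auc(labels, scores))  # 7 of 9 positive/negative pairs ranked correctly
```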

A New Decision Tree Algorithm Based on Rough Set and Entity Relationship (러프셋 이론과 개체 관계 비교를 통한 의사결정나무 구성)

  • Han, Sang-Wook;Kim, Jae-Yearn
    • Journal of Korean Institute of Industrial Engineers, v.33 no.2, pp.183-190, 2007
  • We present a new decision tree classification algorithm, based on rough set theory, that induces classification rules; the tree is constructed from core attributes and the relationships between objects. Although decision trees have been widely used in machine learning and artificial intelligence, little research has focused on improving their classification quality. The proposed construction algorithm produces a simpler tree and improves classification quality. We also compare the new algorithm with the ID3 algorithm in terms of the number of rules.
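The proposed algorithm is benchmarked against ID3, which grows a tree by choosing, at each node, the attribute with the highest information gain. The sketch below shows only that ID3 baseline criterion, not the authors' rough-set method, whose core-attribute and object-relationship steps the abstract does not specify. The toy attributes `outlook` and `windy` and the labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attribute `attr` (ID3's criterion)."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3 picks the attribute with maximum information gain at each node."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["stay", "stay", "go", "go"]
for a in ("outlook", "windy"):
    print(a, information_gain(rows, labels, a))
print("split on:", best_attribute(rows, labels, ["outlook", "windy"]))
```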

Machine Diagnosis and Maintenance Policy Generation Using Adaptive Decision Tree and Shortest Path Problem (적응형 의사결정 트리와 최단 경로법을 이용한 기계 진단 및 보전 정책 수립)

  • 백준걸
    • Journal of the Korean Operations Research and Management Science Society, v.27 no.2, pp.33-49, 2002
  • CBM (Condition-Based Maintenance) has increasingly drawn attention in industry because of its many benefits. The CBM problem is characterized as a state-dependent scheduling model that demands simultaneous maintenance actions, each for an attribute that influences machine condition. This problem is very hard to solve within the conventional Markov decision process framework. In this paper, we present an intelligent machine maintenance scheduler, for which we develop a new incremental decision tree learning method as an evolutionary system identification model and a shortest path problem as the schedule generation model. Although our approach does not guarantee a mathematically optimal scheduling policy, we verified through simulation-based experiments that the intelligent scheduler can provide good scheduling policies for practical use.
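The abstract casts schedule generation as a shortest path problem. The paper's graph construction is not described, so the sketch below only illustrates the generic step: Dijkstra's algorithm over a hypothetical graph whose nodes are machine states, whose edges are maintenance actions, and whose non-negative edge weights are assumed costs.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm over {node: [(neighbor, cost, action), ...]}.

    Costs must be non-negative. Returns (total_cost, [actions]),
    or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nxt, cost, action in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, action)
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return float("inf"), []
    # Walk predecessors back from the target to recover the action sequence.
    actions = []
    node = target
    while node != source:
        node, action = prev[node]
        actions.append(action)
    return dist[target], actions[::-1]

# Hypothetical maintenance graph: states, actions, and costs are illustrative only.
schedule_graph = {
    "degraded": [("lubricated", 2.0, "lubricate"), ("restored", 7.0, "replace part")],
    "lubricated": [("aligned", 1.5, "align shaft"), ("restored", 6.0, "replace part")],
    "aligned": [("restored", 1.0, "recalibrate")],
}
print(shortest_path(schedule_graph, "degraded", "restored"))
```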

Design of Tree Architecture of Fuzzy Controller based on Genetic Optimization

  • Han, Chang-Wook;Oh, Se-Jin
    • Journal of the Institute of Convergence Signal Processing, v.11 no.3, pp.250-254, 2010
  • As the number of inputs and fuzzy sets of a fuzzy system increases, the size of the rule base increases exponentially and becomes unmanageable (the curse of dimensionality). In this paper, a tree architecture of fuzzy controller (TAFC) is proposed to overcome the curse of dimensionality in fuzzy controller design. TAFC is constructed with AND and OR fuzzy neurons and guarantees a reduced rule base size with reasonable performance. To develop TAFC, a genetic algorithm constructs the binary tree structure by optimally selecting the nodes and leaves, and random signal-based learning then further refines the binary connections (two-step optimization). Simulations of an inverted pendulum system verify the effectiveness of the proposed method.
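Two ideas in this abstract can be made concrete. First, a complete grid rule base with m fuzzy sets on each of n inputs has m^n rules, so 5 sets on 4 inputs already gives 625 rules. Second, AND and OR fuzzy neurons are commonly defined as y = T_i(x_i S w_i) and y = S_i(x_i T w_i), where T is a t-norm and S an s-norm. The sketch assumes the product t-norm and the probabilistic sum s-norm, which the abstract does not specify. With binary weights, a weight of 1 removes an input from an AND neuron and a weight of 0 removes it from an OR neuron, which is one way to read the role of the binary connections.

```python
from functools import reduce

def t_norm(a, b):
    """Product t-norm (fuzzy AND); an assumed choice."""
    return a * b

def s_norm(a, b):
    """Probabilistic sum s-norm (fuzzy OR); an assumed choice."""
    return a + b - a * b

def and_neuron(x, w):
    """AND neuron: T-norm over i of (x_i S w_i). A weight of 1 masks input i out."""
    return reduce(t_norm, (s_norm(xi, wi) for xi, wi in zip(x, w)), 1.0)

def or_neuron(x, w):
    """OR neuron: S-norm over i of (x_i T w_i). A weight of 0 masks input i out."""
    return reduce(s_norm, (t_norm(xi, wi) for xi, wi in zip(x, w)), 0.0)

def full_rule_count(n_inputs, sets_per_input):
    """Rules in a complete grid rule base: sets_per_input ** n_inputs."""
    return sets_per_input ** n_inputs

# Hypothetical membership degrees for two inputs.
x = [0.8, 0.5]
print(full_rule_count(4, 5))
print(and_neuron(x, [0, 0]), and_neuron(x, [0, 1]))
print(or_neuron(x, [1, 1]), or_neuron(x, [1, 0]))
```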

A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Multimedia Information System, v.5 no.2, pp.99-104, 2018
  • Phishing websites have become a crucial concern in cyber security applications. Phishing works by fraudulently deceiving users in order to obtain their sensitive information, such as bank account information, credit card details, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. The area under the ROC curve (AUC) is employed as the performance metric, while statistical tests are used to assess whether differences among classifiers are significant. The paper contributes to the existing literature by benchmarking classifier ensembles for web phishing detection.

A Comparative Study on Collision Detection Algorithms based on Joint Torque Sensor using Machine Learning (기계학습을 이용한 Joint Torque Sensor 기반의 충돌 감지 알고리즘 비교 연구)

  • Jo, Seonghyeon;Kwon, Wookyong
    • The Journal of Korea Robotics Society, v.15 no.2, pp.169-176, 2020
  • This paper studies collision detection for robot manipulators to enable safe collaboration in human-robot interaction. In sensor-based collision detection, the external torque is isolated by subtracting the robot dynamics from the measured torque. To detect collisions from joint torque sensor data, a comparative study of data-based machine learning algorithms was conducted. Data were collected from an actual 3 degree-of-freedom (DOF) robot manipulator and labeled using a threshold and by hand. For the support vector machine (SVM), decision tree, and k-nearest neighbors (KNN) methods, we derive the optimal parameters of each algorithm and compare their collision classification performance. The simulation results for each method are analyzed, confirming an optimal collision status detection model with high prediction accuracy.
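Two steps from the abstract can be sketched together: estimating the external torque as the measured joint torque minus the torque predicted by a dynamics model, and classifying the resulting 3-joint feature vector with k-nearest neighbors. The torque values, the feature choice, and k = 3 are illustrative assumptions; the paper tuned each algorithm's parameters on its own labeled data.

```python
from collections import Counter
from math import dist

def external_torque(measured, model):
    """Per-joint residual: measured sensor torque minus model-predicted dynamics torque."""
    return tuple(m - d for m, d in zip(measured, model))

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query` (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labeled residuals (N·m) for a 3-DOF arm.
train_x = [(0.1, 0.0, 0.2), (0.0, 0.1, 0.1), (0.2, 0.1, 0.0),
           (2.5, 1.8, 0.9), (3.0, 2.2, 1.1), (2.7, 2.0, 1.3)]
train_y = ["free", "free", "free", "collision", "collision", "collision"]

residual = external_torque(measured=(5.6, 4.1, 2.0), model=(3.0, 2.2, 1.0))
print(residual, knn_predict(train_x, train_y, residual))
```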

A customer credit Prediction Researched to Improve Credit Stability based on Artificial Intelligence

  • MUN, Ji-Hui;JUNG, Sang Woo
    • Korean Journal of Artificial Intelligence, v.9 no.1, pp.21-27, 2021
  • Since the 1990s, Korea's credit card industry has developed steadily. As a result, various problems have arisen, such as careless management of customer information and loans to low-credit customers. This, in turn, led to a high delinquency rate across the card industry and had a negative impact on the economy. Therefore, in this paper, we use Azure to analyze and predict the delinquency and delinquency periods of credit loans according to gender, car ownership, property, number of children, education level, marital status, and employment status, using linear regression analysis and a boosted decision tree algorithm. These predictions can reduce the likelihood of reckless credit lending and credit card issuance, reducing the number of customers with bad credit and the risk borne by banks. In addition, once the customer base is segmented according to the predicted results, the predictions can serve as a basis for developing credit products suited to each customer, reducing the risk of credit loans. The Azure results showed that, of the Linear Regression and Boosted Decision Tree algorithms, the Boosted Decision Tree made more accurate predictions. In future work, we intend to improve the accuracy of the analysis by assigning a number to each data item and predicting again.
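Boosting builds a model as a sum of small decision trees, each fitted to the errors left by the ones before it. The sketch below is a minimal gradient-boosting regressor with squared loss and depth-1 trees (stumps), written to illustrate the general idea; it is not Azure's Boosted Decision Tree implementation, and the step-shaped toy data, 50 rounds, and learning rate 0.1 are assumptions.

```python
def fit_stump(X, r):
    """Least-squares regression stump (one split on one feature) for targets r."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= thr]
            right = [ri for x, ri in zip(X, r) if x[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # no valid split (all rows identical): predict the mean
        m = sum(r) / len(r)
        return lambda x: m
    _, j, thr, lm, rm = best
    return lambda x: lm if x[j] <= thr else rm

def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        pred = [pi + learning_rate * stump(x) for pi, x in zip(pred, X)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical one-feature data with a step that a straight line cannot fit.
X = [(i,) for i in range(10)]
y = [0.0] * 5 + [5.0] * 5
model = fit_boosted_stumps(X, y)
print([round(model(x), 3) for x in X])
```

A single linear regression line cannot follow the jump in this toy target, while the boosted stumps approximate it closely; nonlinearity of this kind is one plausible reason a boosted tree model can outperform linear regression, although the abstract does not analyze why.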