• Title/Summary/Keyword: Tree-based algorithms

URL Filtering by Using Machine Learning

  • Saqib, Malik Najmus
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.275-279 / 2022
  • Technological progress has made many tasks easier for people, from small everyday chores to complex work. It has also enabled illegal activities, ranging from displaying annoying messages to large-scale fraud. The easiest way for an attacker to carry out such activities is to convince a user to click a malicious link, so classifying URLs as malicious or benign has been a major concern for a decade. Blacklists were used initially for this purpose and are still used today; they are efficient but cannot be updated automatically. This approach has therefore been replaced by URL classification based on machine learning. In this paper, we use four machine learning classification algorithms to classify URLs as malicious or benign: support vector machine, random forest, k-nearest neighbor, and decision tree. The dataset used in this research has 36,694 instances. Precision, accuracy, and recall are compared for the dataset with and without preprocessing.
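The abstract compares classifiers by precision, accuracy, and recall. As a minimal sketch (not the paper's evaluation code), these metrics can be computed from true and predicted labels, assuming label 1 means malicious and 0 means benign; the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall), treating `positive` (here: malicious) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # benign URL flagged as malicious
        elif t == positive:
            fn += 1  # malicious URL missed
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(binary_metrics(y_true, y_pred))  # (0.625, 0.6, 0.75)
```

Precision penalizes benign URLs wrongly blocked, while recall penalizes malicious URLs let through, which is why the two are reported separately.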

A Study on the Partition Operating Circuit Design based on Directed Graph (방향성 그래프에 기초한 분할연산 회로설계에 관한 연구)

  • Park, Chun-Myoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.9 / pp.2091-2096 / 2013
  • This paper presents a method for efficient circuit design based on a directed graph in which the input-output relationships between nodes are represented as a tree structure. We introduce a mathematical analysis based on this tree structure, from which an optimal, locally computable circuit is designed. With the proposed circuit design algorithm, a circuit can be designed for a directed tree graph with any number of nodes. The proposed method is more effective, more regular, and more extensible than previous methods.

Channel Allocation Strategies for Interference-Free Multicast in Multi-Channel Multi-Radio Wireless Mesh Networks

  • Yang, Wen-Lin;Hong, Wan-Ting
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.2 / pp.629-648 / 2012
  • Given a video stream delivery system deployed on a multicast tree embedded in a multi-channel multi-radio wireless mesh network, the problem is how to allocate interference-free channels to tree links while maximizing the number of serviced mesh clients. In this paper, we propose a channel allocation (CA) heuristic algorithm based on best-first search and backtracking (BFB). The experimental results show that our BFB-based CA algorithm outperforms previous methods such as DFS- and BFS-based CA. This advantage comes from the backtracking technique used in the BFB approach: when no conflict-free channel can be found for the current link during the CA process, links that were allocated earlier may switch to other eligible channels. We also propose a tree refinement method that improves the quality of channel-allocated trees by adding uncovered destinations at the cost of removing some covered ones, with the aim of increasing the number of serviced mesh clients. Our simulation results show that it effectively improves the multicast trees produced by the BFB, BFS, and DFS CA algorithms.
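The backtracking idea in this abstract can be illustrated with a plain depth-first backtracking search. This is a minimal sketch, not the authors' BFB algorithm: it omits the best-first ordering, assumes interference is given as an explicit set of conflicting link pairs, and uses the hypothetical names `allocate_channels`, `links`, and `conflicts`. When a link has no conflict-free channel, the search returns to earlier links and tries their other eligible channels.

```python
def allocate_channels(links, conflicts, num_channels):
    """Assign a channel to every link so that no two conflicting links share one.

    links: list of link ids, in the order they are considered.
    conflicts: set of frozenset({a, b}) pairs of interfering links (assumed model).
    Returns a dict link -> channel, or None if no interference-free allocation exists.
    """
    assignment = {}

    def eligible(link, ch):
        # A channel is eligible if no already-assigned interfering link uses it.
        return all(
            assignment.get(other) != ch
            for other in links
            if other != link and frozenset((link, other)) in conflicts
        )

    def backtrack(i):
        if i == len(links):
            return True
        link = links[i]
        for ch in range(num_channels):
            if eligible(link, ch):
                assignment[link] = ch
                if backtrack(i + 1):
                    return True
                del assignment[link]  # undo and try the next eligible channel
        return False  # forces the caller to revise an earlier link's channel

    return dict(assignment) if backtrack(0) else None


# Hypothetical interference pattern: x-z, z-w, w-y conflict; two channels available.
path = {frozenset(("x", "z")), frozenset(("z", "w")), frozenset(("w", "y"))}
print(allocate_channels(["x", "y", "z", "w"], path, 2))
```

With this input, the lowest-channel-first order first gets stuck at `w`, so the search revisits `y` and returns `{x: 0, y: 1, z: 1, w: 0}`; a greedy allocator without backtracking would report failure here.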

Machine Learning-Based Prediction Technology for Medical Treatment Period of Automobile Insurance Accident Patients (머신러닝 기반의 자동차보험 사고 환자의 진료 기간 예측 기술)

  • Kyung-Keun Byun;Doeg-Gyu Lee;Hyung-Dong Lee
    • Convergence Security Journal / v.23 no.1 / pp.89-95 / 2023
  • To help reduce the medical expenses of patients injured in automobile insurance accidents, this study predicted the treatment period, which is the most important factor in the medical expenses of patients in their 40s and 50s, and analyzed the factors affecting it. To this end, machine learning models were built with five algorithms, including Decision Tree, and their performance was compared. Three algorithms performed well: Decision Tree, Gradient Boost, and XGBoost. The analysis of factors affecting the predicted treatment period identified hospital type, treatment area, age, and gender. The study also presents accessible research methods such as AutoML, and we hope its results will inform policies to reduce medical expenses for automobile insurance accidents.

Balanced Binary Search Using Prefix Vector for IP Address Lookup (프리픽스 벡터를 사용한 균형 이진 IP 주소 검색 구조)

  • Kim, Hyeong-Gee;Lim, Hye-Sook
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.5B / pp.285-295 / 2008
  • Internet routers forward packets by determining a next hop for each incoming packet from its destination IP address. As link technologies advance and the number of Internet users grows, IP address lookup has become a major challenge because it must be performed at wire speed for every incoming packet. Many binary search algorithms have been proposed for fast IP address lookup. However, tree-based binary search algorithms are usually unbalanced and do not provide good search performance, and binary search algorithms that do provide balanced search require prefix duplication. This paper proposes a new binary search algorithm that provides balanced binary search with far fewer entries than the original number of prefixes. This is possible because the binary search tree is built only from the disjoint prefixes of the prefix set, and each node has a prefix vector holding prefix nesting information. The proposed algorithm requires far fewer memory accesses than prior binary search algorithms, and its IP address lookup performance is therefore considerably improved.
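The lookup idea can be sketched in a few lines, assuming prefixes are given as bit strings and addresses as small integers (8 bits here instead of IPv4's 32, to keep the example readable). This is not the authors' implementation: a sorted array searched with `bisect` stands in for the balanced binary search tree, the prefix vector is represented as a plain list of enclosing prefixes (the paper's encoding may differ), and the names `build_table` and `lookup` are hypothetical. The longest match is either the disjoint prefix containing the address or an enclosing prefix recorded in the vector of the neighboring disjoint prefix on either side, so only those two entries need checking after the binary search.

```python
from bisect import bisect_right

W = 8  # address width in bits (assumption for readability; IPv4 would use 32)


def prefix_start(p):
    """Smallest address covered by bit-string prefix p."""
    return int(p.ljust(W, "0"), 2)


def build_table(prefixes):
    """Keep only disjoint prefixes (those enclosing no other prefix), sorted by start.

    Each kept prefix carries a prefix vector: the prefixes that enclose it,
    longest first, which records the nesting information.
    """
    leaves = [p for p in prefixes
              if not any(q != p and q.startswith(p) for q in prefixes)]
    leaves.sort(key=prefix_start)
    vectors = {
        leaf: sorted((q for q in prefixes if q != leaf and leaf.startswith(q)),
                     key=len, reverse=True)
        for leaf in leaves
    }
    return leaves, [prefix_start(p) for p in leaves], vectors


def lookup(addr, table):
    """Longest-prefix match for integer addr, or None if no prefix matches."""
    leaves, starts, vectors = table
    bits = format(addr, f"0{W}b")
    i = bisect_right(starts, addr) - 1  # binary search: last leaf starting at or before addr
    best = None
    for j in (i, i + 1):  # predecessor and successor disjoint prefixes
        if 0 <= j < len(leaves):
            leaf = leaves[j]
            if bits.startswith(leaf):
                return leaf  # a disjoint prefix has no longer prefix inside it
            for anc in vectors[leaf]:  # longest enclosing prefix first
                if bits.startswith(anc):
                    if best is None or len(anc) > len(best):
                        best = anc
                    break
    return best


table = build_table(["0", "01", "1", "10", "1011"])
print(table[0], lookup(0b10000000, table))
```

For the prefix set {0, 01, 1, 10, 1011}, only 01 and 1011 are stored as search entries, yet address 10000000 still resolves to 10 through the prefix vector of 1011, without duplicating any prefix.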

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

  • Young-Jin, Han;In-Whee, Joe
    • KIPS Transactions on Computer and Communication Systems / v.11 no.12 / pp.445-452 / 2022
  • Imbalanced class distributions are common in the digital world and are a significant issue in cybersecurity, where abnormal activity hidden in imbalanced data must be detected and addressed. A system capable of tracking patterns in all transactions is needed, but machine learning on imbalanced data, in which abnormal patterns are typically rare, can ignore minority classes, degrade performance on them, and produce biased predictive models. In this paper, we address imbalanced datasets by combining the Synthetic Minority Oversampling Technique (SMOTE) with the Light GBM algorithm to predict target variables and improve accuracy. The results were compared with logistic regression, decision tree, KNN, Random Forest, and XGBoost. Accuracy and recall were similar across algorithms, but in precision Random Forest reached 80.76% and Light GBM 97.16%, and in F1-score Random Forest reached 84.67% and Light GBM 91.96%. The experiment confirmed that Light GBM performed comparably to, or up to 16% better than, the five comparison algorithms.
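The oversampling step named in this abstract, SMOTE, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. A minimal sketch of that step only (the Light GBM model is not shown), assuming numeric feature tuples and Euclidean distance; the function name `smote`, the default `k=5`, and the fixed seed are illustrative choices, not the paper's settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic minority-class points by SMOTE-style interpolation.

    minority: list of numeric feature tuples from the minority class.
    Each synthetic point lies on the segment between a randomly chosen
    minority sample x and one of its k nearest minority neighbors nb:
    x + u * (nb - x), with u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbors by Euclidean distance (assumed metric)
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(minority, 3, k=2))
```

Because new points lie between existing minority samples rather than duplicating them, the classifier sees a denser minority region instead of repeated copies, which is the usual motivation for SMOTE over plain random oversampling.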

CHAID Algorithm by Cube-based Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집) / 2003.10a / pp.239-247 / 2003
  • Decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, data reduction, and variable screening. CHAID (Chi-square Automatic Interaction Detector) is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper, we propose a CHAID algorithm based on cube-based sampling and examine its accuracy and speed as the number of variables varies.

A Study on Development of A Web-Based Forecasting System of Industrial Accidents (웹 기반의 산업재해 예측시스템 개발에 관한 연구)

  • Leem, Young-Moon;Hwang, Young-Seob;Choi, Yo-Han
    • Proceedings of the Safety Management and Science Conference / 2007.11a / pp.269-274 / 2007
  • The ultimate goal of this research is to develop a web-based forecasting system for industrial accidents. As an initial step, this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. It also presents the logical process for developing the forecasting system. Decision tree algorithms, a typical data mining technique, are used to predict outcomes from objective, quantified data. The sample was drawn from 10,536 records on manufacturing industries in Korea over three years (2002-2004).
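CART and C4.5, two of the four compared algorithms, differ mainly in the criterion used to choose a split: CART typically uses Gini impurity, while C4.5 uses the gain ratio (information gain divided by split information). A minimal sketch of both criteria for one candidate split; the function names and the class counts are hypothetical and do not come from the study's data.

```python
import math


def gini(counts):
    """Gini impurity of a node given per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def entropy(counts):
    """Shannon entropy (bits) of a node given per-class counts."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c) if n else 0.0


def split_scores(parent, children):
    """Return (Gini decrease, gain ratio) for splitting `parent` into `children`.

    parent: class counts at the node, e.g. [accident, no_accident].
    children: list of class-count lists, one per branch.
    """
    n = sum(parent)
    weights = [sum(ch) / n for ch in children]
    # CART-style criterion: reduction in weighted Gini impurity
    gini_decrease = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # C4.5-style criterion: information gain normalized by split information
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    split_info = sum(w * math.log2(1 / w) for w in weights if w)
    gain_ratio = info_gain / split_info if split_info else 0.0
    return gini_decrease, gain_ratio


print(split_scores([5, 5], [[5, 0], [0, 5]]))  # (0.5, 1.0)
```

A perfect split scores highest on both criteria, while a split whose branches keep the parent's class mix scores near zero; the criteria can still rank less clear-cut splits differently, which is one reason the compared algorithms can produce different trees.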

CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집) / 2004.04a / pp.39-50 / 2004
  • The decision tree approach is most useful for classification problems and divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, data reduction, variable screening, and category merging. CHAID (Chi-square Automatic Interaction Detector) uses the chi-squared statistic to determine splits and is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper, we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables varies.
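The abstract notes that CHAID uses the chi-squared statistic to determine splits. A minimal sketch of Pearson's chi-square statistic for one predictor's contingency table (predictor categories by target classes); full CHAID also merges similar categories and compares adjusted p-values across predictors, and those steps, as well as the cube-based sampling itself, are omitted. The function name `chi_square` and the toy table are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table[i][j] = number of records with predictor category i and target class j.
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if predictor and target were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


print(chi_square([[10, 0], [0, 10]]))  # (20.0, 1)
```

A larger statistic (relative to its degrees of freedom) indicates a stronger association between the predictor and the target, making that predictor a better splitting candidate.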

CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.803-816 / 2004
  • The decision tree approach is most useful for classification problems and divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, data reduction, variable screening, and category merging. CHAID uses the chi-squared statistic to determine splits and is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper, we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables varies.
