• Title/Summary/Keyword: tree based learning

Indexing Scheme for Case-Based Designs using Memory-Based Learning (기억기반학습을 이용한 사례기반설계시 참조사례의 인덱싱)

  • Gang, Jae-Ho;Ryu, Gwang-Ryeol;Lee, Dong-Gon
    • Journal of KIISE:Computing Practices and Letters / v.5 no.1 / pp.79-87 / 1999
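  • Case-based reasoning (CBR) is a problem-solving approach that, given a new problem, efficiently derives a suitable solution by appropriately adapting the solutions of similar past cases. Case-based design applies case-based reasoning to design: past cases designed under similar requirements are consulted and reused, and the method is used in many fields, such as conceptual ship design. To derive high-quality designs efficiently with case-based design, cases that match the design requirements of the target must be selected appropriately, and the differences between the selected case and the current design conditions must be clearly recognized so that the case can be adapted to the current situation. This paper presents an indexing scheme that selects appropriate cases for a new design by applying memory-based learning to records of past case selections in order to learn their selection tendencies. The scheme was applied to the problem of selecting previously built ships for reference in conceptual ship design, a typical example of case-based design. The experiments confirmed that the proposed memory-based learning method outperforms methods that suggest reference cases using a decision tree or simple heuristics.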
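
The case-selection idea in this abstract can be illustrated with a minimal nearest-neighbor sketch. Memory-based learning is taken here in its simplest form (k-nearest neighbors with majority voting). The record layout, the feature vectors, and the name `recommend_case` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def recommend_case(records, query, k=3):
    """Suggest a reference case for a new design requirement.

    records: list of (requirement_vector, selected_case_id) pairs taken from
             past selection records (hypothetical layout).
    query:   requirement vector of the new design, same length as the stored ones.
    """
    # Keep the k stored records closest to the query (Euclidean distance).
    nearest = sorted(records, key=lambda r: math.dist(r[0], query))[:k]
    # Majority vote over the cases that were selected in those records.
    votes = Counter(case_id for _, case_id in nearest)
    return votes.most_common(1)[0][0]

past_selections = [
    ((1.0, 1.0), "ship_A"),
    ((1.1, 0.9), "ship_A"),
    ((0.9, 1.2), "ship_A"),
    ((5.0, 5.0), "ship_B"),
    ((5.2, 4.8), "ship_B"),
]
print(recommend_case(past_selections, (5.1, 5.0)))
```

In practice the requirement features would need to be scaled to comparable ranges before distances are computed; that step is omitted here.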

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.139-148 / 2020
  • The berthing velocity is the most important factor affecting the berthing energy generated when a ship berths, so an accident may occur if the berthing velocity is extremely high. Several ship features influence the berthing velocity, but previous studies have mostly focused on the size of the vessel. This study therefore analyzes various features that influence berthing velocity and determines their respective importance. The data used in the analysis were berthing velocities of ships at a jetty in Korea. Using the collected data, several machine learning classification algorithms (decision tree, random forest, logistic regression, and perceptron) were compared, with indexes derived from the confusion matrix as the evaluation method. The perceptron demonstrated the best performance, and the feature importance was in the following order: DWT, jetty number, and state. Hence, when berthing a ship, the berthing velocity should be determined in consideration of various features, such as the size of the ship, position of the jetty, and loading condition of the cargo.
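
The abstract does not say which confusion-matrix indexes were used. As a rough sketch, the usual ones (accuracy, precision, recall, F1) can be computed as follows, assuming a binary labeling such as "high" versus "normal" berthing velocity. The labeling and the function names are assumptions for illustration only.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]

def classification_indexes(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 1 = high berthing velocity, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_indexes(y_true, y_pred))
```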

Comparative analysis of Machine-Learning Based Models for Metal Surface Defect Detection (머신러닝 기반 금속외관 결함 검출 비교 분석)

  • Lee, Se-Hun;Kang, Seong-Hwan;Shin, Yo-Seob;Choi, Oh-Kyu;Kim, Sijong;Kang, Jae-Mo
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.834-841 / 2022
  • Recently, applying artificial intelligence technologies in various fields of production has drawn an upsurge of research interest, driven by the growing demand for smart factories and artificial intelligence technologies. A great deal of effort is being made to introduce artificial intelligence algorithms into the defect detection task. In particular, defect detection on metal surfaces attracts more research interest than detection on other materials (wood, plastics, fibers, etc.). In this paper, we compare and analyze the speed and performance of defect classification for machine learning techniques (Support Vector Machine, Softmax Regression, Decision Tree) combined with dimensionality reduction algorithms (Principal Component Analysis, AutoEncoders) and for two convolutional neural networks (the proposed method and ResNet). To validate and compare the performance and speed of the algorithms, we adopted two datasets, (i) a public dataset and (ii) an actual dataset, and determined the most efficient algorithm on the basis of the results.

A Prediction of N-value Using Regression Analysis Based on Data Augmentation (데이터 증강 기반 회귀분석을 이용한 N치 예측)

  • Kim, Kwang Myung;Park, Hyoung June;Lee, Jae Beom;Park, Chan Jin
    • The Journal of Engineering Geology / v.32 no.2 / pp.221-239 / 2022
  • Unknown geotechnical characteristics are a key challenge in the design of piles for plant, civil, and building works. Although N-values obtained from the standard penetration test are important, N-values for the whole site are rarely acquired in common practice. In this study, the N-value is predicted by regression analysis with artificial intelligence (AI). Because big data is important for improving the learning performance of AI, a circular augmentation method is applied to build up the data set. Three AI algorithms were applied (artificial neural network, decision tree, and auto machine learning), and the optimal model was selected as the one that minimizes the margin of error. To evaluate the method, actual and predicted data from six completed projects in Poland, Indonesia, and Malaysia were compared. The results indicate that the AI prediction is reliable. The geotechnical characteristics of non-boring points were therefore predictable, and an optimal arrangement of structures could be achieved using a three-dimensional N-value distribution map.

Design of Machine Learning based Smart Service Abstraction Layer for Future Network Provisioning (미래 네트워크 제공을 위한 기계 학습 기반 스마트 서비스 추상화 계층 설계)

  • Vu, Duc Tiep;N., Gde Dharma;Kim, Kyungbaek;Choi, Deokjai
    • Annual Conference of KIPS / 2016.10a / pp.114-116 / 2016
  • Recently, SDN and NFV technologies have been actively developed and provide enormous flexibility in network provisioning. Future network services will generally involve many different types of services, such as hologram games, social network live streaming videos, and cloud-computing services, which have dynamic service requirements. To provision networks for future services dynamically and efficiently, SDN/NFV orchestrators must clearly understand the service requirements. Currently, network provisioning relies heavily on QoS parameters such as bandwidth, delay, jitter, and throughput, and those parameters are necessary to describe the network requirements of a service. However, it is often difficult for users to understand and use them proficiently. Therefore, to maintain interoperability and homogeneity, a service abstraction layer is required between users and orchestrators. The service abstraction layer analyzes users' ambiguous requirements for the desired services and generates the corresponding refined service requirements. In this paper, we present our initial effort to design a Smart Service Abstraction Layer (SmSAL) for future network architecture, which uses a machine learning method to analyze ambiguous, abstracted, user-friendly input parameters and generate the corresponding network parameters of the desired service for better network provisioning. As an initial proof-of-concept implementation to demonstrate the viability of the proposed idea, we implemented SmSAL with a decision tree model learned from previous service requests in order to generate network parameters for various audio and video services, and showed that the parameters are generated successfully.

A study on Data Preprocessing for Developing Remaining Useful Life Predictions based on Stochastic Degradation Models Using Air Craft Engine Data (항공엔진 열화데이터 기반 잔여수명 예측력 향상을 위한 데이터 전처리 방법 연구)

  • Yoon, Yeon Ah;Jung, Jin Hyeong;Lim, Jun Hyoung;Chang, Tai-Woo;Kim, Yong Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.48-55 / 2020
  • Recently, prognosis and health management (PHM) has been studied as a way to diagnose failures and predict the life of aircraft engine parts using sensor data. PHM is a framework that provides individualized solutions for managing system health. This study predicted the remaining useful life (RUL) of aeroengines using sensor-collected degradation data provided by the IEEE 2008 PHM Conference Challenge. The data cover 218 engines, each with initial wear and production deviations. It was difficult to determine the characteristics of the engine parts because system and domain-specific information was not provided. Each engine has a different number of cycles, which makes time series models difficult to use. Therefore, the analysis was performed using machine learning algorithms rather than statistical time series models. The machine learning algorithms used were random forest, gradient boosted trees, and XGBoost. A sliding window was applied to develop RUL predictions. We compared model performance before and after applying the sliding window and proposed a data preprocessing method for developing RUL predictions. The models were evaluated by R-squared scores and root mean square error (RMSE). The XGBoost model with the random split method and sliding window preprocessing showed the best predictive performance.
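
The sliding-window preprocessing mentioned in this abstract can be sketched as follows. The abstract gives neither the window size nor the features computed per window, so the window mean and the first-to-last change used here are placeholder assumptions. The sketch also assumes a single sensor channel per engine and that each engine is observed until failure, so the RUL at a cycle is the number of cycles remaining after it.

```python
def sliding_windows(readings, window):
    """Turn one engine's run-to-failure readings into (features, RUL) samples.

    readings: sensor values, one per cycle, ending at the failure cycle.
    window:   number of consecutive cycles per sample (assumed parameter).
    """
    if window < 1 or window > len(readings):
        raise ValueError("window must be between 1 and len(readings)")
    last_cycle = len(readings) - 1
    samples = []
    for end in range(window - 1, len(readings)):
        chunk = readings[end - window + 1:end + 1]
        # Placeholder features: window mean and change across the window.
        features = (sum(chunk) / window, chunk[-1] - chunk[0])
        # RUL label: cycles left after the last cycle in the window.
        samples.append((features, last_cycle - end))
    return samples

# Made-up readings for one engine with five cycles.
engine = [1.0, 2.0, 4.0, 7.0, 11.0]
for features, rul in sliding_windows(engine, window=3):
    print(features, rul)
```

Because every engine yields fixed-length samples regardless of how many cycles it ran, the resulting table can be fed to tree-based regressors such as those named in the abstract.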

Genetic Algorithm Based Attribute Value Taxonomy Generation for Learning Classifiers with Missing Data (유전자 알고리즘 기반의 불완전 데이터 학습을 위한 속성값계층구조의 생성)

  • Joo Jin-U;Yang Ji-Hoon
    • The KIPS Transactions:PartB / v.13B no.2 s.105 / pp.133-138 / 2006
  • Learning with Attribute Value Taxonomies (AVT) has shown that it is possible to construct accurate, compact, and robust classifiers from a partially missing dataset (a dataset that contains attribute values specified at different levels of precision). Yet in many cases AVTs are generated by experts or people with specialized knowledge of their domain. Unfortunately, these user-provided AVTs can be time-consuming to construct, and the building process can be misguided. Moreover, experts are occasionally unavailable to provide an AVT for a particular domain. Against this background, this paper introduces an AVT generation method called GA-AVT-Learner, which uses a genetic algorithm to find a near-optimal AVT for a given training dataset. We conducted experiments generating AVTs with GA-AVT-Learner on a variety of real-world datasets and compared them with other types of AVTs, such as HAC-AVTs and user-provided AVTs. The experiments showed that GA-AVT-Learner provides AVTs that yield more accurate and compact classifiers and improve performance when learning from missing data.

Ensemble Machine Learning Model Based YouTube Spam Comment Detection (앙상블 머신러닝 모델 기반 유튜브 스팸 댓글 탐지)

  • Jeong, Min Chul;Lee, Jihyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.5 / pp.576-583 / 2020
  • This paper proposes a technique for identifying spam comments on YouTube, which have recently seen tremendous growth. Because it is possible to monetize through advertising, spammers have appeared on YouTube who promote their channels or videos on popular videos or leave comments unrelated to the video. YouTube operates its own spam blocking system but still fails to block spam properly and efficiently. Therefore, we examined related studies on YouTube spam comment screening and conducted classification experiments with six machine learning techniques (Decision tree, Logistic regression, Bernoulli Naive Bayes, Random Forest, Support vector machine with linear kernel, Support vector machine with Gaussian kernel) and an ensemble model combining these techniques, using comment data from popular music videos by Psy, Katy Perry, LMFAO, Eminem, and Shakira.
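
The abstract does not state how the six classifiers are combined. One common option is hard (majority) voting, sketched below purely as an assumption. The function name and the sample predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote.

    predictions_per_model: one list of labels per model, all of equal length.
    With an even number of models, ties go to the label seen first.
    """
    combined = []
    for labels in zip(*predictions_per_model):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Made-up outputs of three classifiers for four comments (1 = spam, 0 = ham).
model_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(majority_vote(model_outputs))
```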

Multi-dimensional Analysis and Prediction Model for Tourist Satisfaction

  • Shrestha, Deepanjal;Wenan, Tan;Gaudel, Bijay;Rajkarnikar, Neesha;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.2 / pp.480-502 / 2022
  • This work assesses the degree of satisfaction tourists receive as final recipients in a tourism destination, based on the fact that satisfied tourists can make a significant contribution to the growth and continuous improvement of a tourism business. The work takes Pokhara, the tourism capital of Nepal, as its study area. A stratified sampling methodology with open-ended survey questions is used as the primary source of data, with a sample size of 1019 covering both international and domestic tourists. The survey data is processed using a data mining tool to perform multi-dimensional analysis, discover information patterns, and visualize clusters. Further, supervised machine learning algorithms (kNN, Decision tree, Support vector machine, Random forest, Neural network, Naive Bayes, and Gradient boost) are used to develop models for training and prediction on the survey data. To find the best model for prediction, different performance metrics are used to evaluate each model for performance, accuracy, and robustness. The best model is used to construct a learning-enabled model that classifies tourists as satisfied, neutral, or unsatisfied visitors. This work helps tourism business personnel, government agencies, and tourism stakeholders find information on tourist satisfaction and the factors that influence it. Although this work was carried out for the city of Pokhara in Nepal, the study is equally relevant to any other tourism destination of a similar nature.

Hybrid machine learning with HHO method for estimating ultimate shear strength of both rectangular and circular RC columns

  • Quang-Viet Vu;Van-Thanh Pham;Dai-Nhan Le;Zhengyi Kong;George Papazafeiropoulos;Viet-Ngoc Pham
    • Steel and Composite Structures / v.52 no.2 / pp.145-163 / 2024
  • This paper presents six novel hybrid machine learning (ML) models that combine support vector machines (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), extreme gradient boosting (XGB), and categorical gradient boosting (CGB) with the Harris Hawks Optimization (HHO) algorithm. These models, namely HHO-SVM, HHO-DT, HHO-RF, HHO-GB, HHO-XGB, and HHO-CGB, are designed to predict the ultimate shear strength of both rectangular and circular reinforced concrete (RC) columns. The prediction models are established using a comprehensive database consisting of 325 experimental data points for rectangular columns and 172 for circular columns. The ML model hyperparameters are optimized through a combination of cross-validation and HHO. The performance of the hybrid ML models is evaluated and compared using various metrics, which identify the HHO-CGB model as the top-performing model for predicting the ultimate shear strength of both rectangular and circular RC columns. The mean R-value and mean a20-index are relatively high, reaching 0.991 and 0.959, respectively, while the mean absolute error and root mean square error are low (10.302 kN and 27.954 kN, respectively). A further comparison with four existing formulas is conducted to validate the efficiency of the proposed HHO-CGB model. The Shapley Additive Explanations method is applied to analyze the contribution of each variable to the output of the HHO-CGB model, providing insights into the local and global influence of variables. The analysis reveals that the depth of the column, length of the column, and axial loading exert the most significant influence on the ultimate shear strength of RC columns. A user-friendly graphical interface tool is then developed based on HHO-CGB to facilitate practical and cost-effective usage.
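
The error metrics quoted in this abstract can be computed as in the sketch below. The a20-index is taken here as the share of samples whose predicted-to-measured ratio lies between 0.80 and 1.20, which is the commonly used definition and is an assumption rather than something stated in the abstract. The function name and sample values are made up.

```python
import math

def regression_metrics(measured, predicted):
    """Return MAE, RMSE and a20-index for paired measured/predicted values.

    Assumes all measured values are positive (e.g. shear strengths in kN).
    """
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # a20-index: fraction of predictions within +/-20 % of the measured value.
    within_20 = sum(1 for m, p in zip(measured, predicted)
                    if 0.8 <= p / m <= 1.2)
    return {"mae": mae, "rmse": rmse, "a20": within_20 / n}

# Made-up shear strengths in kN.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 150.0, 300.0, 500.0]
print(regression_metrics(measured, predicted))
```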