• Title/Summary/Keyword: tree based learning

Financial Instruments Recommendation based on Classification Financial Consumer by Text Mining Techniques (비정형 데이터 분석을 통한 금융소비자 유형화 및 그에 따른 금융상품 추천 방법)

  • Lee, Jaewoong;Kim, Young-Sik;Kwon, Ohbyung
    • Journal of Information Technology Services / v.15 no.4 / pp.1-24 / 2016
  • With advances in information technology, non-face-to-face robo-advisors offering high accessibility and convenience are spreading. Current robo-advisors recommend appropriate investment products after inferring a customer's investment propensity from structured data that individuals enter directly or indirectly. However, asking financial consumers to state or input their own subjective investment propensity is inconvenient and obtrusive. Hence, this study proposes a way to infer investment propensity from unstructured data that customers voluntarily disclose during consultations or online. Since prediction performance on unstructured documents varies with the characteristics of the text, this study evaluates the prediction performance of various classification algorithms, selects the one best suited to the text left by financial consumers, and proposes an intelligent method that automatically recommends investment products. User tests were conducted with MBA students. After participants were shown the recommended investment and a list of investment products, they were asked about their satisfaction. Satisfaction was measured separately for investment propensity and for the recommended products. The results suggest that users were highly satisfied with the investment products recommended by the proposed method, and that the method can be applied to non-face-to-face robo-advisors.

The Analysis of the Activity Patterns of Dog with Wearable Sensors Using Machine Learning

  • Hussain, Ali;Ali, Sikandar;Kim, Hee-Cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.141-143 / 2021
  • The activity patterns of animals are difficult to assess, and the behavior of freely moving individuals cannot be captured by direct observation alone. Understanding the activity patterns of animals such as dogs and cats has therefore become a major challenge. One approach to monitoring these behaviors is continuous data collection by human observers. In this study, we assess the activity patterns of dogs using data from wearable sensors, namely an accelerometer and a gyroscope. A wearable, sensor-based system is well suited to this purpose and can monitor dogs in real time. The main purpose of this study was to develop a system that detects activities from accelerometer and gyroscope signals. We propose a method based on data collected from 10 dogs of nine different breeds, varying in size and age and including both genders. We applied six state-of-the-art classifiers: random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), XGBoost, k-nearest neighbors (KNN), and a decision tree classifier. The random forest showed good classification results. We achieved an accuracy of 86.73% in detecting activities.

Study on Fault Diagnosis and Data Processing Techniques for Substrate Transfer Robots Using Vibration Sensor Data

  • MD Saiful Islam;Mi-Jin Kim;Kyo-Mun Ku;Hyo-Young Kim;Kihyun Kim
    • Journal of the Microelectronics and Packaging Society / v.31 no.2 / pp.45-53 / 2024
  • The maintenance of semiconductor equipment is crucial for the continuous growth of the semiconductor market. System management is imperative given the anticipated increase in the capacity and complexity of industrial equipment. Ensuring optimal operation of manufacturing processes is essential to maintaining a steady supply of numerous parts. In particular, monitoring the status of substrate transfer robots, which play a central role in these processes, is crucial, and diagnosing failures of their major components is vital for preventive maintenance. Fault diagnosis methods can be broadly categorized into physics-based and data-driven approaches. This study focuses on data-driven fault diagnosis because of the limitations of physics-based approaches. We propose a methodology for data acquisition and preprocessing for robot fault diagnosis. Data are gathered from vibration sensors, and the preprocessing method is applied to the vibration signals. Subsequently, gradient tree-based XGBoost classifiers are trained on the resulting dataset. The effectiveness of the proposed model is validated with performance evaluation metrics, including accuracy, F1 score, and the confusion matrix. The XGBoost classifiers achieve an accuracy of approximately 92.76% and an equivalent F1 score. ROC curves indicate exceptional performance in class discrimination, with 100% discrimination for the normal class and 98% discrimination for abnormal classes.
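
The evaluation metrics named in this abstract (accuracy, F1 score, confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code: the labels below are hypothetical, the function names are made up for illustration, and the example treats "fault" as the positive class of a binary problem.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) with `positive` treated as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, not data from the study.
y_true = ["normal", "normal", "fault", "fault", "fault", "normal"]
y_pred = ["normal", "fault", "fault", "fault", "normal", "normal"]

print(confusion_counts(y_true, y_pred, positive="fault"))
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred, positive="fault"))
```

For a multi-class fault problem, the same counts would be computed once per class and then averaged; which averaging the study used is not stated in the abstract.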

Estimation of Chlorophyll Contents in Pear Tree Using Unmanned Aerial Vehicle-Based-Hyperspectral Imagery (무인기 기반 초분광영상을 이용한 배나무 엽록소 함량 추정)

  • Ye Seong Kang;Ki Su Park;Eun Li Kim;Jong Chan Jeong;Chan Seok Ryu;Jung Gun Cho
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.669-681 / 2023
  • Studies have sought to replace conventional destructive surveys, which require considerable labor and time, with remote sensing, a non-destructive survey method, for estimating chlorophyll content, an important indicator of fruit tree growth. This study non-destructively evaluated the chlorophyll content of pear tree leaves using unmanned aerial vehicle-based hyperspectral imagery over two years (2021 and 2022). The single-band reflectance of the pear tree canopy, extracted through image processing, was converted to band ratios to minimize unstable radiation effects caused by changes over time. Estimation (calibration and validation) models were developed using the machine learning algorithms elastic-net, k-nearest neighbors (KNN), and support vector machine, with band ratios as input variables. By comparing the performance of estimation models based on the full set of band ratios, key band ratios that help reduce computational cost and improve reproducibility were selected. With the full band ratios, all machine learning models achieved a coefficient of determination (R2) ≥ 0.67, root mean squared error (RMSE) ≤ 1.22 ㎍/cm2, and relative error (RE) ≤ 17.9% in calibration, and R2 ≥ 0.56, RMSE ≤ 1.41 ㎍/cm2, and RE ≤ 20.7% in validation; from this comparison, four key band ratios were selected. There was no significant difference in validation performance among the machine learning models. Therefore, the KNN model, which had the highest calibration performance, was used as the standard, and its key band ratios were 710/714, 718/722, 754/758, and 758/762 nm. Calibration showed R2 = 0.80, RMSE = 0.94 ㎍/cm2, and RE = 13.9%, and validation showed R2 = 0.57, RMSE = 1.40 ㎍/cm2, and RE = 20.5%. Although the validation performance was not sufficient to estimate the chlorophyll content of pear tree leaves, it is meaningful that key band ratios were selected as a standard for future research. Future research needs to continuously secure additional datasets to improve estimation performance, verify the reliability of the selected key band ratios, and upgrade the estimation model so that it is reproducible in actual orchards.
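
As a rough illustration of the band-ratio inputs and the KNN estimator described above, the sketch below computes the four key band ratios from per-wavelength reflectance and averages the targets of the nearest training samples. Only the four wavelength pairs come from the abstract. The reflectance values, training samples, chlorophyll targets, and the choice of `k` are all hypothetical, and the study's actual preprocessing and model settings are not given here.

```python
import math

# Key band ratio pairs (numerator nm, denominator nm) reported in the abstract.
KEY_PAIRS = [(710, 714), (718, 722), (754, 758), (758, 762)]


def band_ratios(reflectance, pairs):
    """Divide reflectance at the first wavelength of each pair by the second."""
    return [reflectance[num] / reflectance[den] for num, den in pairs]


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training samples closest to `query`."""
    by_distance = sorted(zip(train_x, train_y),
                         key=lambda sample: math.dist(sample[0], query))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical canopy reflectance keyed by wavelength (nm); not study data.
canopy = {710: 0.20, 714: 0.25, 718: 0.30, 722: 0.30,
          754: 0.45, 758: 0.50, 762: 0.40}
features = band_ratios(canopy, KEY_PAIRS)

# Hypothetical training set: band-ratio vectors and chlorophyll content.
train_x = [[0.78, 1.00, 0.91, 1.20],
           [0.85, 0.98, 0.88, 1.30],
           [0.70, 1.05, 0.95, 1.10]]
train_y = [6.1, 7.4, 5.0]

estimate = knn_predict(train_x, train_y, features, k=2)
print(features, estimate)
```

Using ratios of adjacent bands rather than raw reflectance means a uniform brightness change between flights cancels out of each feature, which matches the abstract's stated motivation of reducing unstable radiation effects.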

An Object-Based Image Retrieval Techniques using the Interplay between Cortex and Hippocampus (해마와 피질의 상호 관계를 이용한 객체 기반 영상 검색 기법)

  • Hong Jong-Sun;Kang Dae-Seong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.4 s.304 / pp.95-102 / 2005
  • In this paper, we propose a user-friendly object-based image retrieval system using the interaction between the cortex and the hippocampus. Most existing query methods in content-based image retrieval rely on query by example or query by sketch. However, these methods do not adequately meet people's varied query needs because they are restrictive and not easy to use. We propose a method of automatic color object extraction using a CSB tree map (Color and Spatial based Binary tree map). Extracted objects are transformed by a region labelling algorithm into a bit stream representing information such as color, size, and location, and are then learned by a hippocampal neural network that uses the interplay between the cortex and the hippocampus. Cells in the brain that are excited by particular features generate a specific signal when people recognize patterns. Existing neural networks treat every feature attribute equally. The proposed hippocampal neural network yields an adaptive, fast content-based image retrieval system by using an excitatory learning method that forwards important features to long-term memory and an inhibitory learning method that forwards unimportant features to short-term memory, controlled by impression.

A Comparative Study of Prediction Models for College Student Dropout Risk Using Machine Learning: Focusing on the case of N university (머신러닝을 활용한 대학생 중도탈락 위험군의 예측모델 비교 연구 : N대학 사례를 중심으로)

  • So-Hyun Kim;Sung-Hyoun Cho
    • Journal of The Korean Society of Integrative Medicine / v.12 no.2 / pp.155-166 / 2024
  • Purpose: This study aims to identify key factors for predicting dropout risk at the university level and to provide a foundation for policy development aimed at dropout prevention. It explores the optimal machine learning algorithm by comparing the performance of various algorithms on data about college students' dropout risk. Methods: Data on factors influencing dropout risk and propensity were collected from N University. The collected data were applied to several machine learning algorithms, including random forest, decision tree, artificial neural network, logistic regression, support vector machine (SVM), k-nearest neighbor (k-NN) classification, and Naive Bayes. The models were compared and evaluated with a focus on predictive validity and on identifying significant dropout factors through the information gain index. Results: The binary logistic regression analysis showed that year of the program, department, grades, and year of entry had a statistically significant effect on dropout risk. Among the machine learning algorithms, random forest performed best. The relative importance of the predictor variables was highest for department, age, grades, and whether the student's residence matched the school location, in that order. Conclusion: Machine learning-based prediction of dropout risk focuses on the early identification of students at risk. The types and causes of dropout crises vary significantly among students. It is important to identify them so that appropriate actions and support can be provided to remove risk factors and strengthen protective factors. The relative importance of the factors affecting dropout risk found in this study will help guide educational prescriptions for preventing college student dropout.
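
The information gain index mentioned in the Methods can be sketched as follows for a single categorical predictor. This shows the standard entropy-based definition and is an assumption about what the study computed; the department codes and dropout labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the samples by the value of one categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: department code and dropout flag (1 = dropped out).
department = ["A", "A", "B", "B"]
dropout = [1, 1, 0, 0]
print(information_gain(department, dropout))
```

A predictor that separates the classes perfectly removes all label entropy, while one unrelated to the outcome yields a gain of zero, which is why the index can be used to rank candidate dropout factors.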

A study on automated soil moisture monitoring methods for the Korean peninsula based on Google Earth Engine (Google Earth Engine 기반의 한반도 토양수분 모니터링 자동화 기법 연구)

  • Jang, Wonjin;Chung, Jeehun;Lee, Yonggwan;Kim, Jinuk;Kim, Seongjoon
    • Journal of Korea Water Resources Association / v.57 no.9 / pp.615-626 / 2024
  • To accurately and efficiently monitor soil moisture (SM) across South Korea, this study developed an SM estimation model that integrates the cloud computing platform Google Earth Engine (GEE) with Automated Machine Learning (AutoML). Various spatial data based on Terra MODIS (Moderate Resolution Imaging Spectroradiometer) and the global precipitation observation satellite GPM (Global Precipitation Measurement) were used to test optimal input data combinations. The results indicated that GPM-based accumulated dry days, 5-day antecedent average precipitation, NDVI (Normalized Difference Vegetation Index), the sum of LST (Land Surface Temperature) acquired during nighttime and daytime, soil properties (sand and clay content, bulk density), terrain data (elevation and slope), and seasonal classification had high feature importance. After setting the objective functions (coefficient of determination, R2; root mean square error, RMSE; mean absolute percentage error, MAPE) in AutoML for the combination of the aforementioned data, a comparative evaluation of machine learning techniques was conducted. The results revealed that tree-based models exhibited high performance, with Random Forest performing best (R2: 0.72, RMSE: 2.70 vol%, MAPE: 0.14).
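
The three scores reported above can be computed as in the sketch below. It assumes the common definitions: R2 as one minus the ratio of residual to total sum of squares, and MAPE expressed as a fraction, which is consistent with the reported value of 0.14 but is not confirmed by the abstract. The soil moisture values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the observations."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction (0.14 means 14%).
    Observations must be non-zero."""
    return sum(abs((o - p) / o)
               for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical soil moisture values in vol%; not data from the study.
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]
print(r_squared(observed, predicted), rmse(observed, predicted),
      mape(observed, predicted))
```

Some studies report R2 as the squared correlation between observed and predicted values instead; the two definitions agree only for a least-squares linear fit, so the variant used should be checked against the full paper.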

Traffic Sign Recognition using SVM and Decision Tree for Poor Driving Environment (SVM과 의사결정트리를 이용한 열악한 환경에서의 교통표지판 인식 알고리즘)

  • Jo, Young-Bae;Na, Won-Seob;Eom, Sung-Je;Jeong, Yong-Jin
    • Journal of IKEEE / v.18 no.4 / pp.485-494 / 2014
  • Traffic Sign Recognition (TSR) is an important element of an Advanced Driver Assistance System (ADAS). However, many TSR studies address only normal daytime environments, because a sign's distinctive color does not appear in poor environments such as night, snow, rain, or fog. In this paper, we propose a new machine learning-based TSR algorithm for both daytime and poor environments. In poor environments, traditional methods that rely on RGB color regions do not perform well. We therefore extracted sign features using HoG (histogram of oriented gradients) and detected signs using a Support Vector Machine (SVM). Each detected sign is then recognized by a decision tree based on 25 reference points in a normalized RGB system. In poor environments, the proposed system achieves a detection rate of 96.4% and a recognition rate of 94%. Testing was performed on an Intel i5 processor at 3.4 GHz using Full HD images. The results show that machine learning-based detection and recognition methods can be used efficiently for TSR even in poor driving environments.

A study on the optimum cutter spacing ratio according to penetration depth using decision tree-based and SVM regressions (의사결정나무 기반 회귀분석과 SVM 회귀분석을 이용한 커터 관입깊이에 따른 최적 커터간격 비 연구)

  • Lee, Gi-Jun;Ryu, Hee-Hwan;Kwon, Tae-Hyuk
    • Journal of Korean Tunnelling and Underground Space Association / v.22 no.5 / pp.501-513 / 2020
  • Various studies have conducted cutting tests to guide cutter placement on the cutter head. Cutter head design mainly reflects the cutter spacing at the minimum specific energy; however, because the optimum cutter spacing at a given penetration depth varies with rock conditions, studies on determining the optimum cutter spacing should be actively pursued. Machine learning techniques, namely a decision tree-based regression model and an SVM regression model, were applied to predict the optimum cutter spacing ratio, given the nonlinear relationship between cutter penetration depth and cutter spacing. Because decision tree-based methods are strongly influenced by the amount of data, SVM regression predicted the optimum cutter spacing ratio as a function of penetration depth more accurately. SVM regression is therefore expected to be useful for deciding cutter spacing in cutter head design once a large amount of data on the optimum cutter spacing ratio by penetration depth has been accumulated.

A Review of Machine Learning Algorithms for Fraud Detection in Credit Card Transaction

  • Lim, Kha Shing;Lee, Lam Hong;Sim, Yee-Wai
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.31-40 / 2021
  • The increasing number of credit card fraud cases has become a considerable problem over the past decades. This phenomenon is due to the expansion of new technologies, including the increased popularity and volume of online banking transactions and e-commerce. To address credit card fraud, a rule-based approach has been widely used to detect and guard against fraudulent activities. However, it requires huge computational power and involves high complexity in defining and building the rule base for pattern matching in order to precisely identify fraud patterns. In addition, it lacks the intelligence and ability to predict or analyse transaction data in search of new fraud patterns and strategies. As such, data mining and machine learning algorithms are put forward in this paper to overcome these shortcomings. The aim of this paper is to highlight the important techniques and methodologies employed in fraud detection while focusing on the existing literature. Methods such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), naïve Bayes, k-Nearest Neighbour (k-NN), Decision Tree, and Frequent Pattern Mining algorithms are reviewed and evaluated for their performance in detecting fraudulent transactions.