• Title/Summary/Keyword: tree based learning

A Study on the Prediction of Mortality Rate after Lung Cancer Diagnosis for Men and Women in 80s, 90s, and 100s Based on Deep Learning (딥러닝 기반 80대·90대·100대 남녀 대상 폐암 진단 후 사망률 예측에 관한 연구)

  • Kyung-Keun Byun;Doeg-Gyu Lee;Se-Young Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.87-96 / 2023
  • Research on predicting disease treatment outcomes with deep learning has recently become active in the medical community. However, earlier studies relied on small patient datasets and specific deep learning algorithms, and showed meaningful results only under specific conditions. To generalize such results, this study expanded and subdivided the patient population and predicted mortality after lung cancer diagnosis for men and women in their 80s, 90s, and 100s. Using large-scale medical information from the Health Insurance Review and Assessment Service together with AutoML, which provides various deep learning algorithms, five models (Decision Tree, Random Forest, Gradient Boosting, XGBoost, and Logistic Regression) were built to predict mortality rates over the 84 months following lung cancer diagnosis. The results showed that men in their 80s and 90s had a higher mortality prediction rate than women, whereas women in their 100s had a higher mortality prediction rate than men. The factor with the greatest influence on the mortality rate was found to be the treatment period.

A Study on Pre-evaluation of Tree Species Classification Possibility of CAS500-4 Using RapidEye Satellite Imageries (농림위성 활용 수종분류 가능성 평가를 위한 래피드아이 영상 기반 시험 분석)

  • Kwon, Soo-Kyung;Kim, Kyoung-Min;Lim, Joongbin
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.291-304 / 2021
  • Updating a forest type map is essential for sustainable forest resource management and for monitoring in response to climate change and various environmental problems. To meet the need for efficient, wide-area forestry remote sensing, the CAS500-4 (Compact Advanced Satellite 500-4; the agriculture and forestry satellite) project has been confirmed and scheduled for launch in 2023. Ahead of the launch and use of CAS500-4, this study aimed to pre-evaluate the possibility of satellite-based tree species classification using RapidEye, which has similar specifications to CAS500-4. The study area was the Chuncheon forest management complex in Gangwon-do. Spectral information was extracted from the growing season image, and GLCM texture information was derived from the NIR bands of the growing and non-growing seasons. Both types of information were used for classification with the random forest machine learning method. Tree species were classified into nine classes: coniferous trees (Korean red pine, Korean pine, Japanese larch), broad-leaved trees (Mongolian oak, Oriental cork oak, East Asian white birch, Korean Castanea, and other broad-leaved trees), and mixed forest. Finally, the classification accuracy was calculated by comparing the forest type map with the classification results. The accuracy was 39.41% when only spectral information was used and 69.29% when both spectral and texture information were used. In future studies, the applicability of CAS500-4 will be improved by substituting additional variables that more effectively reflect the ecological characteristics of vegetation.
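
The abstract above does not say which GLCM statistics were used, so the following is only a minimal sketch of how texture features can be derived from a gray-level co-occurrence matrix. The function names, the one-pixel horizontal offset, the toy `band` values, and the choice of contrast and homogeneity are all assumptions made for illustration, not the authors' configuration.

```python
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Return normalized co-occurrence probabilities of gray-level pairs.

    Each key (i, j) means: a pixel with level i has a neighbor with level j
    at offset (dx, dy). The matrix is not symmetrized (an assumption).
    """
    pairs = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(image[r][c], image[r2][c2])] += 1
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


def contrast(p):
    """Large when neighboring pixels differ strongly in gray level."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def homogeneity(p):
    """Close to 1 when neighboring pixels have similar gray levels."""
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())


# Hypothetical 4x4 patch of a quantized NIR band (values are made up).
band = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3]]
p = glcm(band)
print(contrast(p), homogeneity(p))
```

In a workflow like the one described, such per-window texture values would be appended to the spectral features before training the classifier.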

Prediction of Safety Grade of Bridges Using the Classification Models of Decision Tree and Random Forest (의사결정나무 및 랜덤포레스트 분류 모델을 이용한 교량 안전등급 예측)

  • Hong, Jisu;Jeon, Se-Jin
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.397-411 / 2023
  • The number of deteriorated bridges with a service period of more than 30 years has been increasing rapidly in Korea. Accordingly, advanced maintenance technologies that predict the age-induced deterioration, condition, and performance of bridges are increasingly recognized as important. This study proposes a method for predicting the safety grade of bridges using Decision Tree and Random Forest classification models based on machine learning. These models were analyzed for 8,850 bridges on national roads with various evaluation indexes such as the confusion matrix, balanced accuracy, recall, ROC curve, and AUC, and the Random Forest generally showed better predictive performance than the Decision Tree. In particular, the Random Forest with random under-sampling outperformed other sampling techniques for the C and D grade bridges, which need more attention to maintenance because of their significant deterioration, reaching a recall of 83.4%. The proposed model can be usefully applied to rapidly identify the safety grade of bridges that have not recently been inspected and to establish an efficient and economical maintenance plan for them.
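
As a rough illustration of two ideas named in this abstract, the sketch below shows random under-sampling of the majority class and the recall and balanced accuracy metrics in plain Python. The function names, the two-class labels "AB" and "CD", and the toy data are hypothetical; the Random Forest training itself is not shown, and nothing here reproduces the authors' pipeline.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(group) for group in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


def recall_per_class(y_true, y_pred):
    """Recall of a class = correctly predicted members / actual members."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so rare classes count equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical imbalanced data: 8 bridges in good grades, 2 in poor grades.
grades = ["AB"] * 8 + ["CD"] * 2
bridge_ids = list(range(10))
sampled_ids, sampled_grades = random_undersample(bridge_ids, grades)
print(sorted(sampled_grades))  # two of each class remain
```

Under-sampling would be applied to the training split only; recall and balanced accuracy are then computed on an untouched test split.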

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on a factor analysis (FA) of integrated Korean Meteorological Agency data and natural gas leakage data, in order to account for complex factors. The method is divided into three modules. First, we fill in missing data in the integrated data set using linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. Next, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method successfully improved performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
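
The first module described above, filling missing values by linear interpolation, can be sketched as follows. This is a minimal stdlib-only illustration under the assumption that readings are evenly spaced and missing entries are marked with `None`; the function name and the sample readings are hypothetical and do not come from the paper.

```python
from typing import List, Optional


def linear_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps that lie between two known values.

    Leading and trailing gaps stay None because there is no second
    endpoint to interpolate toward.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per position between the two known endpoints.
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical sensor readings with missing entries.
readings = [None, 10.0, None, None, 16.0, None]
print(linear_interpolate(readings))
# [None, 10.0, 12.0, 14.0, 16.0, None]
```

Leaving the edge gaps unfilled is a design choice of this sketch; a real pipeline might forward-fill or drop those rows instead.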

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity in processed food manufacturing and processing businesses in advance. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was set as the number of detections in supervised inspections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, operating duration, and number of employees, but also inspection track records and external climate data. After feature extraction, the machine learning algorithms were applied with company risk, item risk, environmental risk, and past violation history as the feature variables that affect the determination of nonconformity. The f1-score of the decision tree, one of the ensemble models, was much higher than those of the other models. Based on these results, official food control for food safety management is expected to be enhanced and geared toward data- and evidence-based management and a scientific administrative system.

Committee Learning Classifier based on Attribute Value Frequency (속성 값 빈도 기반의 전문가 다수결 분류기)

  • Lee, Chang-Hwan;Jung, In-Chul;Kwon, Young-S.
    • Journal of KIISE:Databases / v.37 no.4 / pp.177-184 / 2010
  • These days, many kinds of data, including sensor, delivery, credit, and stock data, are generated continuously in massive quantities. It is difficult to learn from these data because they are large in volume and their concepts change quickly. To handle these problems, learning methods based on sliding windows over time have been used. However, these approaches must rebuild the model every time new data arrive, which requires a lot of time and cost. Therefore, very simple incremental learning methods are needed. The Bayesian method is one example, but it has the disadvantage of requiring prior knowledge (probability) of the data. In this study, we propose a learning method based on attribute values. The proposed method can be applied even when the prior knowledge (probability) of the data is unknown. The main concept of the method is that each attribute value is regarded as an expert learner, and summing the outputs of the expert learners leads to better results. Experimental results show that our method learns from data very quickly and performs well compared to current learning methods (decision tree and Bayesian).
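
The abstract gives only the main concept (each attribute value acts as an expert, and the experts' outputs are summed), so the following is a loose sketch of that idea rather than the authors' algorithm. The class name, the voting rule based on relative class frequency, the tie-breaking, and the toy records are all assumptions.

```python
from collections import defaultdict


class AttributeValueCommittee:
    """Each (attribute index, value) pair is one 'expert' that votes with
    the class frequencies it has observed so far. No class prior is used."""

    def __init__(self):
        # counts[(attribute_index, value)][class_label] -> occurrences
        self.counts = defaultdict(lambda: defaultdict(int))
        self.classes = set()

    def update(self, record, label):
        """Incremental step: adjust counters only, no model rebuild."""
        self.classes.add(label)
        for index, value in enumerate(record):
            self.counts[(index, value)][label] += 1

    def predict(self, record):
        """Sum the experts' votes; requires at least one prior update."""
        scores = {label: 0.0 for label in self.classes}
        for index, value in enumerate(record):
            expert = self.counts.get((index, value))
            if not expert:
                continue  # unseen attribute value: this expert abstains
            seen = sum(expert.values())
            for label, count in expert.items():
                scores[label] += count / seen
        # Sorting first makes ties resolve deterministically.
        return max(sorted(scores), key=scores.get)


# Hypothetical stream of (record, label) pairs.
model = AttributeValueCommittee()
model.update(("sunny", "high"), "no")
model.update(("sunny", "normal"), "yes")
model.update(("rain", "normal"), "yes")
model.update(("rain", "high"), "no")
print(model.predict(("sunny", "normal")))  # yes
```

Because `update` only increments counters, each new record is absorbed in time proportional to its number of attributes, which is the property the abstract emphasizes over window-based rebuilding.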

Fast Partition Decision Using Rotation Forest for Intra-Frame Coding in HEVC Screen Content Coding Extension (회전 포레스트 분류기법을 이용한 HEVC 스크린 콘텐츠 화면 내 부호화 조기분할 결정 방법)

  • Heo, Jeonghwan;Jeong, Jechang
    • Journal of Broadcast Engineering / v.23 no.1 / pp.115-125 / 2018
  • This paper presents a fast partition decision framework for High Efficiency Video Coding (HEVC) Screen Content Coding (SCC) based on machine learning. HEVC currently performs a quad-tree block partitioning process to achieve optimal coding efficiency. Because this process imposes high computational complexity on the encoder, fast encoding methods that determine the block structure early have been studied. However, conventional early partition decision methods are difficult to apply to screen content video coding because screen content shows different partition characteristics from natural content. The proposed method solves this problem by classifying the screen content blocks after partition decision, and it shows a 3.11% increase in BD-BR and a 42% time reduction compared to the SCC common test condition.

Predicting Employment Earning using Deep Convolutional Neural Networks (딥 컨볼루션 신경망을 이용한 고용 소득 예측)

  • Ramadhani, Adyan Marendra;Kim, Na-Rang;Choi, Hyung-Rim
    • Journal of Digital Convergence / v.16 no.6 / pp.151-161 / 2018
  • Income is a vital aspect of economic life. Knowing their income helps people create budgets that allow them to pay for their living expenses. Income data is used by banks, stores, and service companies for marketing purposes and for retaining loyal customers; it is a crucial demographic element used at a wide variety of customer touch points. Therefore, it is essential to be able to make income predictions for existing and potential customers. This paper aims to predict employment earnings or income based on history, using machine learning techniques such as SVMs (Support Vector Machines), Gaussian, decision tree, and DCNNs (Deep Convolutional Neural Networks). The results show that the DCNN method provides the best results, with 88%, compared to the other machine learning techniques used in this paper. Improving the data length, for example with PCA, has the potential to provide better results.

User Adaptation Using User Model in Intelligent Image Retrieval System (지능형 화상 검색 시스템에서의 사용자 모델을 이용한 사용자 적응)

  • Kim, Yong-Hwan;Rhee, Phill-Kyu
    • The Transactions of the Korea Information Processing Society / v.6 no.12 / pp.3559-3568 / 1999
  • Information overload from the many available information resources is an inevitable problem in modern electronic life. It is increasingly difficult to find information that meets a user's needs in the uncontrolled flood of digital information resources, such as the rapidly growing internet. As a result, many information retrieval systems have been researched and developed. Text retrieval systems have met users' information needs, whereas image retrieval systems have not dealt with them properly. To resolve this problem, this paper proposes an intelligent user interface for image retrieval. It is based on the HCOS (Human-Computer Symmetry) model, a layered interaction model between a human and a computer. Its methodology is employed to reduce the user's information overhead and the semantic gap between the user and the system. It is implemented with machine learning algorithms, namely a decision tree and a backpropagation neural network, to provide user adaptation capabilities for the intelligent image retrieval system (IIRS).

An Analytical Study on Automatic Classification of Domestic Journal articles Using Random Forest (랜덤포레스트를 이용한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for information Management / v.36 no.2 / pp.57-77 / 2019
  • Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, I performed various experiments on the main factors, such as tree number, feature selection, and learning set size, that affect the performance of automatically assigning class labels to domestic journal articles. Through these experiments, I explored ways to optimize the performance of Random Forest (RF) on imbalanced datasets in real environments. Consequently, for the automatic classification of domestic journal articles, Random Forest (RF) can be expected to achieve its best classification performance when using the tree number interval 100~1000(C), a small feature set (10%) based on the chi-square statistic (CHI), and the largest learning sets (9-10 years).
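
To make the chi-square (CHI) feature selection mentioned above more concrete, here is a minimal sketch that scores a term against a class from a 2x2 contingency table and keeps the top 10% of terms. The function names, the example terms, and their counts are hypothetical, and the abstract does not describe how the study computed or aggregated the statistic across classes.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term or class never (or always) occurs: no evidence
    return n * (a * d - b * c) ** 2 / denom


def select_top_fraction(scores, fraction=0.10):
    """Keep the highest-scoring share of terms (at least one term)."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Hypothetical term statistics for one class (counts are made up).
scores = {
    "catalog": chi_square(40, 5, 10, 45),
    "method": chi_square(25, 25, 25, 25),
    "study": chi_square(30, 28, 20, 22),
}
print(select_top_fraction(scores))  # ['catalog']
```

A term that is evenly spread across classes, like the hypothetical "method" row, scores zero and is dropped first; the reduced vocabulary would then be used to build the features for the forest.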