• Title/Summary/Keyword: tree based learning

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Annual Conference of KIPS / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, its incidence has increased dramatically, not only among the elderly but also among young people, and machine-learning methods are increasingly used to diagnose its causes. In this study, we improved hypertension prediction by applying Mahalanobis distance-based multivariate outlier removal to the KNHANES database of Korean national health data and a COVID-19 dataset from Kaggle. The study consists of two modules. The first, data preprocessing, merges the datasets and performs decision-tree classifier-based feature selection. The second, predictive analysis, removes multivariate outliers from the experimental dataset using the Mahalanobis distance and then predicts hypertension. We compared the accuracy of each classification model; the proposed MAH_RF algorithm achieved the best accuracy, 82.66%. The proposed method can be used not only for hypertension but also for the detection of other diseases such as stroke and cardiovascular disease.
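The outlier-removal step described in this abstract can be illustrated with a minimal sketch. It is not the authors' MAH_RF implementation: it handles only two features so the covariance inverse has a closed form, the sample readings are made up, and the cutoff of 2.5 is an assumed value (the abstract does not state a threshold).

```python
import math

def mahalanobis_filter(points, threshold):
    """Keep 2-D points whose Mahalanobis distance from the sample mean is <= threshold."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix S = [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    kept = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = v^T S^-1 v, with S^-1 = (1/det) * [[syy, -sxy], [-sxy, sxx]]
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if math.sqrt(d2) <= threshold:
            kept.append((x, y))
    return kept

# Hypothetical (feature_1, feature_2) readings; the last row is an obvious outlier.
readings = [(120, 80), (122, 82), (118, 79), (121, 81), (119, 78),
            (123, 83), (117, 77), (120, 81), (121, 79), (180, 60)]
kept = mahalanobis_filter(readings, threshold=2.5)  # threshold is an assumption
```

With small samples the largest possible Mahalanobis distance is bounded by (n-1)/sqrt(n), so a fixed cutoff should be chosen with the sample size in mind. The filtered rows would then be passed to whichever classifier is being compared.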

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • Korean Journal of Remote Sensing / v.40 no.2 / pp.123-139 / 2024
  • Forest vertical structure is vital for understanding ecosystems and biodiversity, in addition to being fundamental forest information. Currently, forest vertical structure is predominantly assessed in situ, which is difficult to apply to inaccessible locations or large areas, costly, and labor-intensive. Mapping systems based on remote sensing data have therefore been actively explored. Recently, research on analyzing and classifying images with machine learning techniques has been actively conducted and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two dates approximately one month apart, and spectral index and tree height maps were generated separately for each. The input data were separated by acquisition time into cases 1 and 2, which were then combined to form case 3. Using these data, forest vertical structure mapping models based on random forest, support vector machine, and extreme gradient boosting (XGBoost) were generated. In total, nine models were generated; the XGBoost model in case 3 performed best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that a forest vertical structure mapping model built from bi-seasonal data with an appropriate algorithm can reach an accuracy of 90% or higher.

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • Anomaly detection was once dominated by methods that judged abnormality from statistics derived from specific data. This worked because data were low-dimensional, so classical statistical methods were effective. However, as data have become more complex in the era of big data, it has become harder to analyze and predict industrial data accurately in the conventional way. Supervised learning algorithms based on SVMs and decision trees were therefore adopted. However, a supervised model predicts test data accurately only when the abnormal and normal classes are balanced, and most data generated in industry are class-imbalanced, so its predictions are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by the class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced in the study of Thomas et al. (2017), is a classification model that detects anomalies in medical images; it is built from convolutional neural networks and has been used in the detection field. By contrast, anomaly detection for sequence data using generative adversarial networks has received far less research than for image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor has the feature matching method of Salimans et al. (2016). This suggests that much remains to be explored in anomaly classification of sequence data with generative adversarial networks. To learn sequence data, the generative adversarial network is built from LSTMs: the generator is a 2-layer stacked LSTM with 32-dimensional and 64-dimensional hidden layers, and the discriminator is an LSTM with a 64-dimensional hidden layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data, whereas in this paper anomaly scores are derived using feature matching. In addition, the latent-variable optimization process was designed with an LSTM to improve model performance. The modified generative adversarial model was more precise than the autoencoder in all experiments and approximately 7% higher in accuracy. The generative adversarial network was also more robust than the autoencoder: because it learns the data distribution from real categorical sequence data, it is not affected by a single normal data point, whereas the autoencoder is. In the robustness test, accuracy was 92% for the autoencoder and 96% for the generative adversarial network, and sensitivity was 40% for the autoencoder and 51% for the generative adversarial network. Experiments were also conducted to show how much performance changes with the structure used to optimize the latent variables; sensitivity improved by about 1%. These results offer a new perspective on latent-variable optimization, which had been regarded as relatively insignificant.

Development and Testing of a Machine Learning Model Using 18F-Fluorodeoxyglucose PET/CT-Derived Metabolic Parameters to Classify Human Papillomavirus Status in Oropharyngeal Squamous Carcinoma

  • Changsoo Woo;Kwan Hyeong Jo;Beomseok Sohn;Kisung Park;Hojin Cho;Won Jun Kang;Jinna Kim;Seung-Koo Lee
    • Korean Journal of Radiology / v.24 no.1 / pp.51-61 / 2023
  • Objective: To develop and test a machine learning model for classifying the human papillomavirus (HPV) status of patients with oropharyngeal squamous cell carcinoma (OPSCC) using 18F-fluorodeoxyglucose (18F-FDG) PET-derived parameters and an appropriate combination of machine learning methods. Materials and Methods: This retrospective study enrolled 126 patients (118 male; mean age, 60 years) with newly diagnosed, pathologically confirmed OPSCC who underwent 18F-FDG PET-computed tomography (CT) between January 2012 and February 2020. Patients were randomly assigned to training and internal validation sets in a 7:3 ratio. An external test set of 19 patients (16 male; mean age, 65.3 years) was recruited sequentially from two other tertiary hospitals. Model 1 used only PET parameters, Model 2 used only clinical features, and Model 3 used both PET and clinical parameters. Multiple feature transforms, feature selection methods, oversampling methods, and training models were investigated. The external test set was used to test the three models that performed best in the internal validation set. The values for area under the receiver operating characteristic curve (AUC) were compared between models. Results: In the external test set, ExtraTrees-based Model 3, which uses two PET-derived parameters and three clinical features with a combination of MinMaxScaler, mutual information selection, and the adaptive synthetic sampling approach, showed the best performance (AUC = 0.78; 95% confidence interval, 0.46-1). Model 3 outperformed Model 1 using PET parameters alone (AUC = 0.48, p = 0.047) and Model 2 using clinical parameters alone (AUC = 0.52, p = 0.142) in predicting HPV status.
Conclusion: Using oversampling and mutual information selection, an ExtraTrees-based HPV status classifier was developed by combining metabolic parameters derived from 18F-FDG PET/CT with clinical parameters in OPSCC; it performed better than the models using either PET or clinical parameters alone.

Sleep Deprivation Attack Detection Based on Clustering in Wireless Sensor Network (무선 센서 네트워크에서 클러스터링 기반 Sleep Deprivation Attack 탐지 모델)

  • Kim, Suk-young;Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.1 / pp.83-97 / 2021
  • The wireless sensors that make up a wireless sensor network generally have extremely limited power and resources, so each sensor enters a sleep state at regular intervals to conserve power. The sleep deprivation attack is a deadly attack that drains power by preventing wireless sensors from entering the sleep state, and there is no clear countermeasure. This paper therefore proposes a sleep deprivation attack detection model that uses a clustering-based binary search tree structure. The model uses one of the characteristics that distinguish attack sensor nodes from normal sensor nodes, as classified by machine learning. The characteristics used for detection were determined using Long Short-Term Memory, Decision Tree, Support Vector Machine, and K-Nearest Neighbor. The selected features were used in the proposed algorithm to calculate the values for attack detection, and the threshold for judging those values was derived by applying SVM. In experiments, the proposed model achieved a detection rate of 94% when 35% of all sensor nodes were attack nodes, and improved power retention by up to 26%.

A biota research and analysis for Close-to-nature stream restoration planning (자연형 하천복원계획 수립을 위한 생물상 조사 및 분석)

  • SaGong, Jung-Hee;Ryu, Yeon-Su;Ra, Jung-Hwa
    • Current Research on Agriculture and Life Sciences / v.24 / pp.37-42 / 2006
  • This study surveyed and analyzed biota to support close-to-nature stream restoration planning for Shinchun. The results are summarized as follows. 1) The survey recorded 45 species of vascular plants and 34 insect species in 8 orders. The community classification table identified two communities: the Quercus variabilis community (I) and the Pinus densiflora-Quercus variabilis-Quercus dentata community (II). 2) In the correlation analysis of tree species, the positive correlations between Quercus dentata and Corylus heterophylla and between Pinus densiflora and Lespedeza bicolor were both significant at the 1% level. 3) The DBH analysis suggests that in community I, Quercus variabilis and Quercus dentata will outcompete other species and that the current succession will be maintained. Community II is highly likely to change into a Quercus community of species such as Quercus mongolica, Quercus dentata, and Quercus variabilis. 4) In the analysis of insect fauna, 32 species in 23 families and 8 orders accounted for 94% of all species, while 7 species in 7 families and 4 orders were found in a highly urbanized area near Sang-Dong bridge. 5) Based on this fundamental biota survey, the following measures are suggested for close-to-nature stream restoration planning: i) designating an ecosystem preservation area, ii) building close-to-nature stream revetments, iii) creating pools and riffles, iv) building observation decks and nature-experience walks, and v) creating a wetland biotope. Through these measures, it is necessary to promote biodiversity and draw people to spaces for eco-learning.

Learning Syntactic Constraints for Improving the Efficiency of Korean Parsing (한국어 구문분석의 효율성을 개선하기 위한 구문제약규칙의 학습)

  • Park, So-Young;Kwak, Yong-Jae;Chung, Hoo-Jung;Hwang, Young-Sook;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications / v.29 no.10 / pp.755-765 / 2002
  • In this paper, we examine various kinds of syntactic information for Korean parsing and propose a method that learns constraints and uses them to improve the efficiency of a parsing model. The proposed method has three characteristics. First, it improves parsing efficiency because the constraints prevent the parser from generating unsuitable candidates. Second, it is robust on Korean sentences because the attributes for the constraints are selected based on the syntactic and lexical idiosyncrasies of Korean. Third, the constraints are easy to acquire automatically from a treebank using a decision tree learning algorithm. The experimental results show that the parser using the acquired constraints reduces the number of overgenerated candidates to between 1/2 and 1/3 and runs 2 to 3 times faster than a parser without constraints.

A Personalized Hand Gesture Recognition System using Soft Computing Techniques (소프트 컴퓨팅 기법을 이용한 개인화된 손동작 인식 시스템)

  • Jeon, Moon-Jin;Do, Jun-Hyeong;Lee, Sang-Wan;Park, Kwang-Hyun;Bien, Zeung-Nam
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.53-59 / 2008
  • Recently, vision-based hand gesture recognition techniques have been developed to help elderly and disabled people control home appliances. The problems that most often lower the hand gesture recognition rate stem from inter-person and intra-person variation. The recognition difficulty caused by inter-person variation can be handled with user-dependent models and a model selection technique, and the difficulty caused by intra-person variation can be handled with fuzzy logic. In this paper, we propose a multivariate fuzzy decision tree learning and classification method for a multi-user hand motion recognition system. When a user starts to use the system, the most appropriate recognition model is selected and used for that user.

A Best Effort Classification Model For Sars-Cov-2 Carriers Using Random Forest

  • Mallick, Shrabani;Verma, Ashish Kumar;Kushwaha, Dharmender Singh
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.27-33 / 2021
  • The whole world is now dealing with the coronavirus, which has become one of the most widespread and long-lived pandemics of our times. Reports reveal that the infectious disease has taken a toll on almost 80% of the world's population. While much research addresses the prediction of growth and transmission through symptomatic carriers of the virus, pre-symptomatic and asymptomatic carriers also play a crucial role in spreading it. Classification algorithms, ranging from simple feature-based classification to Convolutional Neural Networks (CNNs), have been widely used to classify different types of COVID-19 carriers. This research paper presents a novel technique that uses a Random Forest machine learning algorithm with hyper-parameter tuning to classify different types of COVID-19 carriers, so that these carriers can be accurately characterized and dealt with in a timely manner to contain the spread of the virus. Random Forest was selected mainly because it works on the powerful concept of "the wisdom of crowd", which produces an ensemble prediction. The results are quite convincing: the model records an accuracy score of 99.72%. The results have been compared with those of K-Nearest Neighbour, logistic regression, support vector machine (SVM), and Decision Tree algorithms on the same dataset, where the accuracy scores were 78.58%, 70.11%, 70.385%, and 99%, respectively, thus establishing the soundness and suitability of our approach.
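The "wisdom of crowd" idea mentioned in this abstract comes down to combining the votes of many trees. The sketch below shows only that aggregation step, not tree training or hyper-parameter tuning; the per-tree predictions and the carrier labels are hypothetical values chosen for illustration.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree class labels into one label per sample by majority vote.

    tree_predictions[t][i] is the label tree t assigns to sample i.
    """
    n_samples = len(tree_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in tree_predictions)
        # most_common(1) returns the label with the highest vote count
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical outputs of three trees for three samples.
tree_predictions = [
    ["symptomatic", "asymptomatic", "pre-symptomatic"],
    ["symptomatic", "asymptomatic", "asymptomatic"],
    ["asymptomatic", "asymptomatic", "pre-symptomatic"],
]
combined = majority_vote(tree_predictions)
```

Each tree disagrees with the others on at least one sample, yet the combined vote can still recover a consistent label, which is the intuition behind ensemble prediction. Ties are resolved here by whichever label was counted first, so an odd number of trees is the safer choice for two-class problems.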

Predicting Corporate Bankruptcy using Simulated Annealing-based Random Forest (시뮬레이티드 어니일링 기반의 랜덤 포레스트를 이용한 기업부도예측)

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.155-170 / 2018
  • Predicting a company's financial bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. Previous studies have proposed prediction models that apply or combine statistical and machine learning-based techniques. In this paper, we propose a novel intelligent prediction model based on simulated annealing, a well-known optimization technique. Simulated annealing is known to have optimization performance comparable to that of genetic algorithms. Nevertheless, because there has been little research on using simulated annealing for prediction and classification in business decision-making problems, it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we use a combined model of simulated annealing and machine learning to select the input features of the bankruptcy prediction model. Optimization and machine learning techniques are typically combined for feature selection, feature weighting, or instance selection; this study proposes a combined model for feature selection, the most studied of the three. To confirm the superiority of the proposed model, we apply it to real-world financial data of Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, performance is significantly improved compared with the traditional decision tree, random forests, artificial neural network, SVM, and logistic regression analysis.
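Feature selection by simulated annealing, as described in this abstract, can be sketched as a search over boolean feature masks. This is a generic illustration rather than the authors' model: `toy_score` is a hypothetical stand-in for the validation accuracy of a classifier trained on the selected features, and the step count, starting temperature, and cooling rate are assumed values.

```python
import math
import random

def anneal_feature_mask(n_features, score, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Search for a boolean feature mask that maximizes score(mask)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(steps):
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]  # flip one feature in or out
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score

# Hypothetical scoring: features 0, 2, and 5 help (+1 each); all others hurt (-1 each).
USEFUL = {0, 2, 5}

def toy_score(mask):
    return sum(1 if i in USEFUL else -1 for i, on in enumerate(mask) if on)

best, best_score = anneal_feature_mask(8, toy_score)
```

In a real setting the score function would train and evaluate a model for each candidate mask, so the number of steps directly controls the computational cost.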