• Title/Summary/Keyword: random forest model

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and evaluates the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use the Random Forest and XGBoost algorithms, both of which are based on decision trees. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing the API and DLL information frequently used by malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method provides the advantages of data dimension reduction and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for both Random Forest and XGBoost.
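
The dimension reduction described in this abstract can be illustrated with a minimal sketch. It assumes each sample is already available as a list of imported API names, keeps only the `k` APIs that occur in the most samples, and encodes each sample as a binary vector over that reduced vocabulary. The function names, the toy data, and the value of `k` are hypothetical; the construction of the paper's API, API-DLL, and DLL-CM features is not given in the abstract and is not reproduced here.

```python
from collections import Counter


def select_frequent_apis(samples, k):
    """Return the k API names that appear in the most samples."""
    counts = Counter()
    for apis in samples:
        counts.update(set(apis))  # count each API at most once per sample
    # Sort by descending count, then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def to_binary_vector(apis, vocabulary):
    """Encode one sample as 0/1 flags over the selected vocabulary."""
    present = set(apis)
    return [1 if name in present else 0 for name in vocabulary]


# Toy data (hypothetical): API names imported by three executables.
samples = [
    ["CreateFileW", "WriteFile", "RegSetValueExW"],
    ["CreateFileW", "WriteFile", "VirtualAlloc"],
    ["CreateFileW", "VirtualAlloc", "Sleep"],
]
vocabulary = select_frequent_apis(samples, k=3)
vectors = [to_binary_vector(apis, vocabulary) for apis in samples]
```

The resulting fixed-length vectors could then be passed to any tree-based classifier. Selecting by document frequency is only one possible reading of "frequently used" in the abstract.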

Relationships between Fish Communities and Environmental Variables in Islands, South Korea

  • Kwon, Yong-Su;Shin, Man-Seok;Yoon, Hee-Nam
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.3 no.2 / pp.84-96 / 2022
  • Most of Korea's islands are distributed in the South and West Seas, and their streams are small and independent. As a result, the fish communities inhabiting island streams are isolated from the mainland and from other islands. This study used a Self-Organizing Map (SOM) and a random forest model to analyze the relationship between environmental variables and fish communities inhabiting islands in South Korea. The SOM analysis divided the fish communities into three clusters, and biotic and abiotic factors differed between these groups. Cluster I consisted of sites with relatively larger island areas and higher numbers of species and larger populations, and it included 15 of the 16 indicator species. The remaining clusters had fewer species and smaller populations. Cluster II in particular showed the lowest impact from physical variables such as water width and depth. When species richness was predicted with the random forest model, physical habitat variables such as stream width and water depth had relatively high importance. In contrast, forest area was the most important variable for predicting Shannon diversity, followed by maximum water depth and gravel. The results of this study can serve as basic data for establishing a stream ecosystem management strategy for the conservation and protection of biological resources in island streams.

Deep Learning based Scrapbox Accumulated Status Measuring

  • Seo, Ye-In;Jeong, Eui-Han;Kim, Dong-Ju
    • Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.27-32 / 2020
  • In this paper, we propose an algorithm to measure the accumulated status of scrap boxes in which metal scraps are collected. Measuring the accumulated status is defined as a multi-class classification problem, and the deep learning method classifies the accumulated status using only the scrap box image. The model, NASNet-A, was trained by transfer learning. To improve accuracy, we combined a Random Forest classifier with the trained NASNet-A and refined the model through post-processing. Testing on 4,195 samples collected in the field showed 55% accuracy when only NASNet-A was applied, and the proposed method, NASNet with Random Forest, improved the accuracy to 88%.

A Predictive Model to identify possible affected Bipolar disorder students using Naive Baye's, Random Forest and SVM machine learning techniques of data mining and Building a Sequential Deep Learning Model using Keras

  • Peerbasha, S.;Surputheen, M. Mohamed
    • International Journal of Computer Science & Network Security / v.21 no.5 / pp.267-274 / 2021
  • Medical care practices include gathering a wide range of data on students with manic episodes and depression, which helps the specialist diagnose the students' health condition correctly. In this way, the instructors of those students can also identify them and take good care of them. The data collected from the students may be straightforward symptoms that they have observed themselves. Artificial intelligence techniques, namely Naive Bayes classification, the Random Forest classification algorithm, and the SVM algorithm, are used to classify the collected datasets and check whether a student is affected by bipolar disorder. The performance of the algorithms on the disease data is calculated and compared. In addition, a sequential deep learning model is built using Keras. The simulation results show the efficacy of the classification techniques on the dataset, as well as the nature and complexity of the dataset used.

Application of Multi-Layer Perceptron and Random Forest Method for Cylinder Plate Forming (Multi-Layer Perceptron과 Random Forest를 이용한 실린더 판재의 성형 조건 예측)

  • Kim, Seong-Kyeom;Hwang, Se-Yun;Lee, Jang-Hyun
    • Journal of the Society of Naval Architects of Korea / v.57 no.5 / pp.297-304 / 2020
  • In this study, machine learning was examined as a data-driven approach for predicting the forming conditions of cylindrical plates on roll bending equipment. The forming variables were calculated from the mechanical relationship between the material properties and the roll bending machine during the bending process. Then, finite element analysis was applied to review the accuracy of the deformation prediction model, and the finite element model was used to create a large data set for machine learning. Applying the machine learning model confirmed that its calculation accuracy is slightly higher than that of the linear regression method. The results confirm that the machine learning method is applicable.

The road roughness based Braking Pressure Calculation System(BPCS) for an Autonomous Vehicle Stability (자율차량 안정성을 위한 도로 거칠기 기반 제동압력 계산 시스템)

  • Son, Su-Rak;Lee, Byung-Kwan;Sim, Son-Kweon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.5 / pp.323-330 / 2020
  • This paper proposes a road roughness based Braking Pressure Calculation System (BPCS) for autonomous vehicle stability. The system consists of an image normalization module that processes the front image of a vehicle to fit the input of the random forest; a Random Forest based Road Roughness Classification Module that distinguishes the roughness of the road on which the vehicle is travelling, using weather information and the front image of the vehicle as input; and a brake pressure control module that modifies the friction coefficient applied to the vehicle according to the road roughness and determines the braking strength needed to maintain optimal driving relative to the vehicle ahead. To verify the efficiency of the BPCS, an experiment was conducted with a random forest model. The experiment shows that the accuracy of the random forest model was about 2% higher than that of the SVM, and that 7 features should be bagged to build an accurate random forest model. Therefore, the BPCS satisfies both real-time and accuracy requirements in situations where the vehicle needs to brake.

Development of Random Forest Model for Sewer-induced Sinkhole Susceptibility (손상 하수관으로 인한 지반함몰의 위험도 평가를 위한 랜덤 포레스트 모델 개발)

  • Kim, Joonyoung;Kang, Jae Mo;Baek, Sung-Ha
    • Journal of the Korean Geotechnical Society / v.37 no.12 / pp.117-125 / 2021
  • Ground subsidence and sinkholes in downtown areas, which threaten the safety of citizens, have been frequently reported. Among the various sinkhole mechanisms, soil erosion through damaged sections of sewer pipes was found to be the main cause in Seoul. In this study, a random forest model for predicting sinkholes caused by damaged sewer pipes was trained using sewer pipe information and the locations of sinkhole occurrences in Seoul. After optimization of its hyperparameters, the random forest model showed excellent performance in predicting sinkhole occurrence. In addition, among the sewer pipe attributes used as input variables, sewer pipe length, elevation above sea level, slope, depth of landfill, and risk of ground subsidence were confirmed to be influential, in that order. The results of this study are expected to be used as basic data for preparing a sinkhole susceptibility map and for establishing an underground cavity exploration plan and a sewer pipe maintenance plan.

Activity Type Detection Of Random Forest Model Using UWB Radar And Indoor Environmental Measurement Sensor (UWB 레이더와 실내 환경 측정 센서를 이용한 랜덤 포레스트 모델의 재실활동 유형 감지)

  • Park, Jin Su;Jeong, Ji Seong;Yang, Chul Seung;Lee, Jeong Gi
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.899-904 / 2022
  • As the world becomes an aging society due to a decrease in the birth rate and an increase in life expectancy, a system for health management of the elderly population is needed. To that end, various studies on occupancy and activity types are being conducted for smart home care services for indoor health management. In this paper, we propose a random forest model that classifies activity type as well as occupancy status from indoor temperature and humidity, CO2, and fine dust values and UWB radar positioning for a smart home care service. The experiment measures indoor environment and occupant positioning data at 2-second intervals using three sensors that measure indoor temperature and humidity, CO2, and fine dust, together with two UWB radars. After outliers and missing values are corrected, the measured data is divided into an 80% training set and a 20% test set, and the random forest model is evaluated in terms of variable importance, accuracy, sensitivity, and specificity.
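
The evaluation protocol in this abstract (an 80/20 split followed by accuracy, sensitivity, and specificity) can be sketched as follows. This is a minimal illustration using only the Python standard library. The classifier itself is omitted, the predictions are assumed to come from any trained model, and the function names, the seed, and the toy labels are hypothetical. For a multi-class task such as activity type, `positive` can be set to one class at a time to obtain one-vs-rest sensitivity and specificity.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle rows and return (train, test) with an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: share of actual negatives that were rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy usage (hypothetical data): 10 row indices and 5 labeled predictions.
train, test = split_80_20(range(10))
metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```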

A Prediction Model for the Development of Cataract Using Random Forests (Random Forests 기법을 이용한 백내장 예측모형 - 일개 대학병원 건강검진 수검자료에서 -)

  • Han, Eun-Jeong;Song, Ki-Jun;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.771-780 / 2009
  • Cataract is the main cause of blindness and visual impairment; in particular, age-related cataract accounts for about half of the 32 million cases of blindness worldwide. As life expectancy and the elderly population increase, cataract cases increase as well, causing serious economic and social problems throughout the country. However, the incidence of cataract can be reduced dramatically through early diagnosis and prevention. In this study, we developed a prediction model of cataracts for early diagnosis using hospital data from 3,237 subjects who first received a screening test and later visited a medical center for cataract check-ups between 1994 and 2005. To develop the prediction model, we used random forests and compared its predictive performance with that of other common classification models such as logistic regression, discriminant analysis, decision tree, and naive Bayes, and with two popular ensemble models, bagging and arcing. The accuracy of random forests was 67.16% and the sensitivity was 72.28%; the main factors in the model included age, diabetes, WBC, platelet, triglyceride, and BMI. The results showed that the model could predict about 70% of cataract cases from the screening test alone, without any information from a direct eye examination by an ophthalmologist. We expect that our model may contribute to diagnosing cataract and help prevent it in the early stages.

Human Action Recognition in Still Image Using Weighted Bag-of-Features and Ensemble Decision Trees (가중치 기반 Bag-of-Feature와 앙상블 결정 트리를 이용한 정지 영상에서의 인간 행동 인식)

  • Hong, June-Hyeok;Ko, Byoung-Chul;Nam, Jae-Yeal
    • The Journal of Korean Institute of Communications and Information Sciences / v.38A no.1 / pp.1-9 / 2013
  • This paper proposes a human action recognition method that uses bag-of-features (BoF) based on CS-LBP (center-symmetric local binary pattern) and a spatial pyramid, together with a random forest classifier. To construct the BoF, an image is divided into dense regular grids and features are extracted from each patch. Code words, which form the visual vocabulary, are obtained by k-means clustering of a random subset of patches. For enhanced action discrimination, local BoF histograms are estimated from three subdivided levels of a spatial pyramid, and a weighted BoF histogram is generated by concatenating the local histograms. For action classification, a random forest, which is an ensemble of decision trees, is built to model the distribution of each action class. The random forest combined with the weighted BoF histogram is successfully applied to Stanford Action 40, which includes various human action images, and its classification performance is better than that of other methods. Furthermore, the proposed method allows action recognition to be performed in near real-time.
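
The weighted BoF histogram described in this abstract can be sketched as follows. This minimal illustration assumes that each patch has already been assigned a code-word index (for example, the nearest k-means centroid) and a normalized position in [0, 1). Pyramid level `l` splits the image into 2^l by 2^l cells, each cell gets its own code-word histogram scaled by a per-level weight, and all local histograms are concatenated. The function name, the level weights, and the toy patches are hypothetical and are not the paper's values; CS-LBP feature extraction is not shown.

```python
def weighted_bof_histogram(patches, vocab_size, level_weights=(0.25, 0.25, 0.5)):
    """Concatenate weighted code-word histograms over spatial pyramid levels.

    patches: list of (x, y, word), with x, y in [0, 1) and
             word an integer in [0, vocab_size).
    """
    feature = []
    for level, weight in enumerate(level_weights):
        cells = 2 ** level  # number of cells per side at this level
        # One histogram per cell, stored in row-major order.
        hists = [[0.0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in patches:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][word] += weight
        for hist in hists:
            feature.extend(hist)
    return feature


# Toy usage (hypothetical data): three patches and a two-word vocabulary.
patches = [(0.1, 0.1, 0), (0.6, 0.2, 1), (0.7, 0.8, 1)]
feature = weighted_bof_histogram(patches, vocab_size=2)
```

With three levels and a two-word vocabulary, the feature has (1 + 4 + 16) × 2 = 42 entries. A vector of this kind would be the input to the random forest classifier.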