• Title/Summary/Keyword: random forests

Object Classification Method Using Dynamic Random Forests and Genetic Optimization

  • Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
    • Journal of the Korea Society of Computer and Information / v.21 no.5 / pp.79-89 / 2016
  • In this paper, we propose an object classification method using a genetic, dynamic random forest composed of an optimal combination of unit trees. The random forest is an ensemble classification model that combines unit decision trees by bagging; by randomizing the training samples and the feature selection assigned to each decision tree, it can ensure good generalization performance when a large number of trees are combined. However, because the random forest is composed of randomly built unit trees, it shows excellent classification performance only when a sufficient number of trees are combined. There is no quantitative method for determining the number of trees, so there is no choice but to keep generating random trees repeatedly. The proposed algorithm builds the random forest from an optimal combination of trees while maintaining the generalization performance of the random forest. To achieve this, the problem of improving classification performance is cast as an optimization problem of finding the optimal tree combination, which is solved with a genetic algorithm. Experiments showed that the proposed algorithm improved classification performance by about 3~5% over the existing random forest in specific cases, such as a common database and our own infrared database. In addition, we showed that the optimal tree combination was reached at about 55~60% of the maximum number of trees.
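The idea of searching for a tree combination with a genetic algorithm can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each tree's class predictions on a validation set are already available as `tree_preds[i][t]` (sample `i`, tree `t`), and the function names, population size, one-point crossover, and mutation rate are all hypothetical choices made for illustration.

```python
import random

def majority_vote(votes, mask):
    """Most common class among the votes of the trees selected by mask."""
    counts = {}
    for vote, keep in zip(votes, mask):
        if keep:
            counts[vote] = counts.get(vote, 0) + 1
    return max(counts, key=counts.get) if counts else None

def fitness(mask, tree_preds, labels):
    """Validation accuracy of the sub-forest selected by the 0/1 mask."""
    correct = sum(majority_vote(row, mask) == y
                  for row, y in zip(tree_preds, labels))
    return correct / len(labels)

def select_trees(tree_preds, labels, pop_size=20, generations=30,
                 p_mut=0.05, seed=0):
    """Genetic search over 0/1 masks (needs >= 2 trees, pop_size >= 4)."""
    rng = random.Random(seed)
    n_trees = len(tree_preds[0])
    pop = [[rng.randint(0, 1) for _ in range(n_trees)]
           for _ in range(pop_size)]
    pop[0] = [1] * n_trees  # keep the full forest as a baseline individual
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, tree_preds, labels), reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best masks always survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_trees)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, tree_preds, labels))
```

Because the full forest is seeded into the population and the elite half is carried over unchanged, the returned mask never scores below the full forest on the validation data used for the search.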

Novel two-stage hybrid paradigm combining data pre-processing approaches to predict biochemical oxygen demand concentration (생물화학적 산소요구량 농도예측을 위하여 데이터 전처리 접근법을 결합한 새로운 이단계 하이브리드 패러다임)

  • Kim, Sungwon;Seo, Youngmin;Zakhrouf, Mousaab;Malik, Anurag
    • Journal of Korea Water Resources Association / v.54 no.spc1 / pp.1037-1051 / 2021
  • Biochemical oxygen demand (BOD) concentration, one of the important water quality indicators, is treated as a measurement item in the ecological chapter for lakes and rivers. This investigation employed a novel two-stage hybrid paradigm (i.e., wavelet-based gated recurrent unit, wavelet-based generalized regression neural networks, and wavelet-based random forests) to predict BOD concentration at the Dosan and Hwangji stations, South Korea. These models were compared with the corresponding independent models (i.e., gated recurrent unit, generalized regression neural networks, and random forests). Diverse water quality and quantity indicators were used to develop the independent and two-stage hybrid models based on several input combinations (i.e., Divisions 1-5). The models were evaluated using three statistical indices: the root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and correlation coefficient (CC). The results indicate that the two-stage hybrid models do not always improve the predictive accuracy of the independent models. The DWT-RF5 model (RMSE = 0.108 mg/L) predicted BOD concentration more accurately than the other optimal models at Dosan station, and the DWT-GRNN4 model (RMSE = 0.132 mg/L) was the best for predicting BOD concentration at Hwangji station.
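For readers unfamiliar with the three indices named in this abstract, the sketch below computes them from their standard textbook definitions. The function names are hypothetical, and the abstract does not state whether the authors used exactly these formulations (for example, CC is assumed here to be the Pearson coefficient).

```python
import math

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean predictor.
    Undefined (division by zero) if all observations are identical."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def cc(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (norm_o * norm_p)
```

Note that CC is insensitive to a constant bias or scaling in the predictions, whereas RMSE and NSE both penalize it, which is one reason the three are usually reported together.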

A measure of discrepancy based on margin of victory useful for the determination of random forest size (랜덤포레스트의 크기 결정에 유용한 승리표차에 기반한 불일치 측도)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.515-524 / 2017
  • In this study, we suggest a measure of discrepancy based on the margin of victory (MV) that may be useful in determining the size of a random forest for classification. Here, MV is a scaled difference, in the infinite random forest, between the votes of the two most popular classes of the current random forest. More specifically, max(-MV,0) is proposed as a reasonable measure of discrepancy, noting that a negative MV value means that the current and infinite random forests disagree on the two most popular classes. We propose an appropriate diagnostic statistic based on this measure for determining random forest size and derive its asymptotic distribution. Finally, a simulation study compares the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.

Covariance-based Recognition Using Machine Learning Model

  • Osman, Hassab Elgawi
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2009.01a / pp.223-228 / 2009
  • We propose an on-line machine learning approach for object recognition, in which new images are continuously added and the recognition decision is made without delay. The random forest (RF) classifier has been extensively used as a generative model for classification and regression applications. We extend this technique to the task of building an incremental component-based detector. First, we employ an object descriptor model based on a bag of covariance matrices to represent an object region; then we run our on-line RF learner to select object descriptors and to learn an object classifier. Object recognition experiments are provided to verify the effectiveness of the proposed approach. The results demonstrate that the proposed model yields object recognition performance comparable to the benchmark standard RF, AdaBoost, and SVM classifiers.
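A covariance descriptor of the kind mentioned in this abstract summarizes a region by the sample covariance of its per-pixel feature vectors. The sketch below shows only that basic computation. The abstract does not specify the features or how the "bag" of matrices is assembled, so the function name and the input layout (one feature vector per pixel) are assumptions.

```python
def region_covariance(features):
    """Sample covariance matrix of the feature vectors of one image region.

    features: list of equal-length feature vectors, one per pixel
    (for example intensity and gradient values; the choice is hypothetical).
    Requires at least two vectors.
    """
    n = len(features)
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        dev = [f[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b]
    # Unbiased estimate: divide by n - 1.
    return [[value / (n - 1) for value in row] for row in cov]
```

The resulting matrix is d x d regardless of region size, which is what makes it convenient as a fixed-length descriptor for regions of different shapes.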

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • Korean Journal of Remote Sensing / v.40 no.2 / pp.123-139 / 2024
  • Forest vertical structure is vital for comprehending ecosystems and biodiversity, in addition to fundamental forest information. Currently, forest vertical structure is predominantly assessed via an in-situ method, which is not only difficult to apply to inaccessible locations or large areas but also costly and requires substantial human resources. Therefore, mapping systems based on remote sensing data have been actively explored. Recently, research on analyzing and classifying images using machine learning techniques has been actively conducted and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two different dates separated by approximately one month, and the spectral index and tree height maps were generated separately. Furthermore, according to the acquisition time, the input data were separated into Cases 1 and 2, which were then combined to generate Case 3. Using these data, forest vertical structure mapping models based on random forest, support vector machine, and extreme gradient boosting (XGBoost) were generated. Consequently, nine models were generated, with the XGBoost model in Case 3 performing the best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that generating a forest vertical structure mapping model using bi-seasonal data and an appropriate model can result in an accuracy of 90% or higher.
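As a reminder of how an F1 score relates to precision and recall, the sketch below computes all three for a single class from true and predicted labels. The function name is hypothetical, and the abstract does not say how its scores were averaged across classes or thresholds, so this shows only the per-class definition.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a model with very high precision can still have a noticeably lower F1 when its recall is weaker.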

A Novel Network Anomaly Detection Method based on Data Balancing and Recursive Feature Addition

  • Liu, Xinqian;Ren, Jiadong;He, Haitao;Wang, Qian;Sun, Shengting
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.3093-3115 / 2020
  • Network anomaly detection systems play an essential role in detecting network anomalies and ensuring network security. Anomaly detection systems based on machine learning have become an increasingly popular solution. However, because network traffic is imbalanced and high-dimensional, existing methods are unable to achieve both high accuracy and a low false alarm rate. To address this problem, a new network anomaly detection method based on data balancing and recursive feature addition is proposed. First, a data balancing algorithm based on improved KNN outlier detection is designed to select a representative part of the data in each category. The parameters of the improved KNN outlier detection are jointly optimized by a genetic algorithm. Next, a recursive feature addition algorithm based on correlation analysis is proposed to select effective features, in which a cross contingency test is used to analyze correlation and obtain a feature subset with strong correlation. Then, a random forests model is used as the classification model to detect anomalies. Finally, the proposed algorithm is evaluated on the benchmark datasets KDD Cup 1999 and UNSW_NB15. The results show that the proposed strategies improve accuracy and recall and decrease the false alarm rate. Compared with other algorithms, this algorithm still achieves significant improvements, especially in recall for the small categories.

Pattern and Association within Shrub Layer under Summer Green Forest in Central Korean Peninsula (중부한국의 하록림 밑 관목층 구성종의 미분포와 종간상관)

  • 오계칠
    • Journal of Plant Biology / v.15 no.1 / pp.33-41 / 1972
  • Nine shrub layer communities under two relatively well-conserved natural summer green forests in the central region of the Korean Peninsula were studied for the pattern of stem distribution, in terms of Greig-Smith's multiple split-plot experiment, and for the association between the populations of the two main species, in terms of Kershaw's covariance analysis. Four contiguous belt transects, $4{\times}64$ m in size with a $1{\times}1$ m basic unit, were set in each shrub layer community. Significant primary clumps of $1{\times}1$ m or $1{\times}2$ m dimension were observed consistently throughout the nine study sites. The primary clumps themselves were significantly distributed either regularly or at random. The association between the two principal species of each shrub layer was highly significant, either positive or negative, at the $1{\times}1$ m or $1{\times}2$ m dimension. As the plot size increased from $1{\times}1$ m to $8{\times}8$ m, the association trend changed from negative to positive in one forest, whereas a change from positive to negative and a consistently negative association were observed in the other forest. All of the association trends were observed only from the $1{\times}1$ m to the $4{\times}4$ m dimension. These results suggest that the distributional pattern of the shrub layer species under the summer green forest is a simple mosaic of $1{\times}1$ m or $1{\times}2$ m dimension. The rest of the principal species are located in that matrix. The simple mosaic pattern of the two principal species seems to be controlled by change in the micro-environmental pattern. Differences between the primary random group and clumped group among sites also suggest that competition for light and/or soil exists between primary clumped groups.

Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models

  • Kim, Hyunsuk;Park, Taesung;Jang, Jinyoung;Lee, Seungyeoun
    • Genomics & Informatics / v.20 no.2 / pp.23.1-23.9 / 2022
  • A survival prediction model has recently been developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma based on a Cox model using two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods, random survival forests (RSF) and support vector machines (SVM), to survival analysis and compared their prediction performance using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we used data from SEER for model development and data from KOTUS-BP for external evaluation. Second, the two datasets were swapped, with data from KOTUS-BP used for model development and data from SEER for external evaluation. Finally, we mixed the two datasets half and half and used the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Comparing the three schemes, the performance of the Cox model, RSF, and SVM was better when using the mixed datasets than when using the unmixed datasets. When using the mixed datasets, the C-index and the 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
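The C-index reported in this abstract measures how often a model ranks pairs of patients in the correct order of survival. The sketch below follows Harrell's standard definition for right-censored data. The function name and argument layout are hypothetical, and the abstract does not state which C-index variant the authors computed.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risks:  model risk scores, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the subject with the shorter time
            # actually experienced the event.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.644 in context as modest discrimination.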

A descriptive study of on-farm biosecurity and management practices during the incursion of porcine epidemic diarrhea into Canadian swine herds, 2014

  • Perri, Amanda M.;Poljak, Zvonimir;Dewey, Cate;Harding, John CS.;O'Sullivan, Terri L.
    • Journal of Veterinary Science / v.21 no.2 / pp.25.1-25.16 / 2020
  • Porcine epidemic diarrhea virus (PEDV) emerged in Canada in January 2014, primarily affecting sow herds. Subsequent epidemiological analyses suggested contaminated feed was the most likely transmission pathway. The primary objective of this study was to describe general biosecurity and management practices implemented in PEDV-positive sow herds and matched control herds at the time the virus emerged. The secondary objective was to determine if any of these general biosecurity and farm management practices were important in explaining PEDV infection status from January 22, 2014 to March 1, 2014. A case herd was defined as a swine herd with clinical signs and a positive test result for PEDV. A questionnaire was used to gather a 30-day history of herd management practices, animal movements on/off site, feed management practices, semen deliveries and biosecurity practices for case (n = 8) and control (n = 12) herds, primarily located in Ontario. Data were analyzed using descriptive statistics and random forests (RFs). Case herds were larger than control herds. Case herds had more animal movements and non-staff movements onto the site. Also, case herds had higher quantities of pigs delivered and more feed and semen deliveries on-site. The biosecurity practices of case herds were considered more rigorous than those of control herds based on herd management, feed deliveries, transportation and truck driver practices. The RF model found that the most important variables for predicting herd status were related to herd size and feed management. Nonetheless, the predictive accuracy of the final RF model was 72%.

Stand Structure and Regeneration Pattern of Kalopanax septemlobus at the Natural Deciduous Broad-leaved Forest in Mt. Jeombong, Korea

  • Kang, Ho-Sang;Lee, Don-Koo
    • Journal of Ecology and Environment / v.29 no.1 / pp.17-22 / 2006
  • As demand has increased not only for value-added timber but also for the environmental functions of forests, native tree species have been, and are rapidly being, replaced by foreign tree species in many parts of the world. However, studies on the population structure and regeneration characteristics of native tree species have been insufficient. Regeneration of Kalopanax septemlobus growing among other hardwoods in natural forests is very difficult because of its low seed viability and germination rate. This study examined the distribution of mature K. septemlobus trees and their regeneration pattern in a 1.12 ha study plot in the natural deciduous broad-leaved forest of Mt. Jeombong. The density and mean DBH of K. septemlobus were 97 trees per ha and 32 cm, respectively. The spatial distribution of K. septemlobus showed a random pattern (aggregation index of 0.935) in the 1.12 ha study plot. The ages of 90 of the 99 sample trees of K. septemlobus ranged from 90 to 110 years and represented a single cohort, suggesting that K. septemlobus present as advance regeneration regenerated as a result of disturbances such as canopy opening.
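The abstract does not say which aggregation index was used. One common choice in stand-structure studies is the Clark-Evans index, and the sketch below assumes that definition purely for illustration: the ratio of the observed mean nearest-neighbour distance to the distance expected under a random (Poisson) pattern. The function name is hypothetical and no edge correction is applied.

```python
import math

def clark_evans_index(points, area):
    """Clark-Evans aggregation index R for (x, y) stem positions in a plot.

    R near 1 suggests a random pattern, R < 1 clumping, R > 1 regularity.
    Assumes at least two points; no edge correction is applied.
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    density = n / area
    expected = 1.0 / (2.0 * math.sqrt(density))  # mean under a Poisson pattern
    return observed / expected
```

Under this assumed definition, a value such as 0.935 lies close to 1, which would be consistent with the random pattern the abstract describes.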