• Title/Abstract/Keywords: Determination of random forest size

7 search results (processing time: 0.02 s)

A measure of discrepancy based on margin of victory useful for the determination of random forest size

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 3 / pp. 515-524 / 2017
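  • This study proposes a measure of discrepancy, based on the margin of victory (MV), that is useful for determining the size of a random forest (RF) for classification. Here, MV is the margin of victory, in the infinite RF, between the classes that rank first and second in the current RF. Specifically, noting that a positive -MV signals a discrepancy between the current and infinite RFs in the first- and second-ranked classes, we propose max(-MV, 0) as a measure of discrepancy. Based on this measure, we propose a diagnostic statistic suitable for determining the size of an RF and derive its theoretical asymptotic distribution. Finally, we conduct a simulation study comparing the small-sample performance of this statistic with that of recently proposed diagnostic statistics.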
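The per-case quantity max(-MV, 0) described in this abstract can be illustrated with a minimal Python sketch. This is not the paper's diagnostic statistic, which the abstract does not spell out. The function names and the example numbers are hypothetical, and the infinite RF is assumed to be represented by a vector of reference vote proportions (for instance, taken from a much larger forest).

```python
def margin_of_victory(current_votes, reference_props):
    """MV for one case: the reference-forest margin between the two
    classes that rank first and second in the current forest."""
    if len(current_votes) != len(reference_props) or len(current_votes) < 2:
        raise ValueError("need matching inputs with at least two classes")
    # Rank classes by their vote counts in the current (finite) forest.
    # Ties are broken by class index (an assumption of this sketch).
    ranked = sorted(range(len(current_votes)),
                    key=lambda k: current_votes[k], reverse=True)
    first, second = ranked[0], ranked[1]
    return reference_props[first] - reference_props[second]


def discrepancy(current_votes, reference_props):
    """Discrepancy measure max(-MV, 0) for one case."""
    return max(-margin_of_victory(current_votes, reference_props), 0.0)


# Hypothetical case: class 0 leads in a 10-tree forest, but class 1
# leads in the reference proportions, so MV is negative.
current_votes = [6, 4, 0]
reference_props = [0.45, 0.50, 0.05]
example_mv = margin_of_victory(current_votes, reference_props)
example_disc = discrepancy(current_votes, reference_props)
```

When the current and reference forests agree on the winner, MV is positive and the discrepancy is zero; the measure is positive only when the ordering of the top two classes flips.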

A simple diagnostic statistic for determining the size of random forest

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 4 / pp. 855-863 / 2016
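  • This study proposes a simple diagnostic statistic for determining the size of a random forest (RF). The method is based on the margin of victory (MV): the margin, among infinitely many generated decision trees, between the classes that rank first and second among the decision trees generated so far. A negative MV therefore indicates a gap between the current RF and the infinite RF. The proposed method is based on the proportion of cases for which -MV exceeds a fixed small positive number (for example, 0.03). We derive an appropriate statistic under this method, together with its theoretical distribution, and conduct a simulation study comparing its performance with that of a recently proposed diagnostic statistic.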
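The proportion this abstract refers to can be sketched in a few lines of Python, assuming the per-case MV values are already available. The function name and the sample values are hypothetical, and the paper's actual statistic and its distribution are not reproduced here; only the threshold 0.03 comes from the abstract.

```python
def proportion_exceeding(mv_values, threshold=0.03):
    """Share of cases whose -MV is larger than a small positive threshold."""
    if not mv_values:
        raise ValueError("mv_values must not be empty")
    flagged = sum(1 for mv in mv_values if -mv > threshold)
    return flagged / len(mv_values)


# Hypothetical MV values for four cases; only -0.05 gives -MV > 0.03.
example_proportion = proportion_exceeding([0.2, -0.05, -0.01, 0.0])
```

A small proportion suggests that adding more trees would rarely change the ranking of the top two classes.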

Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 1 / pp. 255-264 / 2016
  • Classification is a predictive modeling task for a categorical target variable. Various classification ensemble methods, which improve predictive accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm. Well-known classification ensemble methods are boosting, bagging and random forest. In this article, we assume that decision trees are used as classifiers in the ensemble. Further, we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments using twenty-eight data sets. We then compare the performance of the ensemble algorithms (bagging, double-bagging, boosting and random forest) with different tree sizes in the experiment.

Current Status and Potentiality of Forest Resources in a Proposed Biodiversity Conservation Area of Bangladesh

  • Rana, Md. Parvez;Uddin, Mohammed Salim;Chowdhury, Mohammad Shaheed Hossain;Sohel, Md. Shawkat Islam;Akhter, Sayma;Koike, Masao
    • Journal of Forest and Environmental Science / Vol. 25, No. 3 / pp. 167-175 / 2009
  • An exploratory study was conducted in Juri Forest Range-2, a proposed biodiversity conservation area of Bangladesh, to assess the present growing stock of trees, the regeneration condition and the status of non-timber forest products (NTFPs). This conservation area, which contains both natural and artificial plantations, was selected using a multistage random sampling method. Quadrats of $10\,m \times 10\,m$ were used for tree stock measurement, $2\,m \times 2\,m$ for the regeneration survey, and $20\,m \times 20\,m$ for the NTFPs survey. In the tree stock survey, 14 species from eight families were found; Tectona grandis had an average of 624 stems/ha and a basal area of 10.36 $m^2/ha$, followed by Acacia auriculiformis (0.2 $m^2/ha$ and 637 stems/ha) and Gmelina arborea (0.2 $m^2/ha$ and 600 stems/ha). In the regeneration survey, 14 species belonging to 9 families were found, with Alstonia scholaris showing the highest number of seedlings per hectare (3,750). Among NTFPs, bamboo and cane are the most common resources. Over the last ten years, the total timber output was 1,28,596.14 cubic feet and the total revenue was 4,64,434 US$. The vacant area is 1,335.5 acres, which constitutes 14% of the total area. If this vacant area is planted with suitable species and proper steps are taken to manage them, it will become a biologically diverse area.

Modelling Stem Diameter Variability in Pinus caribaea (Morelet) Plantations in South West Nigeria

  • Adesoye, Peter Oluremi
    • Journal of Forest and Environmental Science / Vol. 32, No. 3 / pp. 280-290 / 2016
  • Stem diameter variability is an essential inventory result that provides useful information for forest management decisions. Little has been done to explore the modelling potential of the standard deviation (SDD) and coefficient of variation (CVD) of diameter at breast height (dbh). This study therefore aimed to develop and test models for predicting SDD and CVD in stands of Pinus caribaea Morelet (pine) in south west Nigeria. Sixty temporary sample plots of size $20\,m \times 20\,m$, ranging in age between 15 and 37 years, were sampled, covering the entire range of pine in south west Nigeria. The dbh (cm), total and merchantable heights (m), number of stems and age of trees were measured within each plot. Basal area ($m^2$), site index (m), relative spacing and the dbh at the $24^{th}$, $63^{rd}$, $76^{th}$ and $93^{rd}$ percentiles (i.e. $P_{24}$, $P_{63}$, $P_{76}$ and $P_{93}$) were computed from the measured variables for each plot. A linear mixed model (LMM) was used to test the effects of locations (fixed) and plots (random). Six candidate models (3 for SDD and 3 for CVD) were developed using three categories of explanatory variables: (i) stand size measures only, (ii) distribution measures, and (iii) a combination of (i) and (ii). The best model was chosen based on smaller relative standard error (RSE), prediction residual sum of squares (PRESS) and corrected Akaike Information Criterion ($AIC_c$), and larger coefficient of determination ($R^2$). The results of the LMM indicated that location and plot effects were not significant. The CVD and SDD models with only percentile measures (i.e. $P_{24}$ and $P_{93}$) as predictors produced better predictions than the others. However, the CVD model produced the best predictions overall, because of its lower RSE and its stability in measuring variability across different stages of stand development. The results demonstrate the potential of CVD for modelling stem diameter variability in relation to percentile variables.
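Two of the quantities named in this abstract, the plot-level variability measures (SDD, CVD) and the corrected AIC, have standard textbook definitions that can be sketched in Python. The function names and sample values below are hypothetical. Using the sample (n-1) standard deviation and expressing CVD as a percentage are assumptions of this sketch, not details confirmed by the abstract.

```python
import statistics


def diameter_variability(dbh_cm):
    """Return (SDD, CVD) for one plot: the standard deviation of dbh
    and the coefficient of variation as a percentage of the mean."""
    sdd = statistics.stdev(dbh_cm)  # sample standard deviation (assumed)
    cvd = 100.0 * sdd / statistics.mean(dbh_cm)
    return sdd, cvd


def aicc(aic, n_params, n_obs):
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("n_obs must exceed n_params + 1")
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


# Hypothetical dbh values (cm) for a single plot.
example_sdd, example_cvd = diameter_variability([10.0, 12.0, 14.0])
# Hypothetical model with AIC = 100 and 3 parameters, fitted to 60 plots.
example_aicc = aicc(100.0, 3, 60)
```

Because CVD divides by the mean, it is unit-free, which helps when comparing variability across stands at different stages of development.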

Homestead Plant Species Diversity and Its Contribution to the Household Economy: a Case Study from Northern Part of Bangladesh

  • Kibria, Mohammad Golam;Anik, Sawon Istiak
    • Journal of Forest and Environmental Science / Vol. 26, No. 1 / pp. 9-15 / 2010
  • This paper analyzes data on plant species diversity and its contribution to the livelihoods of rural people in five villages of Domar upazila, Nilphamari district, Bangladesh. The assessment was done by means of multistage random sampling. Information was collected from a total of 40 households in small, medium and large categories. A total of 52 plant species belonging to 34 families were identified as being important to local livelihoods. Fruit (37%), timber (23%) and medicinal (17%) species were the most important plant use categories. The relative density of the different species showed that Areca catechu constitutes 19.17% of the homestead vegetation of the area, followed by Artocarpus heterophyllus at 10.34%. The Margalef index showed no major difference across the size classes (5.11 for large, 5.49 for medium, 4.73 for small), and the Shannon-Wiener index of the study area varies from 2.75 to 2.98. Results show that the average annual homestead income varied from US$108.69 to US$291.67 and contributes 6.63% of household income.
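The two diversity indices mentioned in this abstract can be computed from species counts as in the following Python sketch, which uses their common textbook forms: Shannon-Wiener H' = -Σ p_i ln p_i and Margalef D = (S - 1) / ln N. The function names and counts are hypothetical, and the natural logarithm is an assumption, since the abstract does not state which base the authors used.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over observed species."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def margalef(counts):
    """Margalef richness index D = (S - 1) / ln N."""
    n_individuals = sum(counts)
    n_species = sum(1 for c in counts if c > 0)
    if n_individuals <= 1:
        raise ValueError("need more than one individual")
    return (n_species - 1) / math.log(n_individuals)


# Hypothetical homestead with two equally abundant species.
example_counts = [10, 10]
example_h = shannon_wiener(example_counts)
example_d = margalef(example_counts)
```

Shannon-Wiener reflects both richness and evenness, while Margalef depends only on the number of species and the number of individuals, so the two can rank sites differently.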

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / Vol. 26, No. 2 / pp. 139-148 / 2020
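  • The factor with the greatest influence on the berthing energy generated when a ship berths is the berthing velocity, and an excessive velocity can lead to an accident. Although many factors affect the determination of berthing velocity, previous studies have generally restricted their analysis to ship size. This study therefore analyzes a range of factors affecting ship berthing velocity and derives their importance. The data used in the analysis are based on berthing velocities measured at a tanker terminal in Korea. Using the collected data, four machine learning classification algorithms were compared: decision tree, random forest, logistic regression and perceptron. The algorithms were evaluated using model performance metrics derived from the confusion matrix. The perceptron was selected as the best-performing algorithm, and the resulting order of importance of the factors affecting berthing velocity was ship size (DWT), jetty location (Jetty No.) and loading state (State). Accordingly, when a ship berths, the berthing velocity should be designed with various factors in mind, including ship size, jetty location and loading state.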
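The abstract does not say which confusion-matrix metrics were used. As an illustration only, the Python sketch below computes four common ones (accuracy, precision, recall and F1) for a binary problem. The function name and the counts are hypothetical and are not taken from the paper.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for one classifier on a held-out set.
example_metrics = confusion_metrics(tp=8, fp=2, fn=2, tn=8)
```

Comparing several classifiers on the same held-out data with such metrics is one way to arrive at a "best-performing" model, as the abstract describes.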