Park, Cheolyong
Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.515-524 / 2017
This study proposes a measure of discrepancy based on the margin of victory (MV) that may be useful for determining the size of a random forest for classification. Here, MV is a scaled difference, in the votes of the infinite random forest, between the two most popular classes of the current random forest. A negative MV indicates a discrepancy between the current and infinite random forests in the two most popular classes, so max(-MV, 0) is proposed as a reasonable measure of discrepancy. We propose a diagnostic statistic based on this measure for determining the random forest size and derive its asymptotic distribution. Finally, a simulation study compares the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.
https://doi.org/10.7465/jkdi.2017.28.3.515
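
The discrepancy measure max(-MV, 0) described in the abstract above can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the paper's definitions: the infinite forest is not observable, so a hypothetical `reference_votes` array (for example, vote proportions from a much larger forest) stands in for it; the "scaling" of MV is taken to be the use of vote proportions; and the function names are invented for illustration. The paper's diagnostic statistic and its asymptotic distribution are not reproduced.

```python
import numpy as np

def margin_of_victory(current_votes, reference_votes):
    """MV per case: reference-forest vote difference between the two
    classes that are most popular in the CURRENT forest.

    Both inputs are (n_cases, n_classes) arrays of vote proportions.
    `reference_votes` is an assumed stand-in for the infinite forest.
    """
    current_votes = np.asarray(current_votes, dtype=float)
    reference_votes = np.asarray(reference_votes, dtype=float)
    # Classes ordered by current-forest votes, most popular first.
    order = np.argsort(-current_votes, axis=1, kind="stable")
    first, second = order[:, 0], order[:, 1]
    rows = np.arange(current_votes.shape[0])
    return reference_votes[rows, first] - reference_votes[rows, second]

def discrepancy(current_votes, reference_votes):
    """max(-MV, 0) per case: positive only when MV is negative."""
    mv = margin_of_victory(current_votes, reference_votes)
    return np.maximum(-mv, 0.0)

# Made-up vote proportions for two cases and three classes.
current = [[0.60, 0.30, 0.10],
           [0.45, 0.40, 0.15]]
reference = [[0.55, 0.35, 0.10],
             [0.40, 0.48, 0.12]]

mv = margin_of_victory(current, reference)
d = discrepancy(current, reference)
```

In the second made-up case the current forest prefers class 0 while the reference forest prefers class 1, so MV is negative and the discrepancy is positive; in the first case both agree and the discrepancy is zero.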

Park, Cheolyong
Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.855-863 / 2016
This study proposes a simple diagnostic statistic for determining the size of a random forest. The method is based on the margin of victory (MV), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. A negative MV indicates a discrepancy between the current and infinite forests. More precisely, the method is based on the proportion of cases in which -MV exceeds a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for this method and calculate its distribution. A simulation study compares the method with a recently proposed diagnostic statistic.
https://doi.org/10.7465/jkdi.2016.27.4.855
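
The quantity underlying the 2016 abstract above, the proportion of cases in which -MV exceeds a small threshold such as 0.03, can be sketched in Python as follows. This is only an illustration under stated assumptions: the hypothetical `reference_votes` array approximates the unobservable infinite forest, MV is computed from vote proportions, and the function name `exceedance_proportion` is invented here. The paper's actual diagnostic statistic and its distribution are not shown.

```python
import numpy as np

def exceedance_proportion(current_votes, reference_votes, delta=0.03):
    """Proportion of cases with -MV > delta.

    Inputs are (n_cases, n_classes) arrays of vote proportions.
    `reference_votes` is an assumed proxy for the infinite forest;
    delta=0.03 follows the example value given in the abstract.
    """
    current_votes = np.asarray(current_votes, dtype=float)
    reference_votes = np.asarray(reference_votes, dtype=float)
    # First and second most popular classes of the current forest.
    order = np.argsort(-current_votes, axis=1, kind="stable")
    rows = np.arange(current_votes.shape[0])
    mv = (reference_votes[rows, order[:, 0]]
          - reference_votes[rows, order[:, 1]])
    return float(np.mean(-mv > delta))

# Made-up vote proportions for three cases and three classes.
current = [[0.60, 0.30, 0.10],
           [0.45, 0.40, 0.15],
           [0.50, 0.49, 0.01]]
reference = [[0.55, 0.35, 0.10],
             [0.40, 0.48, 0.12],
             [0.49, 0.50, 0.01]]

p = exceedance_proportion(current, reference)
```

Only the second made-up case has -MV above 0.03; the third has a negative MV, but its magnitude (0.01) falls below the threshold, which is the role the fixed small positive number plays in the abstract's description.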