Visualizing Multi-Variable Prediction Functions by Segmented k-CPG's

Huh, Myung-Hoe;

doi:10.5351/CKSS.2009.16.1.185

Communications for Statistical Applications and Methods

Volume 16, Issue 1 / Pages 185-193 / 2009 / 2287-7843 (pISSN) / 2383-4757 (eISSN)

The Korean Statistical Society

Huh, Myung-Hoe (Dept. of Statistics, Korea Univ.)

Published: 2009.01.31

https://doi.org/10.5351/CKSS.2009.16.1.185

Abstract

Machine learning methods such as support vector machines and random forests yield nonparametric prediction functions of the form $y = f(x_1,\ldots,x_p)$. As a sequel to the previous article (Huh and Lee, 2008) on visualizing nonparametric functions, I propose more sensible graphs for visualizing $y = f(x_1,\ldots,x_p)$, which have two clear advantages over the previous simple graphs. The new graphs show a small number of prototype curves of $f(x_1,\ldots,x_{j-1},x_j,x_{j+1},\ldots,x_p)$, revealing the statistically plausible portion of the interval of $x_j$, which changes with $(x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_p)$. To complement the visual display, matching importance measures are produced for each of the $p$ predictor variables. The proposed graphs and importance measures are validated in simulated settings and demonstrated on an environmental study.
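
The idea of prototype curves drawn over a restricted range of $x_j$ can be illustrated with a small sketch. This is not the paper's algorithm, which this page does not reproduce. It rests on three loudly stated assumptions: the prediction function `f` is a hypothetical toy function with $p = 2$, the per-observation curves are grouped by plain k-means, and the "plausible" segment for each prototype is taken to be the observed range of $x_j$ among the observations in that group. All names (`conditional_curve`, `kmeans_curves`, `segments`) are illustrative.

```python
import random


def f(x):
    # Hypothetical stand-in for a fitted prediction function with p = 2.
    x1, x2 = x
    return x1 * x2 + 0.5 * x1


def conditional_curve(f, x, j, grid):
    """Evaluate f along `grid` for predictor j, holding the rest of x fixed."""
    curve = []
    for g in grid:
        z = list(x)
        z[j] = g
        curve.append(f(z))
    return curve


def kmeans_curves(curves, k, n_iter=20, seed=0):
    """Plain Lloyd k-means on curves treated as vectors (an assumption)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(curves, k)]
    labels = [0] * len(curves)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        for i, c in enumerate(curves):
            dists = [sum((a - b) ** 2 for a, b in zip(c, m)) for m in centers]
            labels[i] = dists.index(min(dists))
        # Update step: pointwise mean of member curves (empty groups unchanged).
        for h in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == h]
            if members:
                centers[h] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy data in which the observed range of x1 depends on x2.
data = [(0.1 * i, 1.0) for i in range(11)] + [(-0.1 * i, -1.0) for i in range(11)]
j = 0                                    # predictor to vary
grid = [i / 10 - 1 for i in range(21)]   # grid over [-1, 1]

curves = [conditional_curve(f, x, j, grid) for x in data]
prototypes, labels = kmeans_curves(curves, k=2)

# Assumed notion of "plausible": observed range of x_j within each group.
segments = {}
for h in range(len(prototypes)):
    xs = [x[j] for x, lab in zip(data, labels) if lab == h]
    if xs:
        segments[h] = (min(xs), max(xs))

# Keep each prototype curve only on its segment.
segmented = {
    h: [(g, v) for g, v in zip(grid, prototypes[h]) if lo <= g <= hi]
    for h, (lo, hi) in segments.items()
}
```

In this toy setting the curve of `f` against `x1` has a different slope for each value of `x2`, so two prototypes suffice, and each is kept only where `x1` was actually observed together with that value of `x2`.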

References

Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32. https://doi.org/10.1023/A:1010933404324
Breiman, L. and Friedman, J. (1985). Estimating optimal transformations for multiple regression and correlation, Journal of the American Statistical Association, 80, 580-598. https://doi.org/10.2307/2288473
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning, Springer, New York.
Huh, M. H. and Lee, Y. (2008). Simple graphs for complex prediction functions, Communications of the Korean Statistical Society, 15, 343-351. https://doi.org/10.5351/CKSS.2008.15.3.343
Strobl, C., Boulesteix, A., Kneib, T., Augustin, T. and Zeileis, A. (2008). Conditional variable importance for random forests, BMC Bioinformatics, 9, 307. https://doi.org/10.1186/1471-2105-9-307
