Nonparametric logistic regression based on sparse triangulation over a compact domain

  • Seoyeon Kim (Department of Statistics, Sungshin Women's University)
  • Kwan-Young Bak (School of Mathematics, Statistics and Data Science, Sungshin Women's University)
  • Received : 2024.02.21
  • Accepted : 2024.04.26
  • Published : 2024.09.30

Abstract

This study investigates logistic regression models based on sparse triangulation over a compact domain in ℝ², addressing the limited research on extending the triogram model to logistic regression. A primary challenge is the potential instability induced by a large number of vertices, which hinders effective modeling of complex relationships. To mitigate this, we propose introducing sparsity to the boundary vertices of the triangulation using the Ramer-Douglas-Peucker algorithm and employing the K-means algorithm for adaptive vertex initialization. A second-order coordinate-wise descent algorithm is adopted to implement the proposed method. The stability and performance of the proposed algorithm are assessed using synthetic data and handwritten digit data (LeCun et al., 1989). The results demonstrate the advantages of our method over existing methodologies, particularly for non-rectangular data domains.
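
As an illustration of the boundary-simplification idea named in the abstract, the following is a minimal sketch of the generic Ramer-Douglas-Peucker procedure applied to an open polyline. It is not the authors' implementation: the function names `point_segment_distance` and `rdp`, the tolerance parameter `epsilon`, and the use of point-to-segment distance (rather than distance to the infinite line) are assumptions made for this example. How the paper couples this step with the triangulation and the K-means initialization is not shown here.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (hypothetical helper)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify an open polyline with the Ramer-Douglas-Peucker recursion.

    A point is kept only if it lies farther than `epsilon` from the chord
    joining the endpoints of the sub-polyline currently being examined.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    max_dist, index = -1.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        # Every interior point is within tolerance: keep only the endpoints.
        return [first, last]
    # Otherwise split at the farthest point and simplify both halves.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


# Assumed toy boundary: a nearly flat stretch followed by a sharp corner.
boundary = [(0, 0), (1, 0.05), (2, 0), (3, 1), (4, 0)]
simplified = rdp(boundary, epsilon=0.1)
```

With this toy boundary and `epsilon=0.1`, the nearly collinear point `(1, 0.05)` is dropped while the corner at `(3, 1)` is retained. A larger tolerance removes more vertices, which is the sense in which the step introduces sparsity.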

Acknowledgement

This research was supported by the National Research Foundation (NRF) of Korea (RS-2022-00165581).

References

  1. Bak K-Y, Jhong J-H, Lee JJ, Shin J-K, and Koo J-Y (2021). Penalized logspline density estimation using total variation penalty, Computational Statistics & Data Analysis, 153, 107060.
  2. De Boor C (1978). A Practical Guide to Splines, vol. 27, Springer-Verlag, New York.
  3. Douglas DH and Peucker TK (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Cartographica: The International Journal for Geographic Information and Geovisualization, 10, 112-122.
  4. Eddy WF (1977). A new convex hull algorithm for planar sets, ACM Transactions on Mathematical Software (TOMS), 3, 398-403.
  5. Ferraccioli F, Arnone E, Finos L, Ramsay JO, and Sangalli LM (2021). Nonparametric density estimation over complicated domains, Journal of the Royal Statistical Society Series B: Statistical Methodology, 83, 346-368.
  6. Geenens G (2021). Mellin-Meijer kernel density estimation on R+, Annals of the Institute of Statistical Mathematics, 73, 953-977.
  7. Hastie T, Tibshirani R, and Friedman JH (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2, Springer, Berlin.
  8. Jhong J-H, Bak K-Y, and Koo J-Y (2022). Penalized polygram regression, Journal of the Korean Statistical Society, 51, 1161-1192.
  9. Koenker R and Mizera I (2004). Penalized triograms: Total variation regularization for bivariate smoothing, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 145-163.
  10. Lai M-J and Schumaker LL (2007). Spline Functions on Triangulations, Cambridge University Press, Cambridge.
  11. Lai M-J and Wang L (2013). Bivariate penalized splines for regression, Statistica Sinica, 23, 1399-1417.
  12. LeCun Y, Boser B, Denker J, Henderson D, Howard R, Hubbard W, and Jackel L (1989). Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, 2,
  13. MacQueen J (1967). Some methods for classification and analysis of multivariate observations, In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281-297. Oakland, CA, USA.
  14. Hansen M, Kooperberg C, and Sardy S (1998). Triogram models, Journal of the American Statistical Association, 93, 101-119.
  15. Müller H-G (1991). Smooth optimum kernel estimators near endpoints, Biometrika, 78, 521-530.
  16. Nelder JA and Wedderburn RW (1972). Generalized linear models, Journal of the Royal Statistical Society Series A: Statistics in Society, 135, 370-384.
  17. Ramer U (1972). An iterative procedure for the polygonal approximation of plane curves, Computer Graphics and Image Processing, 1, 244-256.
  18. Ramsay T (2002). Spline smoothing over difficult regions, Journal of the Royal Statistical Society Series B: Statistical Methodology, 64, 307-319.
  19. Sangalli LM, Ramsay JO, and Ramsay TO (2013). Spatial spline regression models, Journal of the Royal Statistical Society Series B: Statistical Methodology, 75, 681-703.
  20. Scott-Hayward LAS, MacKenzie ML, Donovan CR, Walker C, and Ashe E (2014). Complex region spatial smoother (CReSS), Journal of Computational and Graphical Statistics, 23, 340-360.
  21. Stone CJ (1994). The use of polynomial splines and their tensor products in multivariate function estimation, The Annals of Statistics, 22, 118-171.
  22. Toussaint GT and Avis D (1982). On a convex hull algorithm for polygons and its application to triangulation problems, Pattern Recognition, 15, 23-29.
  23. Tsybakov AB (2008). Introduction to Nonparametric Estimation, 1st edition, Springer Publishing Company, Incorporated, Berlin.
  24. Wang H and Ranalli MG (2007). Low-rank smoothing splines on complicated domains, Biometrics, 63, 209-217.
  25. Wang R, Ramos D, and Fierrez J (2012). Improving radial triangulation-based forensic palmprint recognition according to point pattern comparison by relaxation, In 2012 5th IAPR International Conference on Biometrics (ICB), New Delhi, 427-432, IEEE.
  26. Wasserman L (2006). All of Nonparametric Statistics, Springer Science & Business Media, Berlin.
  27. Wood SN (2003). Thin plate regression splines, Journal of the Royal Statistical Society Series B: Statistical Methodology, 65, 95-114.
  28. Wright GA and Zabin SM (1994). Nonparametric density estimation for classes of positive random variables, IEEE Transactions on Information Theory, 40, 1513-1535.
  29. Zhu J and Hastie T (2005). Kernel logistic regression and the import vector machine, Journal of Computational and Graphical Statistics, 14, 185-205.