
A simulation study for various propensity score weighting methods in clinical problematic situations


  • Siseong Jeong (Department of Biomedicine & Health Sciences, The Catholic University of Korea)
  • Eun Jeong Min (Department of Biomedicine & Health Sciences, The Catholic University of Korea)
  • Received : 2023.03.12
  • Accepted : 2023.05.08
  • Published : 2023.10.31

Abstract

Randomization is the most representative design in clinical trials and is used to estimate the treatment effect accurately. In an observational study without randomization, however, a comparison between the treatment group and the control group is biased because of various unadjusted differences, such as differences in patient characteristics. Propensity score weighting is widely used to address this problem: it adjusts for confounding so as to minimize bias when assessing treatment effects. Inverse probability weighting, the most popular method, assigns each subject a weight proportional to the inverse of the conditional probability of receiving the treatment actually assigned, given the observed covariates. However, this method often suffers from extreme propensity scores, which lead to biased estimates and excessive variance. Several alternative methods, including trimming, overlap weights, and matching weights, have been proposed to mitigate these issues. In this paper, we conduct a simulation study to compare the performance of various propensity score weighting methods under diverse problematic situations, such as limited overlap, a misspecified propensity score model, and treatment contrary to prediction. The simulation results show that overlap weights and matching weights consistently outperform inverse probability weighting and trimming in terms of bias, root mean squared error, and coverage probability.
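The four weighting schemes named in the abstract can be written in one common form, in which the schemes differ only in a tilting function h(x) applied to the inverse probability weights. The notation below (treatment indicator Z, outcome Y, covariate vector X, tilting function h, trimming threshold α) is an assumption introduced for illustration; it follows the widely used balancing-weights formulation and is not reproduced from the paper itself. The estimator shown is the normalized (Hájek-type) weighted difference in means, and the paper's exact estimator may differ.

```latex
% e(x): propensity score; Z: treatment indicator; Y: outcome; h: tilting function
\begin{aligned}
e(\mathbf{x}) &= \Pr(Z = 1 \mid \mathbf{X} = \mathbf{x}), \\
w_1(\mathbf{x}) &= \frac{h(\mathbf{x})}{e(\mathbf{x})}, \qquad
w_0(\mathbf{x}) = \frac{h(\mathbf{x})}{1 - e(\mathbf{x})}, \\
\hat{\tau}_h &= \frac{\sum_{i} Z_i\, w_1(\mathbf{x}_i)\, Y_i}{\sum_{i} Z_i\, w_1(\mathbf{x}_i)}
  - \frac{\sum_{i} (1 - Z_i)\, w_0(\mathbf{x}_i)\, Y_i}{\sum_{i} (1 - Z_i)\, w_0(\mathbf{x}_i)}, \\
h(\mathbf{x}) &=
\begin{cases}
1 & \text{inverse probability weighting (ATE)}, \\
\mathbf{1}\{\alpha \le e(\mathbf{x}) \le 1 - \alpha\} & \text{trimmed IPW}, \\
e(\mathbf{x})\,\{1 - e(\mathbf{x})\} & \text{overlap weights}, \\
\min\{e(\mathbf{x}),\, 1 - e(\mathbf{x})\} & \text{matching weights}.
\end{cases}
\end{aligned}
```

Because e(x){1 − e(x)} and min{e(x), 1 − e(x)} shrink toward zero as e(x) approaches 0 or 1, overlap weights and matching weights never exceed 1, whereas the IPW weights 1/e(x) and 1/{1 − e(x)} are unbounded. This is one way to see why extreme propensity scores inflate the variance of IPW but not of the other two schemes. Trimming instead discards units whose scores fall outside [α, 1 − α]; α = 0.1 is a common choice.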

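A minimal Monte Carlo sketch of the kind of comparison the abstract describes, written in plain Python (standard library only). Every design choice here is a hypothetical illustration, not the authors' simulation: one standard-normal covariate, a logistic treatment model with slope `gamma` (larger values give more limited overlap), a constant treatment effect `tau` so that all four target estimands coincide, a propensity score fitted by a one-covariate logistic regression, and a trimming threshold `alpha = 0.1`. The function names `weight`, `weighted_effect`, `fit_logistic`, and `simulate` are hypothetical. Each scheme uses the weight h(e)/e for treated units and h(e)/(1 − e) for controls, with h = 1 (IPW), an indicator of α ≤ e ≤ 1 − α (trimming), e(1 − e) (overlap), or min(e, 1 − e) (matching).

```python
import math
import random
import statistics


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    et = math.exp(t)
    return et / (1.0 + et)


def weight(e, z, method, alpha=0.1):
    """Weight h(e)/e for a treated unit (z == 1) or h(e)/(1 - e) for a control."""
    if method == "ipw":  # h = 1: inverse probability weighting (ATE)
        h = 1.0
    elif method == "trim":  # h = 1{alpha <= e <= 1 - alpha}: IPW after trimming
        h = 1.0 if alpha <= e <= 1.0 - alpha else 0.0
    elif method == "overlap":  # h = e(1 - e): overlap weights
        h = e * (1.0 - e)
    elif method == "matching":  # h = min(e, 1 - e): matching weights
        h = min(e, 1.0 - e)
    else:
        raise ValueError(f"unknown method: {method}")
    return h / e if z == 1 else h / (1.0 - e)


def weighted_effect(y, z, e, method):
    """Normalized (Hajek-type) weighted difference in mean outcomes."""
    num1 = den1 = num0 = den0 = 0.0
    for yi, zi, ei in zip(y, z, e):
        w = weight(ei, zi, method)
        if zi == 1:
            num1, den1 = num1 + w * yi, den1 + w
        else:
            num0, den0 = num0 + w * yi, den0 + w
    return num1 / den1 - num0 / den0


def fit_logistic(x, z, iters=20):
    """Newton-Raphson fit of the working model logit P(Z = 1 | x) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, zi in zip(x, z):
            p = sigmoid(b0 + b1 * xi)
            v = p * (1.0 - p)
            g0, g1 = g0 + (zi - p), g1 + (zi - p) * xi  # score vector
            i00, i01, i11 = i00 + v, i01 + v * xi, i11 + v * xi * xi  # information
        det = i00 * i11 - i01 * i01
        # Newton step: (b0, b1) += I^{-1} U using the explicit 2x2 inverse.
        b0, b1 = b0 + (i11 * g0 - i01 * g1) / det, b1 + (i00 * g1 - i01 * g0) / det
    return b0, b1


def simulate(n=500, reps=100, gamma=2.5, tau=1.0, seed=2023):
    """Monte Carlo bias and RMSE of each weighting method (hypothetical design)."""
    rng = random.Random(seed)
    methods = ("ipw", "trim", "overlap", "matching")
    estimates = {m: [] for m in methods}
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Larger gamma pushes true propensity scores toward 0 and 1 (limited overlap).
        z = [1 if rng.random() < sigmoid(gamma * xi) else 0 for xi in x]
        # Constant effect tau, so the ATE, ATO and ATM all equal tau.
        y = [1.0 + xi + tau * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
        b0, b1 = fit_logistic(x, z)
        e = [sigmoid(b0 + b1 * xi) for xi in x]
        for m in methods:
            estimates[m].append(weighted_effect(y, z, e, m))
    results = {}
    for m, est in estimates.items():
        bias = statistics.fmean(est) - tau
        rmse = math.sqrt(statistics.fmean([(t - tau) ** 2 for t in est]))
        results[m] = (bias, rmse)
    return results


if __name__ == "__main__":
    for method, (bias, rmse) in simulate().items():
        print(f"{method:9s} bias={bias:+.3f} rmse={rmse:.3f}")
```

Running the module prints the Monte Carlo bias and RMSE of each method; no numbers are reproduced here because they depend on the seed and settings. Coverage probability, the third criterion in the abstract, would additionally require a variance estimator for each weighted estimator (for example a sandwich or bootstrap estimate), which this sketch omits. A misspecified propensity score could be mimicked, for instance, by generating treatment from a nonlinear function of `x` while still fitting the linear logit.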

Acknowledgement

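This research was supported by the Basic Research Program of the National Research Foundation of Korea (NRF), funded by the Korean government in 2021 (NRF-2021R1F1A1058613).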
