Acknowledgement
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (RS-2024-00452772), and was submitted as a result of the 2022 Chung-Ang University research year. This paper was written based on excerpts from Minkyung Kim's master's thesis.
References
- Akram A (2018). Ads CTR Optimisation, Version 1, Available from: https://www.kaggle.com/datasets/akram24/ads-ctr-optimisation/data
- Auer P (2002). Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, 3, 397-422.
- Burtini G, Loeppky J, and Lawrence R (2015). A survey of online experiment design with the stochastic multi-armed bandit, Available from: arXiv preprint arXiv:1510.00757
- Casella G and George EI (1992). Explaining the Gibbs sampler, The American Statistician, 46, 167-174.
- Chandrashekar A, Amat F, Basilico J, and Jebara T (2017). Artwork personalization at Netflix, Netflix TechBlog, Available from: https://netflixtechblog.com/artwork-personalization-c589f074ad76
- Chapelle O and Li L (2011). An empirical evaluation of Thompson sampling, Advances in Neural Information Processing Systems, 24.
- Eckles D and Kaptein M (2014). Thompson sampling with the online bootstrap, Available from: arXiv preprint arXiv:1410.4009
- Granmo O-C (2010). Solving two-armed Bernoulli bandits problems using a Bayesian learning automaton, International Journal of Intelligent Computing and Cybernetics, 3, 207-234.
- Hill DN, Nassif H, Liu Y, Iyer A, and Vishwanathan SVN (2017). An efficient bandits algorithm for realtime multivariate optimization, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1813-1821.
- Nemeth C and Fearnhead P (2021). Stochastic gradient Markov chain Monte Carlo, Journal of the American Statistical Association, 116, 433-450.
- Robbins H and Monro S (1951). A stochastic approximation method, The Annals of Mathematical Statistics, 22, 400-407.
- Roberts GO and Tweedie RL (1996). Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli, 2, 341-363.
- Russo DJ, Van Roy B, Kazerouni A, Osband I, and Wen Z (2018). A tutorial on Thompson sampling, Foundations and Trends® in Machine Learning, 11, 1-96.
- Sutton RS and Barto AG (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
- Thompson WR (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, Biometrika, 25, 285-294.
- Welling M and Teh YW (2011). Bayesian learning via stochastic gradient Langevin dynamics, Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 681-688.