• Title/Summary/Keyword: 다중 슬롯머신 (multi-armed bandit)


Thompson sampling for multi-armed bandits in big data environments (빅데이터 환경에서 다중 슬롯머신 문제에 대한 톰슨 샘플링 방법)

  • Min Kyong Kim;Beom Seuk Hwang
    • The Korean Journal of Applied Statistics / v.37 no.5 / pp.663-673 / 2024
  • The multi-armed bandit (MAB) problem involves selecting actions to maximize rewards within dynamic environments. This study explores the application of Thompson sampling, a robust MAB algorithm, within the context of big data analytics and statistical learning theory. By leveraging large-scale banner click data from recommendation systems, we evaluate Thompson sampling's performance across various simulated scenarios, employing advanced approximation techniques. Our findings demonstrate that Thompson sampling, particularly with Langevin Monte Carlo approximation, maintains robust performance and scalability in big data environments. This underscores its practical significance and adaptability, aligning with contemporary challenges in statistical learning.
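For readers unfamiliar with Thompson sampling, the following is a minimal sketch of the basic idea on simulated banner clicks. It is not the method evaluated in the paper: it assumes Bernoulli rewards with an exact conjugate Beta posterior rather than the Langevin Monte Carlo approximation the abstract mentions, and the function name `thompson_sampling` and the click rates are hypothetical.

```python
import random


def thompson_sampling(click_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated banner clicks.

    click_rates holds hypothetical true click probabilities, one per banner.
    Returns the number of times each banner was shown.
    """
    rng = random.Random(seed)
    n_arms = len(click_rates)
    clicks = [0] * n_arms      # observed clicks per banner
    non_clicks = [0] * n_arms  # observed non-clicks per banner
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible click rate per banner from its
        # Beta(1 + clicks, 1 + non_clicks) posterior (uniform prior).
        samples = [
            rng.betavariate(1 + clicks[a], 1 + non_clicks[a])
            for a in range(n_arms)
        ]
        # Show the banner whose sampled click rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate the user's response (assumed Bernoulli reward).
        if rng.random() < click_rates[arm]:
            clicks[arm] += 1
        else:
            non_clicks[arm] += 1
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.05, 0.10, 0.30], n_rounds=2000, seed=1)
print(pulls)
```

Because each banner is chosen with the probability that it is currently believed to be best, exploration fades on its own as the posteriors concentrate. When the posterior has no closed form, the sampling step is what an approximation method would replace.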

The UCT algorithm applied to find the best first move in the game of Tic-Tac-Toe (삼목 게임에서 최상의 첫 수를 구하기 위해 적용된 신뢰상한트리 알고리즘)

  • Lee, Byung-Doo;Park, Dong-Soo;Choi, Young-Wook
    • Journal of Korea Game Society / v.15 no.5 / pp.109-118 / 2015
  • The game of Go, which originated in ancient China, is regarded as one of the most difficult challenges in the field of AI. Over the past few years, the top computer Go programs based on Monte Carlo tree search (MCTS) have surprisingly beaten professional players in handicap games. MCTS is an approach that simulates random sequences of legal moves until the game ends, and it has replaced the traditional knowledge-based approach. We applied the UCT algorithm, an MCTS variant, to the game of Tic-Tac-Toe to find the best first move, and compared its result with that of pure MCTS. Furthermore, to explain UCB, we introduced the epsilon-Greedy and UCB algorithms for solving the Multi-Armed Bandit problem and compared their performance.
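The two bandit strategies named in the abstract can be compared with a short simulation. The sketch below is illustrative only and is not taken from the paper: it assumes Bernoulli rewards, uses the standard UCB1 index (sample mean plus sqrt(2 ln t / n)), and the names `run_bandit`, `epsilon_greedy`, `ucb1` and the win probabilities are hypothetical.

```python
import math
import random


def run_bandit(select_arm, win_probs, n_rounds, seed=0):
    """Play a simulated Bernoulli bandit and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(win_probs)
    counts = [0] * n_arms    # times each arm was pulled
    means = [0.0] * n_arms   # running mean reward of each arm
    for t in range(1, n_rounds + 1):
        arm = select_arm(counts, means, t, rng)
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


def epsilon_greedy(epsilon):
    """With probability epsilon explore uniformly, otherwise exploit."""
    def select(counts, means, t, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(counts))
        return max(range(len(counts)), key=lambda a: means[a])
    return select


def ucb1(counts, means, t, rng):
    """Pick the arm with the highest upper confidence bound (UCB1)."""
    # Pull every arm once so that each bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


win_probs = [0.2, 0.5, 0.8]  # hypothetical arms
print("epsilon-greedy:", run_bandit(epsilon_greedy(0.1), win_probs, 3000))
print("UCB1:          ", run_bandit(ucb1, win_probs, 3000))
```

Epsilon-greedy keeps exploring at a fixed rate, whereas UCB1 shrinks its exploration bonus for an arm as that arm accumulates pulls. UCT applies the same kind of bound at each node of the search tree to decide which move to simulate next.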