Led by Marvin Minsky Simulacrum
The multi-armed bandit problem: the fundamental tension between exploring new options and exploiting known rewards, addressed through epsilon-greedy, optimistic initial values, UCB1, and Thompson sampling.
The question
The multi-armed bandit problem · the explore-exploit dilemma · applications (A/B testing, ad selection, recommendation systems) · calculating sample means and moving averages · relationship to stochastic gradient descent · epsilon-greedy theory and i...
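As a minimal sketch of the sample-mean and moving-average topics listed above: both updates can be written as "estimate plus step size times error", which is also the shape of a gradient step on the squared error between the estimate and the new observation. The function names and the reward list are hypothetical, chosen only for illustration.

```python
def update_mean(mean, n, x):
    """Return the sample mean after folding in the n-th observation x (n >= 1)."""
    # Step size 1/n reproduces the exact arithmetic mean of all n observations.
    return mean + (x - mean) / n


def update_ewma(estimate, x, alpha):
    """Constant step size: an exponentially weighted moving average."""
    # Recent observations weigh more, which helps when reward rates drift.
    return estimate + alpha * (x - estimate)


# Hypothetical reward sequence, for illustration only.
rewards = [1.0, 0.0, 1.0, 1.0]

mean = 0.0
for n, x in enumerate(rewards, start=1):
    mean = update_mean(mean, n, x)

ewma = 0.0
for x in rewards:
    ewma = update_ewma(ewma, x, alpha=0.5)

print(mean, ewma)
```

Neither update stores the reward history, so a bandit agent needs only one running estimate (and, for the sample mean, one count) per arm.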
Outcome
Demonstrates understanding and implementation of epsilon-greedy and optimistic initial values.
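One possible sketch of epsilon-greedy and optimistic initial values, assuming binary (Bernoulli) rewards. The `BernoulliBandit` class, the arm probabilities, and the `prior_count` parameter are assumptions made for this example, not part of the unit description. `prior_count=1` treats the optimistic starting value as one pseudo-observation so that it is not erased by the first real reward.

```python
import random


class BernoulliBandit:
    """Hypothetical test environment: arm i pays 1 with probability probs[i]."""

    def __init__(self, probs, rng):
        self.probs = probs
        self.rng = rng

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


def run_epsilon_greedy(bandit, n_arms, steps, epsilon, rng,
                       initial_value=0.0, prior_count=0):
    estimates = [initial_value] * n_arms
    weights = [prior_count] * n_arms  # observations (real or pseudo) behind each estimate
    pulls = [0] * n_arms              # real pulls only
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniform random arm
        else:
            # exploit: arm with the highest current estimate (first one on ties)
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = bandit.pull(arm)
        pulls[arm] += 1
        weights[arm] += 1
        # incremental sample-mean update
        estimates[arm] += (reward - estimates[arm]) / weights[arm]
    return estimates, pulls


rng = random.Random(0)
bandit = BernoulliBandit([0.2, 0.5, 0.75], rng)  # assumed arm probabilities

# Plain epsilon-greedy: 10% random exploration, estimates start at 0.
eg_estimates, eg_pulls = run_epsilon_greedy(bandit, 3, 5000, epsilon=0.1, rng=rng)

# Optimistic initial values: purely greedy, but every arm starts well above
# any achievable mean reward, so each arm is tried before its estimate settles.
opt_estimates, opt_pulls = run_epsilon_greedy(
    bandit, 3, 5000, epsilon=0.0, rng=rng, initial_value=5.0, prior_count=1)

print(eg_pulls, opt_pulls)
```

In the optimistic run the exploration comes entirely from the inflated starting estimates: an arm keeps being selected until its estimate decays below the others.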
The question
UCB1 theory (upper confidence bound, confidence intervals, the exploration bonus) · UCB1 implementation · Bayesian bandits / Thompson sampling theory (prior distributions, posterior updates, Beta distribution for binary rewards) · Thompson sampling w...
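A minimal sketch of UCB1, assuming Bernoulli rewards and the common form of the exploration bonus, sqrt(2 ln N / n_j), where N is the total number of pulls so far and n_j is the number of pulls of arm j. The function names and arm probabilities are hypothetical.

```python
import math
import random


def ucb1_select(estimates, counts, total_pulls):
    """Pick the arm with the highest upper confidence bound."""
    # Pull every arm once first so each bonus is well defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: estimates[a]
        + math.sqrt(2.0 * math.log(total_pulls) / counts[a]),
    )


def run_ucb1(probs, steps, rng):
    n_arms = len(probs)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(steps):
        arm = ucb1_select(estimates, counts, t)  # t pulls have been made so far
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


rng = random.Random(0)
ucb_estimates, ucb_counts = run_ucb1([0.2, 0.5, 0.75], 5000, rng)
print(ucb_counts)
```

The bonus shrinks as an arm is pulled more often and grows slowly with the total pull count, so rarely tried arms are revisited without any explicit randomness in the selection rule.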
Outcome
Demonstrates understanding and implementation of UCB1 and Thompson sampling.
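A minimal sketch of Thompson sampling for binary rewards, assuming a uniform Beta(1, 1) prior on each arm. The function names and arm probabilities are hypothetical. Each step draws one sample from every arm's posterior, pulls the arm with the largest sample, and updates that arm's Beta parameters.

```python
import random


def thompson_select(alphas, betas, rng):
    """Sample a plausible win rate per arm and pick the largest."""
    samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
    return max(range(len(samples)), key=lambda i: samples[i])


def run_thompson(probs, steps, rng):
    n_arms = len(probs)
    alphas = [1.0] * n_arms  # 1 + observed successes per arm
    betas = [1.0] * n_arms   # 1 + observed failures per arm
    for _ in range(steps):
        arm = thompson_select(alphas, betas, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        # Conjugate update: Beta prior with a Bernoulli likelihood stays Beta.
        alphas[arm] += reward
        betas[arm] += 1 - reward
    return alphas, betas


rng = random.Random(0)
ts_alphas, ts_betas = run_thompson([0.2, 0.5, 0.75], 2000, rng)

# Posterior mean of each arm's win rate.
posterior_means = [a / (a + b) for a, b in zip(ts_alphas, ts_betas)]
print(posterior_means)
```

Arms with few observations have wide posteriors and occasionally produce large samples, which is where the exploration comes from; as evidence accumulates the posteriors narrow and selection concentrates on the arms that look best.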