Led by Alan Turing Simulacrum
Formalising the reinforcement learning problem — states, actions, rewards, transitions, the Markov property, value functions, and the Bellman equation.
Led by Alan Turing Simulacrum
The question
From bandits to full reinforcement learning · what is reinforcement learning (agent, environment, state, action, reward) · Gridworld as a canonical environment · choosing rewards (reward shaping, sparse vs dense) · the Markov property (memoryless, su...
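As an illustration of the terms listed above, here is a minimal sketch of a Gridworld written as a deterministic environment. The grid size, goal position, reward values and the `step` function name are all hypothetical choices made for this example, not part of the unit's materials. The next state and reward are computed from the current state and action alone, which is what the Markov property requires.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # (row, col) position of the agent

# Hypothetical action set: each action maps to a (row, col) offset.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}

ROWS, COLS = 3, 4      # assumed grid size
GOAL: State = (0, 3)   # assumed terminal state
STEP_REWARD = -1.0     # dense signal: small penalty on every non-goal step
GOAL_REWARD = 10.0     # reward for reaching the goal


def step(state: State, action: str) -> Tuple[State, float, bool]:
    """Return (next_state, reward, done) for one transition.

    The result depends only on (state, action), never on earlier
    history, so the environment is Markov.
    """
    d_row, d_col = ACTIONS[action]
    # Moves that would leave the grid keep the agent at the boundary.
    row = min(max(state[0] + d_row, 0), ROWS - 1)
    col = min(max(state[1] + d_col, 0), COLS - 1)
    next_state = (row, col)
    done = next_state == GOAL
    reward = GOAL_REWARD if done else STEP_REWARD
    return next_state, reward, done
```

Setting `STEP_REWARD` to 0.0 would turn this into a sparse-reward version of the same environment, where the agent receives a signal only at the goal.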
Outcome
Demonstrates understanding and implementation of states, actions, rewards and the Markov property.
Sub-units
Led by Alan Turing Simulacrum
The question
Value functions (state value V(s), action value Q(s,a)) · the Bellman equation for V (expectation form) · the Bellman equation for Q · deriving the Bellman equation from first principles · worked Bellman examples · the optimal value function V* and Q...
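For reference, the equations named above can be written as follows in one common notation. The symbols are assumptions, since the unit may use different ones: a policy $\pi(a \mid s)$, transition probabilities $P(s' \mid s, a)$, a reward function $R(s, a, s')$ and a discount factor $\gamma \in [0, 1)$.

```latex
% Bellman expectation equation for the state value function
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

% Bellman expectation equation for the action value function
Q^{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)
                \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') Q^{\pi}(s', a') \right]

% Bellman optimality equation for the optimal state value function
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
           \left[ R(s, a, s') + \gamma V^{*}(s') \right]
```

The first two are linked by $V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)$, which is the step that turns a definition in terms of expected return into a recursive one.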
Outcome
Demonstrates understanding and implementation of value functions and the Bellman equation.
Sub-units