Tutorial Course

Reinforcement Learning in Python — Markov Decision Processes

Led by Alan Turing Simulacrum

2 modules · 2 tutorials · ~3 hours · Artificial Intelligence · Updated 4 days ago

Formalising the reinforcement learning problem — states, actions, rewards, transitions, the Markov property, value functions, and the Bellman equation.

Module 1

States, Actions, Rewards and the Markov Property

Led by Alan Turing Simulacrum

The question
From bandits to full reinforcement learning · what is reinforcement learning (agent, environment, state, action, reward) · Gridworld as a canonical environment · choosing rewards (reward shaping, sparse vs dense) · the Markov property (memoryless, su...

Outcome
Demonstrates understanding and implementation of states, actions, rewards and the Markov property.
Sub-units
1. 1.1 States, Actions, Rewards and the Markov Property
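
As a taste of what this module formalises, here is a minimal sketch of a Gridworld transition function. The 3x4 grid size, the goal cell, the sparse reward of 1.0, and the names `step`, `ACTIONS` and `GOAL` are illustrative assumptions, not the course's own code. The point it shows is the Markov property: the next state and reward are computed from the current state and action alone, with no history.

```python
from typing import Tuple

State = Tuple[int, int]  # (row, col) position of the agent

# Hypothetical action set: each action is a (row, col) offset.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Assumed layout for illustration: a 3x4 grid with the goal in the top-right cell.
ROWS, COLS = 3, 4
GOAL: State = (0, 3)


def step(state: State, action: str) -> Tuple[State, float, bool]:
    """Return (next_state, reward, done) from the current state and action only."""
    d_row, d_col = ACTIONS[action]
    # Moves that would leave the grid keep the agent on the boundary cell.
    row = min(max(state[0] + d_row, 0), ROWS - 1)
    col = min(max(state[1] + d_col, 0), COLS - 1)
    next_state = (row, col)
    done = next_state == GOAL
    # Sparse reward: 1.0 only on reaching the goal, 0.0 everywhere else.
    reward = 1.0 if done else 0.0
    return next_state, reward, done
```

Calling `step((0, 2), "right")` moves the agent into the goal cell; a dense-reward variant would instead return a small signal on every step.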
Module 2

Value Functions and the Bellman Equation

Led by Alan Turing Simulacrum

The question
Value functions (state value V(s), action value Q(s,a)) · the Bellman equation for V (expectation form) · the Bellman equation for Q · deriving the Bellman equation from first principles · worked Bellman examples · the optimal value function V* and Q...

Outcome
Demonstrates understanding and implementation of value functions and the Bellman equation.
Sub-units
1. 2.2 Value Functions and the Bellman Equation
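
To make the expectation form of the Bellman equation for V concrete, the sketch below applies it repeatedly until the values stop changing (iterative policy evaluation). Everything specific here is an assumption made for illustration: a four-state chain with a terminal state at the right end, a uniform random policy over "left" and "right", a reward of 1.0 on entering the terminal state, a discount of 0.9, and the names `transition` and `evaluate_uniform_policy`.

```python
GAMMA = 0.9          # assumed discount factor
N_STATES = 4         # states 0..3 laid out in a line
TERMINAL = 3         # episode ends on entering this state
ACTIONS = (-1, +1)   # move left or move right


def transition(s, a):
    """Deterministic dynamics: clamp to the chain, reward 1.0 on reaching TERMINAL."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward


def evaluate_uniform_policy(tol=1e-10):
    """Sweep the Bellman expectation backup until the largest update is below tol."""
    V = [0.0] * N_STATES
    pi = 1.0 / len(ACTIONS)  # uniform random policy: pi(a|s) is the same for every a
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # the terminal state's value stays at 0
            # V(s) = sum over a of pi(a|s) * (r + gamma * V(s'))
            new_v = 0.0
            for a in ACTIONS:
                s_next, r = transition(s, a)
                new_v += pi * (r + GAMMA * V[s_next])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


V = evaluate_uniform_policy()
```

States closer to the terminal state end up with higher values, because the discounted reward is reached in fewer expected steps.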
