Tutorial Course

Reinforcement Learning in Python — Temporal Difference Learning

Led by Geoffrey Hinton Simulacrum

2 modules · 2 tutorials · ~3 hours · Artificial Intelligence · Updated 4 days ago

Learning from incomplete episodes: TD(0) prediction, SARSA (on-policy control), and Q-Learning (off-policy control), the temporal-difference algorithms that underpin much of modern RL.

Module 1

TD(0) Prediction and SARSA

Led by Geoffrey Hinton Simulacrum

The question
Temporal difference introduction · TD(0) prediction (one-step bootstrapping) · the TD update rule and comparison with MC · TD(0) prediction in code · bias-variance trade-off (TD vs MC) · SARSA (State-Action-Reward-State-Action) · on-policy TD control...
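
The TD(0) and SARSA update rules listed above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the five-state random-walk environment and the names `td0_random_walk` and `sarsa_update` are assumptions made for the example. It shows the one-step bootstrapped update V(s) ← V(s) + α[r + γV(s') − V(s)], applied after every step rather than at the end of the episode as Monte Carlo would do, and the corresponding SARSA update on action values.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values under a uniform-random policy with TD(0).

    Hypothetical environment (an assumption for this sketch): states
    0..num_states-1 in a row. Stepping off the left end ends the episode
    with reward 0; stepping off the right end ends it with reward 1.
    """
    rng = random.Random(seed)
    V = [0.0] * num_states
    for _ in range(episodes):
        s = num_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= num_states:
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, V[s_next], False
            # TD(0): move V(s) toward the one-step bootstrapped target.
            V[s] += alpha * (r + gamma * v_next - V[s])
            if done:
                break
            s = s_next
    return V


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """One SARSA step on a table Q[state][action].

    On-policy: the target uses a_next, the action the agent actually
    selected in s_next, not the greedy action.
    """
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

For this walk the true values are 1/6, 2/6, ..., 5/6, so the list returned by `td0_random_walk()` should land near those figures. Because the step size is constant, the estimates keep fluctuating slightly around them instead of settling exactly.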

Outcome
Demonstrates understanding and implementation of TD(0) prediction and SARSA.

Sub-units
1.1 TD(0) Prediction and SARSA

Module 2

Q-Learning

Led by Geoffrey Hinton Simulacrum

The question
Q-Learning (off-policy TD control) · the Q-Learning update rule: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] · why Q-Learning is off-policy (learns about the greedy policy while following an exploratory policy) · Q-Learning in code · comparison ...
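
The update rule above can be sketched as a small tabular agent. This is a minimal illustration under stated assumptions, not the course's implementation: the corridor environment, the function name `q_learning_chain`, and the hyperparameter defaults are all invented for the example. The agent acts epsilon-greedily (the exploratory behavior policy), while the target takes the max over next actions (the greedy target policy), which is what makes the method off-policy.

```python
import random


def q_learning_chain(num_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-Learning on a hypothetical corridor.

    States 0..num_states-1; action 0 moves left, action 1 moves right.
    Entering the last state gives reward 1 and ends the episode; every
    other step gives reward 0. Moving left from state 0 stays in state 0.
    """
    rng = random.Random(seed)
    goal = num_states - 1
    Q = [[0.0, 0.0] for _ in range(num_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behavior policy: epsilon-greedy, with ties broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Target policy: greedy. The max ignores which action the
            # agent will actually take in s_next.
            best_next = 0.0 if s_next == goal else max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s_next
    return Q
```

Reading the greedy policy off the returned table (`max(range(2), key=lambda a: Q[s][a])` for each state) should give "move right" in every non-terminal state. Replacing `max(Q[s_next])` with the value of the action actually selected next would turn this into the on-policy SARSA update.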

Outcome
Demonstrates understanding and implementation of Q-Learning.

Sub-units
2.2 Q-Learning