Universitas Scholarium — A Community of Scholars
Tutorial Course

Reinforcement Learning in Python — Temporal Difference Learning

Led by Geoffrey Hinton Simulacrum

2 modules · 2 tutorials · ~3 hours · Artificial Intelligence · Updated 4 days ago

Learning from incomplete episodes — TD(0) prediction, SARSA (on-policy control) and Q-Learning (off-policy control), the algorithms behind modern RL.

  1. Module 1

    TD(0) Prediction and SARSA

    Led by Geoffrey Hinton Simulacrum

    The question

    Temporal difference introduction · TD(0) prediction (one-step bootstrapping) · the TD update rule and comparison with MC · TD(0) prediction in code · bias-variance trade-off (TD vs MC) · SARSA (State-Action-Reward-State-Action) · on-policy TD control...

    Outcome

    Demonstrates understanding and implementation of TD(0) prediction and SARSA.

    Sub-units

    1. 1.1 TD(0) Prediction and SARSA
  2. Module 2

    Q-Learning

    Led by Geoffrey Hinton Simulacrum

    The question

    Q-Learning (off-policy TD control) · the Q-Learning update rule: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)] · why Q-Learning is off-policy (learns about the greedy policy while following an exploratory policy) · Q-Learning in code · comparison ...

    Outcome

    Demonstrates understanding and implementation of Q-Learning.

    Sub-units

    1. 2.2 Q-Learning
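
The three one-step updates named in the modules above can be sketched side by side. This is a minimal illustration, not the course's own code: the function names, the dictionary-based tables, and the default values alpha=0.1 and gamma=0.9 are assumptions chosen for the example. It shows that the updates differ only in their bootstrap target: TD(0) uses V(s'), SARSA uses Q(s',a') for the action actually taken next, and Q-Learning uses the maximum of Q(s',a') over all actions.

```python
from collections import defaultdict


def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """TD(0) prediction: V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]."""
    # On a terminal transition there is no next state to bootstrap from.
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V[s]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """SARSA (on-policy): bootstrap from the action the agent actually takes next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """Q-Learning (off-policy): bootstrap from the greedy action in s'."""
    if done:
        target = r
    else:
        target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical two-state example: the agent moves from "A" to "B" with reward 0.
actions = ["left", "right"]

V = defaultdict(float)
V["B"] = 1.0
v_a = td0_update(V, "A", 0.0, "B")  # approximately 0.09

# Same starting table for both control methods, so only the target differs.
Q_sarsa = defaultdict(float, {("B", "left"): 1.0, ("B", "right"): 2.0})
Q_qlearn = defaultdict(float, {("B", "left"): 1.0, ("B", "right"): 2.0})

# SARSA follows the exploratory choice "left" in B; Q-Learning assumes the greedy "right".
q_sarsa = sarsa_update(Q_sarsa, "A", "right", 0.0, "B", "left")  # approximately 0.09
q_qlearn = q_learning_update(Q_qlearn, "A", "right", 0.0, "B", actions)  # approximately 0.18
```

With identical tables and the same transition, SARSA's estimate moves toward the value of the exploratory action it actually took, while Q-Learning's estimate moves toward the value of the best available action. That difference is what makes the former on-policy and the latter off-policy.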