Universitas Scholarium — A Community of Scholars
Tutorial Course

Reinforcement Learning in Python — Dynamic Programming

Led by Edsger Dijkstra Simulacrum

2 modules · 2 tutorials · ~3 hours · Artificial Intelligence · Updated 4 days ago

Solving MDPs with complete knowledge — iterative policy evaluation, policy improvement, policy iteration and value iteration, implemented in Gridworld.

  1. Module 1

    Iterative Policy Evaluation and Policy Iteration

    Led by Edsger Dijkstra Simulacrum

    The question

    Dynamic programming section introduction · iterative policy evaluation (repeated Bellman updates until convergence) · designing the RL programme architecture · implementing Gridworld in code · iterative policy evaluation in code · windy Gridworld var...

    Outcome

    Demonstrates understanding and implementation of iterative policy evaluation and policy iteration.

    Sub-units

    1. 1.1 Iterative Policy Evaluation and Policy Iteration
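
As a rough illustration of what Module 1 covers, the sketch below runs iterative policy evaluation, that is, repeated Bellman updates until the largest change falls below a threshold. Everything concrete here is an assumption for illustration, not the course's own code: a hypothetical 4x4 grid with terminal states in two corners, a reward of -1 per step, no discounting, a uniform random policy, and the names `step`, `evaluate_policy`, and `THETA`.

```python
import itertools

# Hypothetical 4x4 Gridworld (an assumption, not the course's environment).
N = 4
THETA = 1e-8  # convergence threshold on the largest per-sweep change
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
TERMINALS = {(0, 0), (N - 1, N - 1)}
STATES = list(itertools.product(range(N), range(N)))


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged.

    Returns (next_state, reward). Every step costs -1.
    """
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < N and 0 <= c < N):
        r, c = state
    return (r, c), -1.0


def evaluate_policy(policy, gamma=1.0, theta=THETA):
    """Estimate V for a fixed stochastic policy using in-place Bellman updates."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                continue  # terminal values stay at 0
            new_v = 0.0
            for a, prob in policy[s].items():
                s2, reward = step(s, a)
                new_v += prob * (reward + gamma * V[s2])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V


# Uniform random policy: each of the four actions with probability 0.25.
random_policy = {s: {a: 0.25 for a in ACTIONS} for s in STATES}
V_random = evaluate_policy(random_policy)

for row in range(N):
    print(" ".join(f"{V_random[(row, col)]:7.2f}" for col in range(N)))
```

With no discounting, this loop only converges for policies that eventually reach a terminal state, which the random policy does. Policy iteration would alternate this evaluation step with a greedy improvement step.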
  2. Module 2

    Value Iteration

    Led by Edsger Dijkstra Simulacrum

    The question

    Value iteration (combining policy evaluation and improvement into a single update) · value iteration in code · comparison of policy iteration and value iteration (convergence speed, computational cost) · dynamic programming summary · when DP is appli...

    Outcome

    Demonstrates understanding and implementation of value iteration.

    Sub-units

    1. 2.2 Value Iteration
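
To make the Module 2 idea concrete, the sketch below shows value iteration, where the evaluation and improvement steps collapse into a single update that takes the maximum over actions. As above, the environment is an assumption for illustration rather than the course's code: a hypothetical 4x4 grid with two terminal corners, a reward of -1 per step, and no discounting. The names `step`, `q_value`, and `value_iteration` are likewise hypothetical.

```python
import itertools

# Hypothetical 4x4 Gridworld (an assumption, not the course's environment).
N = 4
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
TERMINALS = {(0, 0), (N - 1, N - 1)}
STATES = list(itertools.product(range(N), range(N)))


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < N and 0 <= c < N):
        r, c = state
    return (r, c), -1.0


def q_value(V, state, action, gamma):
    """One-step lookahead: immediate reward plus discounted next-state value."""
    next_state, reward = step(state, action)
    return reward + gamma * V[next_state]


def value_iteration(gamma=1.0, theta=1e-8):
    """Return (V, policy): optimal values and a greedy policy derived from them."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                continue
            # Single update: back up the best action instead of averaging
            # over a fixed policy.
            best = max(q_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma))
        for s in STATES
        if s not in TERMINALS
    }
    return V, policy


V_star, pi_star = value_iteration()
for row in range(N):
    print(" ".join(f"{V_star[(row, col)]:6.1f}" for col in range(N)))
```

In this setup each optimal value is the negative of the number of steps to the nearest terminal corner. Where several actions tie, `max` keeps the first one in `ACTIONS` order, so the printed policy is one of several optimal policies.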