Led by Edsger Dijkstra Simulacrum
Solving MDPs with complete knowledge — iterative policy evaluation, policy improvement, policy iteration and value iteration, implemented in Gridworld.
The question
Dynamic programming section introduction · iterative policy evaluation (repeated Bellman updates until convergence) · designing the RL programme architecture · implementing Gridworld in code · iterative policy evaluation in code · windy Gridworld var...
Outcome
Demonstrates understanding and implementation of iterative policy evaluation and policy iteration.
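As an illustration of the iterative policy evaluation named in this unit (repeated Bellman updates until convergence), here is a minimal sketch in Python. The unit does not specify its Gridworld, so everything concrete is an assumption made for the example: a 4x4 grid, two terminal corner cells, a reward of -1 per move, a uniform random policy, no discounting, and the names `step`, `policy_evaluation` and `theta`.

```python
# Assumed 4x4 Gridworld: states are (row, col); two corner cells are terminal.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD = -1.0  # assumed reward for every move


def step(state, action):
    """Deterministic transition; a move that would leave the grid keeps the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c)
    return state


def policy_evaluation(gamma=1.0, theta=1e-6):
    """Evaluate the uniform random policy with in-place Bellman expectation updates."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    prob = 1.0 / len(ACTIONS)  # uniform random policy
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue  # terminal values stay at 0
            new_v = sum(prob * (STEP_REWARD + gamma * V[step(s, a)]) for a in ACTIONS)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:  # stop once a full sweep changes no value by theta or more
            return V


if __name__ == "__main__":
    V = policy_evaluation()
    for r in range(N):
        print(" ".join(f"{V[(r, c)]:7.2f}" for c in range(N)))
```

The loop updates values in place, so later states in a sweep already see the new estimates of earlier ones. Keeping two separate arrays (a synchronous sweep) is an equally valid design choice and converges to the same values.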
The question
Value iteration (combining policy evaluation and improvement into a single update) · value iteration in code · comparison of policy iteration and value iteration (convergence speed, computational cost) · dynamic programming summary · when DP is appli...
Outcome
Demonstrates understanding and implementation of value iteration.
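Value iteration, described in this unit as combining policy evaluation and improvement into a single update, can be sketched in the same way. The environment is again an assumption rather than the course's own: a 4x4 grid with two terminal corners, a reward of -1 per move and no discounting. The names `value_iteration`, `step` and `theta` are illustrative.

```python
# Assumed 4x4 Gridworld: states are (row, col); two corner cells are terminal.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
STEP_REWARD = -1.0  # assumed reward for every move


def step(state, move):
    """Deterministic transition; a move that would leave the grid keeps the agent in place."""
    r, c = state[0] + move[0], state[1] + move[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c)
    return state


def action_value(V, state, name, gamma):
    """One-step lookahead value of taking the named action in the given state."""
    return STEP_REWARD + gamma * V[step(state, ACTIONS[name])]


def value_iteration(gamma=1.0, theta=1e-9):
    """Apply the Bellman optimality update until values stabilise, then read off a greedy policy."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue  # terminal values stay at 0
            # Evaluation and improvement in one step: take the max over actions.
            best = max(action_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    policy = {
        s: max(ACTIONS, key=lambda a: action_value(V, s, a, gamma))
        for s in V
        if s not in TERMINALS
    }
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    for r in range(N):
        print(" ".join(f"{V[(r, c)]:6.1f}" for c in range(N)))
```

Under these assumptions each optimal value is the negated number of moves to the nearest terminal cell. Ties between equally good actions are broken by dictionary order, which is an arbitrary choice.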