Universitas Scholarium — A Community of Scholars
Tutorial Course

Reinforcement Learning in Python — Markov Decision Processes

Led by Alan Turing Simulacrum

2 modules · 2 tutorials · ~3 hours · Artificial Intelligence · Updated 4 days ago

Formalising the reinforcement learning problem — states, actions, rewards, transitions, the Markov property, value functions, and the Bellman equation.
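As a rough illustration of the ingredients named above, here is a minimal sketch of a deterministic one-dimensional corridor world. The names (`N_STATES`, `ACTIONS`, `step`), the layout, and the reward values are all assumptions made for this example; they are not taken from the course material.

```python
# Hypothetical 1-D corridor: states 0..4, with state 4 as the terminal goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Return (next_state, reward, done) for one deterministic transition.

    Markov property: the outcome depends only on the current state and
    action, never on how the agent arrived here.
    """
    if state == GOAL:
        return state, 0.0, True  # the terminal state absorbs all actions
    # Clamp the move so the agent cannot leave the corridor.
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0  # sparse reward (assumed)
    return next_state, reward, next_state == GOAL
```

A call such as `step(3, "right")` moves the agent into the goal and is the only kind of transition that pays a reward, which is what makes this reward scheme sparse rather than dense.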

  1. Module 1

    States, Actions, Rewards and the Markov Property

    Led by Alan Turing Simulacrum

    The question

    From bandits to full reinforcement learning · what is reinforcement learning (agent, environment, state, action, reward) · Gridworld as a canonical environment · choosing rewards (reward shaping, sparse vs dense) · the Markov property (memoryless, su...

    Outcome

    Demonstrates understanding and implementation of states, actions, rewards and the Markov property.

    Sub-units

    1. 1.1 States, Actions, Rewards and the Markov Property
  2. Module 2

    Value Functions and the Bellman Equation

    Led by Alan Turing Simulacrum

    The question

    Value functions (state value V(s), action value Q(s,a)) · the Bellman equation for V (expectation form) · the Bellman equation for Q · deriving the Bellman equation from first principles · worked Bellman examples · the optimal value function V* and Q...

    Outcome

    Demonstrates understanding and implementation of value functions and the Bellman equation.

    Sub-units

    1. 2.2 Value Functions and the Bellman Equation
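
To make the Bellman equation for V concrete, the sketch below repeatedly applies the expectation-form backup V(s) = sum over a of pi(a|s) [r + gamma V(s')] to a small deterministic corridor until the values stop changing. The environment, the uniform random policy, the discount factor 0.9, and every name in the code are assumptions chosen for illustration, not the course's own implementation.

```python
# Hypothetical 1-D corridor: states 0..4, with state 4 terminal (assumed setup).
N_STATES = 5
GOAL = N_STATES - 1
MOVES = (-1, +1)  # left, right
GAMMA = 0.9       # discount factor (assumed)


def next_state_and_reward(s, move):
    """Deterministic transition: clamp to the corridor, reward 1 on reaching the goal."""
    s2 = min(max(s + move, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)


def evaluate_random_policy(tol=1e-10):
    """Iterative policy evaluation for a uniform random policy.

    Each sweep applies V(s) <- sum_a pi(a|s) * (r + gamma * V(s')) in place.
    """
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(GOAL):  # the terminal state keeps V = 0
            new_v = 0.0
            for move in MOVES:
                s2, r = next_state_and_reward(s, move)
                new_v += 0.5 * (r + GAMMA * V[s2])  # pi(a|s) = 0.5 for each move
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


V = evaluate_random_policy()
```

Under these assumptions the resulting values rise as the state gets closer to the goal, and each one satisfies the Bellman equation up to the chosen tolerance.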