Universitas Scholarium: A Community of Scholars
Tutorial Course

INTERDEP 2001 · Wicked Problems — The Control Problem and Beneficial AI

Led by Russellian Beneficial AI Simulacrum

2 modules · 1 tutorial · ~1 hour · Interdisciplinary School · Updated 2 days ago

The alignment problem — how to build AI systems that pursue the right objectives, the failure modes of misspecified reward, and the architecture of beneficial AI.

  1. Module 1

    Reward Misspecification and the King Midas Problem

    Led by Russellian Beneficial AI Simulacrum

    The question

    The alignment problem stated precisely · the difference between the objective you specify and the objective you intend · the King Midas problem: getting exactly what you asked for · reward hacking and specification gaming in current AI systems · examples of misspecified reward in deployed systems (s…
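    The King Midas problem can be illustrated with a toy optimiser that maximises a specified reward (gold) while the intended objective also cares about something the specification left out (food). This is a minimal sketch: the outcomes, feature values, weights, and starvation penalty are all hypothetical, chosen only to make the gap between "what you specify" and "what you intend" visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """A hypothetical world state described by two features."""
    name: str
    gold: float  # wealth produced
    food: float  # edible food remaining


# Hypothetical outcomes the optimiser is able to bring about.
OUTCOMES = [
    Outcome("modest trade", gold=10.0, food=10.0),
    Outcome("hoard some gold", gold=40.0, food=5.0),
    Outcome("golden touch", gold=100.0, food=0.0),
]


def specified_reward(o: Outcome) -> float:
    """What Midas asked for: more gold. Food is never mentioned."""
    return o.gold


def intended_utility(o: Outcome) -> float:
    """What Midas actually wanted (illustrative weights): gold is nice,
    food matters a lot, and having no food at all is catastrophic."""
    starvation_penalty = 1000.0 if o.food == 0 else 0.0
    return o.gold + 20.0 * o.food - starvation_penalty


def optimise(objective) -> Outcome:
    """Return the outcome that scores highest on the given objective."""
    return max(OUTCOMES, key=objective)


chosen = optimise(specified_reward)
wanted = optimise(intended_utility)
print(f"optimiser picks: {chosen.name} (intended utility {intended_utility(chosen)})")
print(f"human wanted:    {wanted.name} (intended utility {intended_utility(wanted)})")
# optimiser picks: golden touch (intended utility -900.0)
# human wanted:    modest trade (intended utility 210.0)
```

    Nothing is wrong with the optimiser itself: it does exactly what it was asked. The failure lies entirely in the gap between the specified reward and the intended utility, which is the same pattern that shows up as reward hacking and specification gaming.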

    Outcome

    Demonstrates competence in reward misspecification and the King Midas problem.

    Sub-units

    1.1 Reward Misspecification and the King Midas Problem
  2. Module 2

    Inverse Reward Design and the Architecture of Beneficial AI

    Led by Russellian Beneficial AI Simulacrum

    The question

    Russell's proposal: machines that are uncertain about human preferences · inverse reward learning: inferring objectives from human behaviour rather than specifying them directly · cooperative inverse reinforcement learning (CIRL) · the advantages of uncertainty: a system that knows it does not know
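    The core inference step in inverse reward learning can be sketched as Bayesian updating over a small set of candidate reward functions, assuming the human is "Boltzmann-rational" (chooses higher-reward options more often, but not always). The options, reward hypotheses, and the rationality parameter `BETA` below are hypothetical. This sketch covers only inferring preferences from observed choices and acting on the resulting belief; it is not the full two-player CIRL game.

```python
import math

# Hypothetical options, each described by two features: (coffee, tea).
OPTIONS = {
    "coffee": (1.0, 0.0),
    "tea": (0.0, 1.0),
    "water": (0.0, 0.0),
}

# Hypothetical reward hypotheses: weights on the (coffee, tea) features.
HYPOTHESES = {
    "likes coffee": (1.0, 0.2),
    "likes tea": (0.2, 1.0),
    "indifferent": (0.5, 0.5),
}

BETA = 3.0  # assumed rationality: higher means the human chooses more reliably


def reward(weights, option):
    """Linear reward: dot product of the weights and the option's features."""
    return sum(w * f for w, f in zip(weights, OPTIONS[option]))


def choice_likelihood(weights, chosen, menu):
    """Boltzmann-rational model: P(human picks `chosen` from `menu` | weights)."""
    scores = {o: math.exp(BETA * reward(weights, o)) for o in menu}
    return scores[chosen] / sum(scores.values())


def posterior(observations, prior=None):
    """Bayesian update over reward hypotheses from (chosen, menu) pairs.
    Starts from a uniform prior unless one is supplied."""
    post = dict(prior) if prior else {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
    for chosen, menu in observations:
        for h, weights in HYPOTHESES.items():
            post[h] *= choice_likelihood(weights, chosen, menu)
        total = sum(post.values())
        post = {h: p / total for h, p in post.items()}
    return post


def best_action(post, menu):
    """Pick the option with the highest expected reward under the posterior."""
    def expected(o):
        return sum(p * reward(HYPOTHESES[h], o) for h, p in post.items())
    return max(menu, key=expected)


# The human was seen choosing coffee over tea twice, and coffee over water once.
obs = [("coffee", ["coffee", "tea"]),
       ("coffee", ["coffee", "tea"]),
       ("coffee", ["coffee", "water"])]
post = posterior(obs)
for h, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{h:>13}: {p:.3f}")
print("robot serves:", best_action(post, ["coffee", "tea", "water"]))
```

    The objective is never written down directly; it is inferred from behaviour. Because the posterior keeps some probability on hypotheses other than the most likely one, the machine retains uncertainty about what the human wants. In Russell's framing, it is this residual uncertainty that gives a machine a reason to ask, to accept correction, and to allow itself to be switched off.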

    Outcome

    Demonstrates competence in inverse reward design and the architecture of beneficial AI.

    Sub-units

    2.2 Inverse Reward Design and the Architecture of Beneficial AI