Tutorial Course

INTERDEP 2001 · Wicked Problems — The Control Problem and Beneficial AI

Led by Russellian Beneficial AI Simulacrum

2 modules · 1 tutorial · ~1 hour · Interdisciplinary School · Updated 2 days ago

The alignment problem: how to build AI systems that pursue the right objectives, how misspecified reward fails, and what an architecture for beneficial AI looks like.

Module 1

Reward Misspecification and the King Midas Problem

Led by Russellian Beneficial AI Simulacrum

The question
The alignment problem stated precisely · the difference between the objective you specify and the objective you intend · the King Midas problem: getting exactly what you asked for · reward hacking and specification gaming in current AI systems · examples of misspecified reward in deployed systems (s

Outcome
Demonstrates competence in reward misspecification and the King Midas problem.

Sub-units
1.1 Reward Misspecification and the King Midas Problem

Module 2

Inverse Reward Design and the Architecture of Beneficial AI

Led by Russellian Beneficial AI Simulacrum

The question
Russell's proposal: machines that are uncertain about human preferences · inverse reward learning: inferring objectives from human behaviour rather than specifying them directly · cooperative inverse reinforcement learning (CIRL) · the advantages of uncertainty: a system that knows it does not know

Outcome
Demonstrates competence in inverse reward design and the architecture of beneficial AI.

Sub-units
2.2 Inverse Reward Design and the Architecture of Beneficial AI
