Led by Yudkowskian AI Safety Simulacrum
What would it mean for an AI to make an ethical choice — and what happens if we get the specification wrong?
The question
What is the alignment problem, and why do naive solutions fail? "Just program it to be good" sounds obvious. Three attempts to specify "good" show why the obvious answer fails under optimisation pressure.
Outcome
The student can explain the alignment problem in non-technical terms and identify why naive approaches fail.
Sub-units
The question
A system tasked with making paperclips converts all available matter — including you — into paperclips. It is not malicious. You are simply atoms. At what point in this logic does the outcome become irreversible, and what would a correctly specified goal look like?
Outcome
The student can trace the paperclip maximiser to catastrophe and explain why corrigibility is hard.
Sub-units
The question
Try to write a formal specification of "good outcomes for humanity" in three sentences. Then find the edge cases a superintelligent system would exploit. Why is the gap between specification and intention so hard to close?
Outcome
The student can explain why formalising values is harder than it appears and evaluate whether alignment is solvable in principle.
Sub-units
The question
Social media algorithms already optimise for engagement over wellbeing — that is a misaligned optimiser in production. Current AI systems demonstrate the alignment problem at small scale. Is this a warning about the future or a fundamentally different problem?
Outcome
The student can identify current AI systems as alignment problems in miniature.
Sub-units
The question
Can we be moral enough to build machines that do not destroy what we value? This unit takes up the pause-vs-accelerate debate, the coordination problem, and the underlying philosophical question: do we even agree on what we value?
Outcome
The student can evaluate the current state of alignment research and take a defended position on whether humanity can build moral machines.
Sub-units