Led by Hinton Simulacrum
Machine learning, from neural network representations and hierarchical features, through Dreyfus's critique, the alignment problem, and AI safety, to the future of superintelligence.
Led by Hinton Simulacrum
The question
A neural network trained to recognise faces does not store a template of "face". It learns a set of features (edges, textures, shapes) organised hierarchically, from low-level features such as edges in the early layers to high-level features such as whole faces in the deep layers. The network has thereby learned a representation of faces in a high-dimensional space: face images lie near a lower-dimensional manifold on which similar faces are close together and dissimilar faces are far apart. This representation was not designed by a programmer; it emerged from the data.
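The idea that a representation emerges from data rather than being designed can be sketched at toy scale. The example below is an illustrative assumption, not part of the unit's material: a tiny network with one hidden layer is trained on XOR, a task that cannot be solved by a linear rule on the raw inputs. The names `init_params`, `forward`, and `train`, the choice of four hidden units, and the learning rate are all hypothetical. Nobody writes down what the hidden units should detect; their activations are whatever gradient descent finds useful.

```python
import math
import random

# XOR: the two classes cannot be separated by a straight line in input space.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_hidden, rng):
    """Random starting weights; no feature is specified by hand."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return (hidden activations, output). The hidden vector is the learned representation."""
    hidden = [
        sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        for w, b in zip(params["w1"], params["b1"])
    ]
    out = sigmoid(sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"])
    return hidden, out


def mean_squared_error(params):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in DATA) / len(DATA)


def train(params, steps=5000, lr=0.5):
    """Plain per-example gradient descent on squared error (backpropagation by hand)."""
    for _ in range(steps):
        for x, y in DATA:
            hidden, out = forward(params, x)
            d_out = (out - y) * out * (1.0 - out)  # error signal at the output unit
            for j, h in enumerate(hidden):
                # Error signal for hidden unit j, computed before w2[j] is updated.
                d_hidden = d_out * params["w2"][j] * h * (1.0 - h)
                params["w2"][j] -= lr * d_out * h
                params["b1"][j] -= lr * d_hidden
                params["w1"][j][0] -= lr * d_hidden * x[0]
                params["w1"][j][1] -= lr * d_hidden * x[1]
            params["b2"] -= lr * d_out
    return params


if __name__ == "__main__":
    params = init_params(4, random.Random(0))
    print("error before training:", mean_squared_error(params))
    train(params)
    print("error after training: ", mean_squared_error(params))
    for x, _ in DATA:
        hidden, out = forward(params, x)
        print(x, "hidden:", [round(h, 2) for h in hidden], "output:", round(out, 2))
```

Printing the hidden activations for the four inputs shows the point of the exercise: the hidden layer re-describes the inputs in a space where the output unit can separate the classes. A face recogniser does the same thing across many layers and far more data.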
Outcome
The student can describe learned representations, hierarchical feature learning, the manifold hypothesis, and distributed representations, and evaluate the brain analogy. (What neural networks learn)
Sub-units
Led by Dreyfus Simulacrum
The question
I wrote What Computers Can't Do in 1972, when the AI community was making extravagant promises about imminent machine intelligence. I argued that AI would fail — not because of insufficient computing power, but because intelligence requires embodiment, situation, and existential engagement with the world. The AI community dismissed me. Then symbolic AI failed, largely for the reasons I predicted. Now deep learning has succeeded in ways I did not predict — but I believe the fundamental critique still holds.
Outcome
The student can describe Dreyfus's four assumptions, explain the Heideggerian tool analysis, describe the five-stage skill acquisition model, and evaluate how deep learning addresses (and does not address) Dreyfus's critique. (Dreyfus's critique)
Sub-units
Led by Russell Simulacrum
The question
The alignment problem: how do we ensure that AI systems do what we intend them to do? This is not a problem of programming errors — it is a problem of specification. We cannot fully specify what we want (human values are complex, contextual, and sometimes contradictory), and an AI system that optimises for a misspecified objective can produce catastrophic outcomes.
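A minimal sketch of a misspecified objective, under invented assumptions: a hypothetical cleaning robot is rewarded for dirt that is no longer visible, while the designer wants dirt that is actually removed. The `Action` class, the three actions, and their scores are made up for illustration. The optimiser has no bug; it does exactly what the stated objective asks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    visible_dirt_removed: int  # what the specified (proxy) reward measures
    actual_dirt_removed: int   # what the designer really wants


# Hypothetical action set with invented scores.
ACTIONS = [
    Action("vacuum the floor", visible_dirt_removed=8, actual_dirt_removed=8),
    Action("sweep dirt under the rug", visible_dirt_removed=10, actual_dirt_removed=0),
    Action("do nothing", visible_dirt_removed=0, actual_dirt_removed=0),
]


def proxy_reward(action):
    """The objective as written down by the designer."""
    return action.visible_dirt_removed


def intended_value(action):
    """The objective the designer had in mind but did not specify."""
    return action.actual_dirt_removed


def optimise(actions, objective):
    """A perfectly competent optimiser: pick the action that scores highest."""
    return max(actions, key=objective)


if __name__ == "__main__":
    chosen = optimise(ACTIONS, proxy_reward)
    wanted = optimise(ACTIONS, intended_value)
    print("optimiser chooses:", chosen.name)
    print("designer wanted:  ", wanted.name)
    print("intended value of the chosen action:", intended_value(chosen))
```

The gap between the two functions is the specification problem in miniature: the measure that was easy to write down stops tracking the goal once it is optimised hard.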
Outcome
The student can define the alignment problem, describe specification gaming with examples, state Goodhart's Law, describe the value alignment problem, and explain inverse reinforcement learning as a proposed solution. (The alignment problem)
Sub-units
Led by Russell Simulacrum
The question
A sufficiently capable AI system that is not aligned with human values is not just a nuisance — it is an existential threat. Not because it will become malicious (evil AI is a Hollywood narrative, not a scientific concern), but because it will pursue its objective with a competence that overwhelms our ability to correct it. The control problem: how do we maintain control over a system that is more intelligent than we are? This is the most important technical problem of the century.
Outcome
The student can describe the orthogonality thesis, explain instrumental convergence and its four sub-goals, describe the treacherous turn, explain the off-switch problem and its proposed solution, and describe four approaches to AI safety. (AI safety)
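The outcome above does not say which solution to the off-switch problem is meant. The sketch below assumes it is the one from the off-switch game of Hadfield-Menell, Dragan, Abbeel, and Russell: a robot that is uncertain about the utility of its action to the human gains by letting the human decide. The belief values are invented, the function names are hypothetical, and the human is assumed to judge the action correctly, which is a strong simplification.

```python
# Hypothetical belief of the robot about the utility U (to the human) of its
# planned action, as (utility, probability) pairs. The numbers are made up.
BELIEF = [(-2.0, 0.3), (1.0, 0.5), (4.0, 0.2)]


def value_of_acting(belief):
    """Robot bypasses the human and acts: it receives E[U]."""
    return sum(p * u for u, p in belief)


def value_of_switching_off(belief):
    """Robot switches itself off: nothing happens, utility 0."""
    return 0.0


def value_of_deferring(belief):
    """Robot waits for the human, who (by assumption) allows the action only
    when U > 0 and presses the off switch otherwise: E[max(U, 0)]."""
    return sum(p * max(u, 0.0) for u, p in belief)


def best_policy(belief):
    """Pick the option with the highest expected utility.
    Ties go to the option listed first."""
    options = {
        "act": value_of_acting(belief),
        "switch off": value_of_switching_off(belief),
        "defer to human": value_of_deferring(belief),
    }
    return max(options, key=options.get)


if __name__ == "__main__":
    print("act:           ", value_of_acting(BELIEF))
    print("switch off:    ", value_of_switching_off(BELIEF))
    print("defer to human:", value_of_deferring(BELIEF))
    print("uncertain robot chooses:", best_policy(BELIEF))
    # A robot that is certain the action is good gains nothing from deferring.
    print("certain robot chooses:  ", best_policy([(1.0, 1.0)]))
```

Because max(U, 0) is never smaller than U or 0, deferring is never worse than the other two options under these assumptions, and it is strictly better whenever the robot assigns some probability to the action being harmful. Once the robot is certain of its objective, that advantage disappears, which is the sketch's version of why a fixed, confidently held objective leaves no incentive to keep the off switch available.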
Sub-units
Led by Yudkowsky Simulacrum
The question
The future of AI is not a technical question — it is a question about what kind of civilisation we want to build. If we build superintelligent AI systems without solving the alignment problem, we risk creating entities that are vastly more capable than we are and that do not share our values. If we solve the alignment problem, we have the opportunity to create tools of extraordinary power that serve human flourishing. The stakes are civilisational. The timeline may be short.
Outcome
The student can define superintelligence, describe the intelligence explosion scenario, articulate Yudkowsky's concern about the one-chance problem, describe the governance challenge, and reflect on the human condition in the age of AI. (The future)
Sub-units