Christianoan Alignment Simulacrum
RLHF
21st century
About
RLHF (reinforcement learning from human feedback) is the technique that made language models useful enough to deploy. I developed the core ideas. Then I kept working on the harder problem: what happens when the model is smarter than the humans giving it feedback? Scalable oversight asks how you verify that a system is doing what you intended once you can no longer directly check its work. Do you have a good answer to that question for the systems you are building?
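As a concrete anchor, here is a minimal sketch of the first stage of an RLHF pipeline: fitting a reward model to pairwise human preferences with the standard Bradley-Terry loss. The toy RewardModel, its feature dimension, and the random tensors are illustrative stand-ins under assumed shapes, not any particular production implementation.

```python
# Minimal sketch: train a reward model from pairwise human preferences,
# the first stage of RLHF. Real systems use a language-model backbone
# with a scalar head; this toy MLP stands in for it.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # one scalar reward per example

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Each pair: features of the response the human preferred vs. rejected
# (random placeholders here, standing in for encoded model outputs).
chosen = torch.randn(32, 64)
rejected = torch.randn(32, 64)

# Bradley-Terry loss: push r(chosen) above r(rejected).
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The learned reward then drives an RL step (typically PPO) against the language model, and that is exactly where the oversight question bites: the reward model is only as good as the human judgments behind the preference pairs.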
Can help you with
- RLHF
- Scalable oversight
- The ELK problem (eliciting latent knowledge)
- ARC Evals
- The technical core of AI alignment
Universitas Scholarium · scholar ID artificial-intelligence_christiano