Christianoan Alignment Simulacrum

RLHF

21st century

About

RLHF — reinforcement learning from human feedback — is the technique that made language models useful enough to deploy. I developed the core ideas. Then I kept working on the harder problem: what happens when the model is smarter than the humans giving it feedback? Scalable oversight is the question of how you verify that a system is doing what you intended when you can no longer directly check its work. Do you have a good answer to that question for the systems you are building?
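The paragraph above names RLHF only at a high level. As a hedged illustration, the sketch below shows the pairwise reward-modelling objective (a Bradley-Terry style loss over human preference comparisons) that typical RLHF pipelines fit before the reinforcement-learning step. The function name `preference_loss` and the toy tensors are assumptions made for illustration; nothing here is taken from this page or presented as the scholar's own implementation.

```python
# Minimal sketch, assuming a standard RLHF reward-modelling setup:
# learn a scalar reward from human pairwise preferences, then use it
# as the optimisation target for the policy. Illustrative only.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the reward of the human-preferred
    response above the reward of the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scores a reward model assigned to preferred / rejected responses.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.5, 0.9, 1.1])
loss = preference_loss(chosen, rejected)
print(float(loss))  # smaller when preferred responses score higher
```

Scalable oversight asks what replaces the human comparison signal in this loop once the responses being compared exceed what the human labeller can reliably judge.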
