The Scaling Curve: Dario Amodei, Anthropic, and the Race to Build and Survive Superintelligence — Chapter Seven
- 1. Anthropic developed Constitutional AI to address the limitations of existing reinforcement learning from human feedback (RLHF) methods for aligning language models.
- 2. The core idea of Constitutional AI was to write a 'constitution' of principles in natural language that a language model could read and follow.
- 3. Constitutional AI works by having the model critique and revise its own responses against a written constitution, and by training on AI-generated feedback (reinforcement learning from AI feedback, or RLAIF) rather than relying solely on human feedback.
- 4. The critical feature of Constitutional AI is its human-readable constitution, which makes the intended behavior transparent and separates the question 'what should it do?' from the question 'is it doing it?'.
- 5. Anthropic's Claude model is guided by the 'helpful, honest, and harmless' (triple-H) framework, which acts as a high-level design specification.
- 6. The constitution evolved from prescriptive rules to high-level principles and reasons, enabling the model to derive appropriate behavior in novel situations.
- 7. Anthropic's ambitious goal for 2026 is to train Claude so it almost never violates the *spirit* of its constitution, not just the letter.
- 8. A profound political question remains: who gets to write the constitution for AI systems that will affect millions of users?
- 9. Constitutional AI proves that AI alignment is an engineering problem with practical solutions, even if they are probabilistic and iterative.
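
The self-critique step described in point 3 can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Model` type, the prompt wording, the three sample principles, and the `toy_model` stub are hypothetical and are not Anthropic's actual constitution, prompts, or training code. The sketch also simplifies by applying every principle in order, and it covers only the critique-and-revise pass, not the later reinforcement learning stage.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt text in, text out.
Model = Callable[[str], str]

# Illustrative principles only; not the text of any real constitution.
CONSTITUTION: List[str] = [
    "Be helpful to the user.",
    "Be honest; do not assert things you believe are false.",
    "Avoid content that could cause harm.",
]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and rewrite it once per principle."""
    response = model(f"User request: {prompt}\nResponse:")
    for principle in principles:
        # Ask the model to judge its own draft against one written principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response against the principle:"
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = model(
            f"Principle: {principle}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to address the critique:"
        )
    return response


def toy_model(prompt: str) -> str:
    """Deterministic stub so the sketch runs without a real model."""
    if "Rewrite the response" in prompt:
        return "revised draft"
    if "Critique the response" in prompt:
        return "the draft could follow the principle more closely"
    return "first draft"


if __name__ == "__main__":
    print(critique_and_revise(toy_model, "Explain RLHF briefly.", CONSTITUTION))
```

Because the principles are passed in as ordinary text, the list can be read and edited without touching the loop, which mirrors the separation in point 4 between what the model should do and whether it is doing it.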