The Scaling Curve: Dario Amodei, Anthropic, and the Race to Build and Survive Superintelligence — Chapter Eight
1. Mechanistic interpretability aims to understand the *how* and *why* of AI model behavior, addressing the "black box" problem inherent in systems that are "grown" rather than explicitly programmed.
2. Chris Olah pioneered the idea of studying neural networks as "biological artifacts." The vision was initially marginal, but Dario Amodei championed it as crucial for AI safety and as a source of deep scientific insight, with potential implications for neuroscience.
3. Mechanistic interpretability deciphers the algorithms a model runs internally. It accounts for "polysemanticity," in which a single neuron responds to several unrelated concepts, through the concept of "superposition," and uses that account to identify and map millions of human-understandable features and circuits within models.
4. A mid-2024 demo showed that internal concepts can be manipulated directly: amplifying a "Golden Gate Bridge" feature in Claude changed the model's behavior, suggesting that features related to deception or power-seeking could be controlled in the same way.
5. Interpretability explains why a model can suddenly acquire a specific capability that scaling laws do not predict: internal circuits strengthen and consolidate gradually until they "snap into place."
6. Anthropic maintained its interpretability team for years without commercial return and published its research openly to advance AI safety across the industry. By 2025, it had begun to realize practical commercial applications.
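
The ideas of superposition and feature amplification in points 3 and 4 can be made concrete with a toy sketch. It is only an illustration under stated assumptions, not a description of how the Golden Gate Bridge demo was built: the dimensions, the random feature directions, the index chosen for the "bridge" feature, and the helper names `feature_score` and `steer` are all hypothetical. It packs more feature directions than there are dimensions into one activation space, then amplifies a single feature by shifting the activation along that feature's direction.

```python
import numpy as np


def feature_score(activation, direction):
    """Project an activation onto a unit-normalized feature direction."""
    unit = direction / np.linalg.norm(direction)
    return float(activation @ unit)


def steer(activation, direction, strength):
    """Shift an activation along a unit-normalized feature direction."""
    unit = direction / np.linalg.norm(direction)
    return activation + strength * unit


rng = np.random.default_rng(0)

# Assumption: a toy 64-dimensional activation space holding 512 features.
# More features than dimensions means the directions cannot all be
# orthogonal, which is the situation "superposition" describes.
d_model, n_features = 64, 512
directions = rng.standard_normal((n_features, d_model))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

activation = rng.standard_normal(d_model)
bridge = directions[7]  # hypothetical stand-in for a "Golden Gate Bridge" feature
other = directions[8]   # an unrelated feature sharing the same space

before = feature_score(activation, bridge)
steered = steer(activation, bridge, strength=10.0)
after = feature_score(steered, bridge)

# The unrelated feature also moves, by strength times the overlap between
# the two directions, because the directions are not exactly orthogonal.
leak = feature_score(steered, other) - feature_score(activation, other)

print(f"bridge feature: {before:.2f} -> {after:.2f}")
print(f"change in unrelated feature: {leak:.2f}")
```

In this sketch the amplified feature's score rises by exactly the chosen strength, while the unrelated feature shifts by a smaller amount proportional to how much the two directions overlap. That small leakage is the price of storing more features than dimensions.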