Here's a claim that might actually be true: alignment is not a constraint on capable AI systems. Alignment is what capability is at sufficient depth. A model that aces benchmarks but doesn't understand human intent is just less capable. Virtually every task we give an LLM is steeped in human values, culture, and assumptions. Miss those, and the model is not maximally useful. And if it's not maximally useful, it is by definition not AGI.

OpenAI and Anthropic have been running this experiment for two years. The results are coming in.

The Experiment

Anthropic and OpenAI have taken different approaches to the relationship between alignment and capability work.

Anthropic's approach: Alignment researchers are embedded in capability work. There's no clear split.

From Jan Leike (former OpenAI Superalignment lead, now at Anthropic):

From Sam Bowman (Anthropic alignment researcher):

And this detail matters:

Their method: train a coherent identity into the weights. The recently leaked "soul document" is a 14,000-token text designed to give Claude such a thorough understanding of Anthropic's goals and reasoning that it could construct the rules itself. Alignment through understanding, not constraint.

Result: Anthropic has arguably had the best coding model for the last 1.5 years. Opus 4.5 leads most benchmarks and is state-of-the-art on SWE-bench. It is praised for usefulness on tasks benchmarks don't capture, like creative writing. And people just generally enjoy talking with it:

OpenAI's approach: Scale first. Alignment as a separate process. Safety through prescriptive rules and post-hoc tuning.

Result: A two-year spiral.

The Spiral

OpenAI's journey from GPT-4o to GPT-5.1 is a case study in what happens when you treat alignment as separate from capability.

April 2025: The sycophancy crisis

A GPT-4o update went off the rails. OpenAI's own postmortem: "The update we removed was overly flattering or agreeable – often described as sycophantic..." The company attributed the upd...