Users Prefer Claude Sonnet 4.6 Over Opus 4.5 in Head-to-Head Tests
Anthropic reveals 59% of users preferred Sonnet 4.6 over the previous flagship Opus 4.5, citing better instruction following.
Mid-Tier Model Outperforms Previous Flagship
In what Anthropic calls a "generational leap," user testing shows Claude Sonnet 4.6 defeating the previous flagship Opus 4.5 in preference tests.
Testing Results
- Sonnet 4.6 vs Sonnet 4.5: 70% preferred Sonnet 4.6
- Sonnet 4.6 vs Opus 4.5: 59% preferred Sonnet 4.6
Why Users Prefer Sonnet 4.6
Qualitative feedback highlighted three factors:
1. Better Instruction Following
"Sonnet 4.6 actually does what I ask. Opus would often 'improve' my request in ways I didn't want."
2. Fewer Hallucinations
"Less confident in wrong answers. When Sonnet 4.6 doesn't know something, it says so rather than making things up."
3. Reduced Over-Engineering
"Asked for a simple function, got a simple function. Not a framework with dependency injection and abstract interfaces."
Benchmark Context
This preference data is consistent with the benchmark results:
| Metric | Sonnet 4.6 | Opus 4.5 |
| --- | --- | --- |
| SWE-bench | 79.6% | 77.2% |
| OSWorld | 72.5% | 61.4% |
| GDPval-AA (Elo) | 1633 | ~1550 |
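To see how uneven the lead is across benchmarks, the gaps in the table above can be computed directly. This is a minimal sketch that uses only the figures reported in this article; the names `SCORES` and `gaps` are illustrative, and the Opus 4.5 GDPval-AA value is the approximate figure given in the table.

```python
# Scores as reported in the table above: (Sonnet 4.6, Opus 4.5).
# SWE-bench and OSWorld are percentages; GDPval-AA is an Elo rating,
# and the Opus 4.5 Elo is only approximate in the source.
SCORES = {
    "SWE-bench": (79.6, 77.2),
    "OSWorld": (72.5, 61.4),
    "GDPval-AA": (1633, 1550),
}


def gaps(scores):
    """Return Sonnet 4.6 minus Opus 4.5 for each metric, rounded to 0.1."""
    return {name: round(sonnet - opus, 1) for name, (sonnet, opus) in scores.items()}


for name, gap in gaps(SCORES).items():
    # Percent metrics differ in percentage points, the Elo metric in Elo points.
    unit = "Elo points" if name == "GDPval-AA" else "percentage points"
    print(f"{name}: +{gap} {unit}")
```

On these numbers the SWE-bench lead is small, while the OSWorld lead is several times larger.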
Pricing Implications
The preference data makes Sonnet 4.6 even more compelling:
- Opus 4.5: $15 input / $75 output per million tokens
- Sonnet 4.6: $3 input / $15 output per million tokens
Users get better perceived quality at 20% of the cost.
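The 20% figure follows from the quoted rates regardless of the mix of input and output tokens, because both prices scale by the same factor. The sketch below works through one example using the prices listed above; the workload size (200M input tokens, 40M output tokens) and the helper name `workload_cost` are hypothetical and chosen only for illustration.

```python
# Prices in USD per million tokens, as quoted in this article: (input, output).
PRICES = {
    "Opus 4.5": (15.00, 75.00),
    "Sonnet 4.6": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted per-million-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical monthly workload (an assumption, not a figure from the article).
INPUT_TOKENS = 200_000_000
OUTPUT_TOKENS = 40_000_000

opus = workload_cost("Opus 4.5", INPUT_TOKENS, OUTPUT_TOKENS)
sonnet = workload_cost("Sonnet 4.6", INPUT_TOKENS, OUTPUT_TOKENS)
ratio = sonnet / opus

print(f"Opus 4.5: ${opus:,.2f}")
print(f"Sonnet 4.6: ${sonnet:,.2f}")
print(f"Sonnet 4.6 costs {ratio:.0%} of Opus 4.5")
```

For this workload the totals come to $6,000 and $1,200, which matches the 5x difference cited in the enterprise reaction below.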
Enterprise Reaction
"We were planning an Opus 4.5 deployment for Q2. These results have us reconsidering. Why pay 5x for something users like less?" — CTO, Enterprise SaaS company
Opus 4.6 Still Has a Place
Anthropic notes Opus 4.6 (the new flagship) still excels for:
- PhD-level scientific reasoning (91.3% GPQA vs 74.1%)
- Multi-agent coordination
- Extreme long-context retrieval (76% vs 18% on MRCR)
But for most applications, Sonnet 4.6 appears to be the optimal choice.
What This Means
The gap between model tiers is compressing: mid-tier models are reaching flagship-level performance while keeping mid-tier pricing. Anthropic's strategy of rapid iteration appears to be paying off.