ResearchFebruary 17, 2026

Users Prefer Claude Sonnet 4.6 Over Opus 4.5 in Head-to-Head Tests

Anthropic reveals 59% of users preferred Sonnet 4.6 over the previous flagship Opus 4.5, citing better instruction following.

Mid-Tier Model Outperforms Previous Flagship

In what Anthropic calls a "generational leap," user testing shows Claude Sonnet 4.6 defeating the previous flagship Opus 4.5 in preference tests.

Testing Results

Sonnet 4.6 vs Sonnet 4.5: 70% preferred Sonnet 4.6 Sonnet 4.6 vs Opus 4.5: 59% preferred Sonnet 4.6

Why Users Prefer Sonnet 4.6

Qualitative feedback highlighted three factors:

1. Better Instruction Following

"Sonnet 4.6 actually does what I ask. Opus would often 'improve' my request in ways I didn't want."

2. Fewer Hallucinations

"Less confident in wrong answers. When Sonnet 4.6 doesn't know something, it says so rather than making things up."

3. Reduced Over-Engineering

"Asked for a simple function, got a simple function. Not a framework with dependency injection and abstract interfaces."

Benchmark Context

This preference data aligns with benchmarks:

MetricSonnet 4.6Opus 4.5
SWE-bench79.6%77.2%
OSWorld72.5%61.4%
GDPval-AA1633 Elo~1550

Pricing Implications

The preference data makes Sonnet 4.6 even more compelling:

  • Opus 4.5: $15/$75 per million tokens
  • Sonnet 4.6: $3/$15 per million tokens

Users get better perceived quality at 20% of the cost.

Enterprise Reaction

"We were planning an Opus 4.5 deployment for Q2. These results have us reconsidering. Why pay 5x for something users like less?" — CTO, Enterprise SaaS company

Opus 4.6 Still Has a Place

Anthropic notes Opus 4.6 (the new flagship) still excels for:

  • PhD-level scientific reasoning (91.3% GPQA vs 74.1%)
  • Multi-agent coordination
  • Extreme long-context retrieval (76% vs 18% on MRCR)

But for most applications, Sonnet 4.6 appears to be the optimal choice.

What This Means

The AI industry is seeing compression: mid-tier models reaching flagship performance while maintaining cost efficiency. Anthropic's strategy of rapid iteration appears to be paying off.

Ready to Experience Claude 5?

Try Now