Benchmark | February 17, 2026

Claude Sonnet 4.6 Hits 79.6% on SWE-bench, Within 1.2% of Opus 4.6

New Sonnet model closes gap with flagship on coding benchmarks, achieving industry-leading performance at mid-tier pricing.

Sonnet Reaches Flagship Territory

Claude Sonnet 4.6's 79.6% score on SWE-bench Verified puts it within striking distance of Opus 4.6's 80.8%—a gap of just 1.2 percentage points.

Historical Context

The rapid improvement in Sonnet-class models:

Model        SWE-bench Verified   Date
Sonnet 3.5   49.0%                Jun 2024
Sonnet 4     72.7%                Mar 2025
Sonnet 4.5   77.2%                Sep 2025
Sonnet 4.6   79.6%                Feb 2026

In 20 months, Sonnet's SWE-bench Verified score has climbed 30.6 percentage points, from 49.0% to 79.6%.

Benchmark Details

SWE-bench Verified tests AI models on real GitHub issues:
  • 500 curated problems from Python repositories
  • Must generate correct patches that pass tests
  • No training on test data

Sonnet 4.6 breakdown:
  • 79.6% standard pass rate
  • Higher with extended thinking / Adaptive Thinking (high effort)
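Strip away the harness details and the headline number is just the fraction of the 500 problems whose generated patch makes the repository's tests pass. A minimal sketch (the helper name and the True/False outcome list are illustrative, not the real evaluation harness):

```python
def pass_rate(outcomes):
    """Fraction of problems whose generated patch applied cleanly
    and made the repository's test suite pass."""
    return sum(outcomes) / len(outcomes)

# 398 solved out of 500 reproduces Sonnet 4.6's reported 79.6%
print(pass_rate([True] * 398 + [False] * 102))  # → 0.796
```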

Competitive Landscape

Model        SWE-bench Verified   Price (Input/Output)
Opus 4.6     80.8%                $15 / $75
Sonnet 4.6   79.6%                $3 / $15
GPT-5.2      ~76%                 $1.75 / $14
Codex 5.3    56.8%*               $10 / $30

*Codex 5.3 is scored on a different benchmark variant (SWE-bench Pro), so its number is not directly comparable.

What the Gap Means

For most development tasks, the difference between 79.6% and 80.8% is indistinguishable from benchmark noise:

  • Both solve ~4 of 5 real-world bugs correctly
  • Variance in individual runs exceeds the gap
  • Cost difference (5x) far exceeds capability difference (1.2%)
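The "variance exceeds the gap" claim can be checked with binomial standard errors: on 500 problems, the sampling noise on the difference between the two scores is larger than 1.2 points. (Treating the two runs as independent understates the precision of a paired comparison on the same problems, but it gives the right order of magnitude.)

```python
import math

N = 500  # SWE-bench Verified problem count

def se(p, n=N):
    """Binomial standard error of a pass rate p measured on n problems."""
    return math.sqrt(p * (1 - p) / n)

gap = 0.808 - 0.796
# Standard error of the difference between two independent estimates
se_diff = math.sqrt(se(0.796) ** 2 + se(0.808) ** 2)

print(f"gap = {gap*100:.1f} pp, SE of gap = {se_diff*100:.1f} pp")
# gap ≈ 1.2 pp, SE of gap ≈ 2.5 pp — the noise swamps the gap
```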

Developer Perspectives

"I've been A/B testing Sonnet vs Opus for a week. Can't tell the difference on my codebase. But I sure can tell the difference in my bill." — Senior Engineer, YC startup

"For 99% of tickets, Sonnet 4.6 is Opus. That last 1% is when I escalate." — Tech Lead, Series B company

When Opus 4.6 Still Wins

Despite near-parity, Opus 4.6 pulls ahead on:

  • Novel algorithm design
  • Multi-step refactoring with many dependencies
  • PhD-level scientific code
  • Maximum accuracy requirements (regulatory, financial)

The Value Proposition

At current pricing:

  • 100 SWE-bench problems cost ~$7 with Sonnet 4.6
  • Same problems cost ~$35 with Opus 4.6
  • 5x the cost for a 1.2-percentage-point (≈1.5% relative) improvement
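The per-100-problem figures above are consistent with per-million-token pricing and an average footprint on the order of 15k input / 1.5k output tokens per problem. Those token counts are assumptions for illustration; the article states only the totals:

```python
# Hypothetical average token footprint per SWE-bench problem (assumed)
IN_TOK, OUT_TOK = 15_000, 1_500
N_PROBLEMS = 100

def run_cost(in_price, out_price):
    """Dollar cost of N_PROBLEMS at the given $-per-1M-token prices."""
    per_problem = (IN_TOK * in_price + OUT_TOK * out_price) / 1e6
    return N_PROBLEMS * per_problem

sonnet = run_cost(3, 15)   # Sonnet 4.6: $3 in / $15 out
opus = run_cost(15, 75)    # Opus 4.6: $15 in / $75 out
print(f"Sonnet: ${sonnet:.2f}  Opus: ${opus:.2f}  ratio: {opus/sonnet:.0f}x")
# Sonnet: $6.75  Opus: $33.75  ratio: 5x
```

Note that the ratio is exactly 5x regardless of the assumed token mix, because both of Opus's prices are 5x Sonnet's.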

Conclusion

Sonnet 4.6 has effectively commoditized flagship-level coding performance. For most teams, the rational choice is Sonnet by default, Opus by exception.
