Claude Sonnet 4.6 Hits 79.6% on SWE-bench, Within 1.2% of Opus 4.6
New Sonnet model closes gap with flagship on coding benchmarks, achieving industry-leading performance at mid-tier pricing.
Sonnet Reaches Flagship Territory
Claude Sonnet 4.6's 79.6% score on SWE-bench Verified puts it within striking distance of Opus 4.6's 80.8%—a gap of just 1.2 percentage points.
Historical Context
The rapid improvement in Sonnet-class models:
| Model | SWE-bench Verified | Date |
| --- | --- | --- |
| Sonnet 3.5 | 49.0% | Jun 2024 |
| Sonnet 4 | 72.7% | Mar 2025 |
| Sonnet 4.5 | 77.2% | Sep 2025 |
| Sonnet 4.6 | 79.6% | Feb 2026 |
In the 20 months from June 2024 to February 2026, Sonnet's SWE-bench Verified score climbed by more than 30 percentage points (49.0% → 79.6%).
Benchmark Details
SWE-bench Verified tests AI models on real GitHub issues:
- 500 curated problems from Python repositories
- Models must generate correct patches that pass the repositories' tests
- No training on the test data
- 79.6% is Sonnet 4.6's standard pass rate
- Scores are higher with extended thinking / Adaptive Thinking (high effort)
Competitive Landscape
| Model | SWE-bench Verified | Price (Input/Output) |
| --- | --- | --- |
| Opus 4.6 | 80.8% | $15/$75 |
| Sonnet 4.6 | 79.6% | $3/$15 |
| GPT-5.2 | ~76% | $1.75/$14 |
| Codex 5.3 | 56.8%* | $10/$30 |
*Codex is scored on a different benchmark variant (SWE-Bench Pro), so its number is not directly comparable.
What the Gap Means
For most development tasks, the difference between 79.6% and 80.8% is statistically indistinguishable:
- Both models solve roughly 4 of every 5 real-world bugs
- Run-to-run variance exceeds the 1.2-point gap
- The cost difference (5x) dwarfs the capability difference (1.2 points)
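The "statistically indistinguishable" claim can be sanity-checked with a simple binomial standard error, treating each of the benchmark's 500 problems as an independent pass/fail trial (a simplifying assumption; real runs also vary with sampling settings):

```python
import math

def pass_rate_se(p: float, n: int) -> float:
    """Standard error of a pass rate p measured over n independent problems."""
    return math.sqrt(p * (1 - p) / n)

n = 500                     # SWE-bench Verified problem count
sonnet, opus = 0.796, 0.808

se = pass_rate_se(sonnet, n)      # single-run standard error
gap = (opus - sonnet) * 100       # observed gap in percentage points

print(f"one-run standard error: {se * 100:.1f} points")
print(f"observed gap:           {gap:.1f} points")
```

With p ≈ 0.8 and n = 500, the standard error of a single run is about 1.8 percentage points, larger than the 1.2-point gap between the two models.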
Developer Perspectives
"I've been A/B testing Sonnet vs Opus for a week. Can't tell the difference on my codebase. But I sure can tell the difference in my bill." — Senior Engineer, YC startup
"For 99% of tickets, Sonnet 4.6 is Opus. That last 1% is when I escalate." — Tech Lead, Series B company
When Opus 4.6 Still Wins
Despite near-parity, Opus 4.6 pulls ahead on:
- Novel algorithm design
- Multi-step refactoring with many dependencies
- PhD-level scientific code
- Maximum accuracy requirements (regulatory, financial)
The Value Proposition
At current pricing:
- Running 100 SWE-bench problems costs roughly $7 with Sonnet 4.6
- The same problems cost roughly $35 with Opus 4.6
- That is 5x the cost for a 1.2-point (about 1.5% relative) improvement
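The same figures can be restated as cost per *solved* problem, folding in each model's pass rate. The ~$7 and ~$35 totals below are the article's estimates, not measured values:

```python
def cost_per_solve(total_cost: float, n_problems: int, pass_rate: float) -> float:
    """Dollars spent per successfully solved problem."""
    return total_cost / (n_problems * pass_rate)

# Article's estimates: ~$7 per 100 problems (Sonnet 4.6), ~$35 (Opus 4.6).
sonnet = cost_per_solve(7.0, 100, 0.796)   # ≈ $0.088 per solved problem
opus = cost_per_solve(35.0, 100, 0.808)    # ≈ $0.433 per solved problem

print(f"Sonnet 4.6: ${sonnet:.3f} per solve")
print(f"Opus 4.6:   ${opus:.3f} per solve")
print(f"ratio:      {opus / sonnet:.1f}x")
```

Per solved problem, Opus works out to roughly 4.9x the cost of Sonnet, since its slightly higher pass rate barely offsets the price difference.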
Conclusion
Sonnet 4.6 has effectively commoditized flagship-level coding performance. For most teams, the rational choice is Sonnet by default, Opus by exception.