Claude 5 Achieves 85% on SWE-bench: A New AI Coding Benchmark Record
Anthropic's Claude 5 sets a new record on SWE-bench Verified with an 85.3% score, surpassing all previous AI models on real-world software engineering tasks.
Claude 5 Sets New SWE-bench Record
Anthropic has announced that Claude 5 has achieved 85.3% on SWE-bench Verified, setting a new record for AI performance on real-world software engineering tasks. The result surpasses the previous record held by Claude Opus 4.6 (80.8%) and represents the first time any AI model has crossed the 85% threshold on this benchmark.
What SWE-bench Measures
SWE-bench Verified tests AI models against 500 carefully curated real GitHub issues from major Python projects, including Django, scikit-learn, and Flask. Models must analyze the codebase and generate a patch that passes the project's existing test suite; no partial credit is awarded.
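The all-or-nothing scoring described above can be sketched as follows. This is a simplified illustration, not the actual SWE-bench harness (which applies each patch inside a containerized copy of the repository and runs the project's real tests); the helper names and toy results here are hypothetical.

```python
# Minimal sketch of SWE-bench-style all-or-nothing scoring.
# In the real harness, each instance's model-generated patch is applied
# to the repository and the project's test suite is run; here the
# per-test outcomes are simply given as booleans.

def instance_resolved(test_results):
    """An instance counts only if every test passes -- no partial credit."""
    return all(test_results)

def benchmark_score(instances):
    """Percentage of instances fully resolved."""
    resolved = sum(instance_resolved(r) for r in instances)
    return 100.0 * resolved / len(instances)

# Toy example: three issues, one where the patch broke a test.
runs = [
    [True, True, True],   # all tests pass -> resolved
    [True, False, True],  # one regression -> not resolved
    [True, True],         # resolved
]
print(round(benchmark_score(runs), 1))  # prints 66.7
```

The key property this illustrates is that a patch fixing most of an issue but breaking one test scores the same as no patch at all, which is why gains at the top of the leaderboard are hard-won.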
Benchmark Results
| Model | SWE-bench Verified | Date |
| --- | --- | --- |
| Claude 5 | 85.3% | Feb 2026 |
| Claude Opus 4.6 | 80.8% | Feb 2026 |
| Claude Sonnet 4.6 | 79.6% | Jan 2026 |
| Codex 5.3 | 56.8% | Feb 2026 |
What Drove the Improvement
Anthropic attributes the performance jump to three key advances:
- Enhanced Reasoning: Extended Thinking mode improvements allow the model to explore more solution paths before committing to an approach.
- Better Code Understanding: Improved ability to trace execution paths across large, interconnected codebases.
- Test-Aware Generation: Claude 5 now considers the existing test suite when generating fixes, reducing instances where patches introduce regressions.

Developer Reactions
Reception from the developer community has been strongly positive, with many noting that 85% on SWE-bench Verified represents a qualitative shift—not just incremental improvement.
"At 85%, you're past the point where the model fails on most non-trivial issues," wrote one senior engineer on Hacker News. "This is the benchmark result that changes how teams think about AI automation."
Real-World Implications
Teams using Claude 5 in production report that the benchmark gains translate directly into fewer failed autonomous fixes and less human intervention to clean up AI-generated code.
Availability
Claude 5, with its enhanced coding capabilities, is available now through Claude.ai Pro and the Anthropic API. The Extended Thinking mode required for peak SWE-bench performance is accessible at the Sonnet and Opus tiers.