Breaking · February 24, 2026

Claude 5 Achieves 85% on SWE-bench: A New AI Coding Benchmark Record

Anthropic's Claude 5 sets a new record on SWE-bench Verified with an 85.3% score, surpassing all previous AI models on real-world software engineering tasks.

Claude 5 Sets New SWE-bench Record

Anthropic has announced that Claude 5 has achieved 85.3% on SWE-bench Verified, setting a new record for AI performance on real-world software engineering tasks. The result surpasses the previous record held by Claude Opus 4.6 (80.8%) and represents the first time any AI model has crossed the 85% threshold on this benchmark.

What SWE-bench Measures

SWE-bench Verified tests AI models against 500 carefully curated real GitHub issues from major Python projects, including Django, scikit-learn, and Flask. Models must analyze the codebase and generate a patch that makes the project's existing test suite pass; no partial credit is awarded.
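The scoring described above can be sketched as a small harness: apply the candidate patch, run the suite, and score 1 only if everything passes. This is a minimal illustrative sketch, not the official SWE-bench evaluation harness; the command lists are placeholders.

```python
import subprocess

def run_tests(cmd: list[str], cwd: str = ".") -> bool:
    """Run a test command; SWE-bench-style scoring is binary,
    so only a clean exit (code 0) from the whole suite counts."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0

def score_patch(apply_cmd: list[str], test_cmd: list[str], cwd: str = ".") -> int:
    """Apply a candidate patch, then run the suite. No partial credit:
    a patch that fixes the issue but breaks any other test scores 0."""
    applied = subprocess.run(apply_cmd, cwd=cwd, capture_output=True).returncode == 0
    if not applied:
        return 0  # the patch did not even apply cleanly
    return 1 if run_tests(test_cmd, cwd) else 0
```

In practice `apply_cmd` would be something like `["git", "apply", "fix.patch"]` and `test_cmd` the project's pytest invocation; the point is the all-or-nothing scoring rule.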

Benchmark Results

Model                SWE-bench Verified   Date
Claude 5             85.3%                Feb 2026
Claude Opus 4.6      80.8%                Feb 2026
Claude Sonnet 4.6    79.6%                Jan 2026
Codex 5.3            56.8%                Feb 2026

What Drove the Improvement

Anthropic attributes the performance jump to three key advances:

- Enhanced Reasoning: Extended Thinking mode improvements allow the model to explore more solution paths before committing to an approach.
- Better Code Understanding: Improved ability to trace execution paths across large, interconnected codebases.
- Test-Aware Generation: Claude 5 now considers the existing test suite when generating fixes, reducing instances where patches introduce regressions.

Developer Reactions

Reception from the developer community has been strongly positive, with many noting that 85% on SWE-bench Verified represents a qualitative shift, not just an incremental improvement.

"At 85%, you're past the point where the model fails on most non-trivial issues," wrote one senior engineer on Hacker News. "This is the benchmark result that changes how teams think about AI automation."

Real-World Implications

Teams using Claude 5 in production report that the benchmark gains translate directly into fewer failed autonomous fixes and less human intervention needed to clean up AI-generated code.

Availability

Claude 5, with its enhanced coding capabilities, is available now through Claude.ai Pro and the Anthropic API. The Extended Thinking mode required for peak SWE-bench performance is accessible at the Sonnet and Opus tiers.
