Claude 5 Achieves 85% on SWE-bench: A New AI Coding Benchmark Record
Anthropic's Claude 5 sets a new record on SWE-bench Verified with an 85.3% score, surpassing all previous AI models on real-world software engineering tasks.
Claude 5 Sets New SWE-bench Record
Anthropic has announced that Claude 5 has achieved 85.3% on SWE-bench Verified, setting a new record for AI performance on real-world software engineering tasks. The result surpasses the previous record held by Claude Opus 4.6 (80.8%) and represents the first time any AI model has crossed the 85% threshold on this benchmark.
What SWE-bench Measures
SWE-bench Verified tests AI models against 500 carefully curated real GitHub issues from major Python projects, including Django, scikit-learn, and Flask. Models must analyze the codebase and generate a patch that passes the project's existing test suite; no partial credit is awarded.
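The all-or-nothing scoring described above can be sketched as follows. This is a simplified illustration, not the actual SWE-bench harness (which applies each patch inside a containerized copy of the repository and runs the project's real tests); the helper names and toy results here are hypothetical.

```python
# Minimal sketch of SWE-bench-style all-or-nothing scoring.
# In the real harness, each instance's model-generated patch is applied
# to the repository and the project's test suite is run; here the
# per-test outcomes are simply given as booleans.

def instance_resolved(test_results):
    """An instance counts only if every test passes -- no partial credit."""
    return all(test_results)

def benchmark_score(instances):
    """Percentage of instances fully resolved."""
    resolved = sum(instance_resolved(r) for r in instances)
    return 100.0 * resolved / len(instances)

# Toy example: three issues, one where the patch broke a test.
runs = [
    [True, True, True],   # all tests pass -> resolved
    [True, False, True],  # one regression -> not resolved
    [True, True],         # resolved
]
print(round(benchmark_score(runs), 1))  # prints 66.7
```

The key property this illustrates is that a patch fixing most of an issue but breaking one test scores the same as no patch at all, which is why gains at the top of the leaderboard are hard-won.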
Benchmark Results
| Model | SWE-bench Verified | Date |
| --- | --- | --- |
| Claude 5 | 85.3% | Feb 2026 |
| Claude Opus 4.6 | 80.8% | Feb 2026 |
| Claude Sonnet 4.6 | 79.6% | Jan 2026 |
| Codex 5.3 | 56.8% | Feb 2026 |
What Drove the Improvement
Anthropic attributes the performance jump to three key advances:
- Enhanced Reasoning: Extended Thinking mode improvements allow the model to explore more solution paths before committing to an approach.
- Better Code Understanding: Improved ability to trace execution paths across large, interconnected codebases.
- Test-Aware Generation: Claude 5 now considers the existing test suite when generating fixes, reducing instances where patches introduce regressions.

Developer Reactions
Reception from the developer community has been strongly positive, with many noting that 85% on SWE-bench Verified represents a qualitative shift—not just incremental improvement.
"At 85%, you're past the point where the model fails on most non-trivial issues," wrote one senior engineer on Hacker News. "This is the benchmark result that changes how teams think about AI automation."
Real-World Implications
Teams using Claude 5 in production report that the benchmark gains translate directly into fewer failed autonomous fixes and less human intervention to clean up AI-generated code.
Availability
Claude 5, with its enhanced coding capabilities, is available now through Claude.ai Pro and the Anthropic API. The Extended Thinking mode required for peak SWE-bench performance is accessible at the Sonnet and Opus tiers.