Claude Opus 4.6 Benchmark Deep Dive: SWE-bench, GPQA & More
Comprehensive analysis of Claude Opus 4.6 benchmark performance across SWE-bench, GPQA, HumanEval, and MATH with methodology and competitive comparison.
February 2026
TL;DR
Claude Opus 4.6 scores 82.1% on SWE-bench Verified (the highest published result), 88.5% on GPQA Diamond, 97.8% on HumanEval, and 94.2% on MATH, leading both GPT-5.2 and Gemini 3 on the benchmarks compared below.
SWE-bench Verified: 82.1%
Industry-leading score on SWE-bench Verified, a human-validated set of real GitHub issues where the model must produce a patch that makes the repository's test suite pass.
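To make the metric concrete, here is a minimal sketch (not Anthropic's actual evaluation harness) of how a SWE-bench-style resolution rate is computed from per-instance pass/fail outcomes; the instance IDs are purely illustrative:

```python
# Hedged sketch: a SWE-bench-style score is the percentage of issues
# whose generated patch made the repository's tests pass.
def resolution_rate(outcomes: dict[str, bool]) -> float:
    """Percentage of instances marked as resolved."""
    return 100.0 * sum(outcomes.values()) / len(outcomes)

# Illustrative per-instance results (IDs are made up).
outcomes = {
    "repo__project-101": True,   # patch applied, tests passed
    "repo__project-102": True,
    "repo__project-103": False,  # patch failed the test suite
}
print(f"{resolution_rate(outcomes):.1f}%")  # prints "66.7%"
```

The reported 82.1% corresponds to this fraction over the full Verified instance set.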
Competitive Comparison
| Benchmark | Opus 4.6 | GPT-5.2 | Gemini 3 |
|---|---|---|---|
| SWE-bench | 82.1% | 76.3% | 78.4% |
| GPQA | 88.5% | 85.1% | 82.7% |
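As a quick sanity check on the margins, the percentage-point gaps can be computed directly from the scores in the table above:

```python
# Percentage-point margins of Opus 4.6 over each competitor,
# using the scores copied from the comparison table.
scores = {
    "SWE-bench": {"Opus 4.6": 82.1, "GPT-5.2": 76.3, "Gemini 3": 78.4},
    "GPQA":      {"Opus 4.6": 88.5, "GPT-5.2": 85.1, "Gemini 3": 82.7},
}

for bench, row in scores.items():
    lead = row["Opus 4.6"]
    for model, score in row.items():
        if model != "Opus 4.6":
            print(f"{bench}: +{lead - score:.1f} pts vs {model}")
```

This shows the largest single gap (5.8 points over GPT-5.2 on SWE-bench) in the coding benchmark, consistent with the article's coding-focused conclusion.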
Conclusion
Claude Opus 4.6's benchmark performance validates its position as the leading coding AI.