Claude Fable 5 vs GPT-5.5 vs Gemini 3.1 Pro: The Benchmark Showdown

How Claude Fable 5 stacks up against GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro, FrontierCode, vision, and long-context performance.

June 10, 2026

TL;DR

Claude Fable 5, released June 9, 2026, leads the frontier model field by an unusual margin. On SWE-Bench Pro it scores 80.3 percent, against 69.2 percent for Claude Opus 4.8, 58.6 percent for GPT-5.5, and 54.2 percent for Gemini 3.1 Pro, which puts it about 11 points ahead of the next-best model. Andrej Karpathy summarized the picture as "SOTA on everything by a margin".

The Headline Numbers

| Model           | SWE-Bench Pro |
|-----------------|---------------|
| Claude Fable 5  | 80.3%         |
| Claude Opus 4.8 | 69.2%         |
| GPT-5.5         | 58.6%         |
| Gemini 3.1 Pro  | 54.2%         |

Two things stand out. First, the gap between Fable 5 and the nearest non-Anthropic competitor, GPT-5.5, is more than 21 points. Second, even Anthropic's own previous flagship, Opus 4.8, trails Fable 5 by roughly 11 points. Frontier benchmarks usually move in single-digit increments; this release does not.
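The gaps quoted above are simple subtraction on the headline scores, and a short script makes them easy to re-check. The sketch below is illustrative only: the scores are the ones reported in this article, and the helper name `gaps_to_leader` is a hypothetical choice, not part of any benchmark tooling.

```python
# SWE-Bench Pro scores (percent) as reported in this article.
scores = {
    "Claude Fable 5": 80.3,
    "Claude Opus 4.8": 69.2,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
}


def gaps_to_leader(scores):
    """Return the top-scoring model and each other model's point gap behind it.

    Gaps are rounded to one decimal place to match the precision of the
    published scores. (Hypothetical helper, for illustration only.)
    """
    leader = max(scores, key=scores.get)
    top = scores[leader]
    gaps = {
        model: round(top - score, 1)
        for model, score in scores.items()
        if model != leader
    }
    return leader, gaps


leader, gaps = gaps_to_leader(scores)
for model, gap in gaps.items():
    print(f"{leader} leads {model} by {gap} points")
```

These are percentage-point differences, not relative improvements; a relative comparison would divide each gap by the trailing model's score instead.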

Beyond SWE-Bench Pro, Fable 5 posted the highest score among frontier models on Cognition's FrontierCode eval, which measures code quality and efficiency rather than just task completion. Anthropic states the model is state-of-the-art on nearly all tested capability benchmarks, with the biggest gains on long and complex tasks.

Beyond the Leaderboards

Benchmarks compress a lot of nuance, so the qualitative evidence matters. In an early test, Stripe used Fable 5 to complete a migration across a 50-million-line Ruby codebase in one day - work estimated at over two months for a team. Cursor CEO Michael Truell reported: "Claude Fable 5 is the state of the art model on CursorBench. It's opened up a class of long-horizon problems that were out of reach."

On vision, Fable 5 is state-of-the-art at extracting precise numbers from scientific figures and at rebuilding web apps from screenshots. It also completed Pokemon FireRed using only vision, a task where earlier models needed helper tools. On long-context work, it maintains focus across millions of tokens, and in a file-based memory test playing Slay the Spire it performed 3x better than Opus 4.8.

The Price of the Lead

Fable 5 costs $10 per million input tokens and $50 per million output tokens - double Opus 4.8's rates. Whether the premium pays off depends on the task: for long-horizon agentic work, fewer failed runs and fewer correction turns can more than offset the higher per-token price. Equinox CTO Luke Anderson noted: "Claude Fable 5 delivers more capable engineering in fewer turns than prior models."
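To make the trade-off concrete, here is a rough sketch of the cost per successful task. The Fable 5 prices come from this article, and the Opus 4.8 prices are derived from the statement that Fable 5's rates are double. Everything else is a made-up assumption for illustration: the turn counts, the tokens per turn, the success rates, and the simplification that a failed run is retried from scratch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Fable 5 prices as stated in the article; Opus 4.8 is derived as half of them.
FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
OPUS_4_8 = Pricing(FABLE_5.input_per_mtok / 2, FABLE_5.output_per_mtok / 2)


def run_cost(pricing, input_tokens, output_tokens):
    """Cost in USD of a single run with the given token totals."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


def expected_cost_per_success(pricing, turns, input_per_turn, output_per_turn,
                              success_rate):
    """Expected spend per successful task, assuming failures are fully retried."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_run = run_cost(pricing, turns * input_per_turn, turns * output_per_turn)
    # With independent retries, the expected number of runs is 1 / success_rate.
    return per_run / success_rate


# Hypothetical workloads: the pricier model needs half the turns and fails less.
fable = expected_cost_per_success(FABLE_5, turns=20, input_per_turn=50_000,
                                  output_per_turn=2_000, success_rate=0.8)
opus = expected_cost_per_success(OPUS_4_8, turns=40, input_per_turn=50_000,
                                 output_per_turn=2_000, success_rate=0.6)
print(f"Fable 5:  ${fable:.2f} per successful task")
print(f"Opus 4.8: ${opus:.2f} per successful task")
```

Under these invented inputs, halving the turn count cancels the doubled price per run, and the higher success rate then tips the expected cost in Fable 5's favor. With equal turn counts and success rates, the same function would show Fable 5 costing exactly twice as much, so the result depends entirely on the workload figures you plug in.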

Bottom Line

If your evaluation criteria are SWE-Bench Pro, code quality, vision, or long-context endurance, Fable 5 currently leads GPT-5.5 and Gemini 3.1 Pro across the board, and by a margin too large to attribute to normal benchmark noise. Competitors will respond, but as of June 2026, the frontier has one clear leader. Fable 5 is available now on the Claude API, Amazon Bedrock, and GitHub Copilot, and is free on paid Claude plans through June 22.
