# Claude Sonnet 4.6 Introduces Adaptive Thinking, Replacing Extended Mode

*The new Adaptive Thinking engine allows dynamic reasoning depth via an `effort` parameter, optimizing cost and speed per request.*
## From Binary to Spectrum: Reasoning Gets Flexible
Claude Sonnet 4.6 introduces Adaptive Thinking, replacing the previous binary "extended thinking" mode with granular control over reasoning depth.
## The Problem with Binary Thinking
Previous Claude models had two modes:
- Standard: Fast and cheap, but shallow reasoning
- Extended Thinking: Slow and expensive, but thorough
This forced developers to choose between speed and quality with no middle ground. A simple query and a complex architecture question cost the same when extended thinking was enabled.
## The Effort Parameter
Adaptive Thinking introduces the `effort` parameter with four levels:
| Level | Thinking Tokens | Latency | Cost | Use Case |
|-------|-----------------|---------|------|----------|
| `low` | ~100-500 | ~1s | 1x | Simple Q&A, formatting |
| `medium` | ~500-2,000 | ~2-3s | 1.5-2x | Standard coding tasks |
| `high` | ~2,000-10,000 | ~5-15s | 3-5x | Complex problems |
| `auto` | Varies | Varies | 1-5x | General applications |
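As a rough illustration, the multipliers in the Cost column can drive a simple per-workload estimator. The base price and the exact multipliers below are placeholders (midpoints of the table's ranges), not official pricing:

```python
# Illustrative cost estimator based on the effort table above.
# EFFORT_MULTIPLIER mirrors the table's "Cost" column (midpoints for
# ranges); BASE_COST_PER_CALL is a placeholder, not real pricing.
EFFORT_MULTIPLIER = {"low": 1.0, "medium": 1.75, "high": 4.0}
BASE_COST_PER_CALL = 0.01  # hypothetical cost of one "low" effort call

def estimate_cost(effort: str, calls: int) -> float:
    """Rough cost of `calls` requests at a given effort level."""
    return BASE_COST_PER_CALL * EFFORT_MULTIPLIER[effort] * calls

# Routing 1,000 routine queries to "low" instead of "high":
savings = estimate_cost("high", 1000) - estimate_cost("low", 1000)
```

Even with placeholder numbers, the shape of the math explains the savings reported below: routine traffic dominates most workloads, so moving it down a tier compounds quickly.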
## Implementation
```python
import anthropic

client = anthropic.Anthropic()

# Simple task - minimal thinking
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[...],
)

# Complex task - deep reasoning
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "high"},
    messages=[...],
)

# Let the model decide
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[...],
)
```
## Cost Savings
Early adopters report significant savings:
> "We were running everything at max thinking. Now simple queries use 'low' effort—our costs dropped 40% with no quality impact on routine tasks." — Lead Developer, FinTech startup
## Auto Mode Performance
When set to `auto`, Claude dynamically assesses query complexity and allocates reasoning resources accordingly. Testing shows:
- 70% of queries classified as low/medium effort
- Auto typically matches human-selected effort levels
- In edge cases, it occasionally underestimates complexity
## Accessing Thinking Content
Developers can inspect the reasoning process:
```python
for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
```
## Best Practice
Start with `auto` for general applications, then optimize with explicit effort levels as you understand your workload patterns.
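One way to apply this advice is a small routing helper that maps known workload types to explicit effort levels and falls back to `auto` for everything else. The task categories and mapping here are hypothetical, a sketch of the pattern rather than a recommended taxonomy:

```python
# Hypothetical task-type -> effort routing table; tune per workload.
EFFORT_BY_TASK = {
    "formatting": "low",
    "qa": "low",
    "coding": "medium",
    "architecture": "high",
}

def pick_effort(task_type: str) -> str:
    """Return an explicit effort level for known workloads, else "auto"."""
    return EFFORT_BY_TASK.get(task_type, "auto")

# The chosen level plugs directly into the thinking parameter:
thinking = {"type": "enabled", "effort": pick_effort("coding")}
```

Starting with everything in the fallback branch and promoting categories as usage data accumulates mirrors the "start with `auto`, then optimize" advice above.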
## Migration
The `budget_tokens` parameter still works but is deprecated. New code should use `effort` instead.
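For codebases still passing `budget_tokens`, a small shim can translate the old budget into the nearest effort tier. The cutoffs below follow the thinking-token ranges in the table above and are illustrative, not an official conversion:

```python
def budget_to_effort(budget_tokens: int) -> str:
    """Map a legacy budget_tokens value to the nearest effort tier.

    Cutoffs follow the thinking-token ranges in the effort table;
    they are illustrative, not an official conversion.
    """
    if budget_tokens <= 500:
        return "low"
    if budget_tokens <= 2000:
        return "medium"
    return "high"

# Old call: thinking={"type": "enabled", "budget_tokens": 4000}
# New call: thinking={"type": "enabled", "effort": budget_to_effort(4000)}
```

A shim like this lets a codebase migrate incrementally: existing call sites keep their budget values while the request layer emits the new parameter.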