# Claude Sonnet 4.6 Introduces Adaptive Thinking, Replacing Extended Mode

*The new Adaptive Thinking engine allows dynamic reasoning depth via an `effort` parameter, optimizing cost and speed per request.*
## From Binary to Spectrum: Reasoning Gets Flexible
Claude Sonnet 4.6 introduces Adaptive Thinking, replacing the previous binary "extended thinking" mode with granular control over reasoning depth.
## The Problem with Binary Thinking
Previous Claude models had two modes:
- Standard: Fast and cheap, but shallow reasoning
- Extended Thinking: Slow and expensive, but thorough
This forced developers to choose between speed and quality with no middle ground. A simple query and a complex architecture question cost the same when extended thinking was enabled.
## The Effort Parameter
Adaptive Thinking introduces the `effort` parameter with four levels:
| Level | Thinking Tokens | Latency | Cost | Use Case |
|-------|-----------------|---------|------|----------|
| `low` | ~100-500 | ~1s | 1x | Simple Q&A, formatting |
| `medium` | ~500-2,000 | ~2-3s | 1.5-2x | Standard coding tasks |
| `high` | ~2,000-10,000 | ~5-15s | 3-5x | Complex problems |
| `auto` | Varies | Varies | 1-5x | General applications |
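As a rough illustration, the multipliers in the Cost column can drive a simple per-workload estimator. The base price and the exact multipliers below are placeholders (midpoints of the table's ranges), not official pricing:

```python
# Illustrative cost estimator based on the effort table above.
# EFFORT_MULTIPLIER mirrors the table's "Cost" column (midpoints for
# ranges); BASE_COST_PER_CALL is a placeholder, not real pricing.
EFFORT_MULTIPLIER = {"low": 1.0, "medium": 1.75, "high": 4.0}
BASE_COST_PER_CALL = 0.01  # hypothetical cost of one "low" effort call

def estimate_cost(effort: str, calls: int) -> float:
    """Rough cost of `calls` requests at a given effort level."""
    return BASE_COST_PER_CALL * EFFORT_MULTIPLIER[effort] * calls

# Routing 1,000 routine queries to "low" instead of "high":
savings = estimate_cost("high", 1000) - estimate_cost("low", 1000)
```

Even with placeholder numbers, the shape of the math explains the savings reported below: routine traffic dominates most workloads, so moving it down a tier compounds quickly.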
## Implementation
```python
import anthropic

client = anthropic.Anthropic()

# Simple task - minimal thinking
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[...],
)

# Complex task - deep reasoning
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "high"},
    messages=[...],
)

# Let the model decide
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[...],
)
```
## Cost Savings
Early adopters report significant savings:
> "We were running everything at max thinking. Now simple queries use 'low' effort—our costs dropped 40% with no quality impact on routine tasks." — Lead Developer, FinTech startup
## Auto Mode Performance
When set to `auto`, Claude dynamically assesses query complexity and allocates reasoning resources accordingly. Testing shows:
- 70% of queries classified as low/medium effort
- Auto typically matches human-selected effort levels
- In edge cases, it occasionally underestimates complexity
## Accessing Thinking Content
Developers can inspect the reasoning process:
```python
for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
```
## Best Practice
Start with `auto` for general applications, then optimize with explicit effort levels as you understand your workload patterns.
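One way to apply this advice is a small routing helper that maps known workload types to explicit effort levels and falls back to `auto` for everything else. The task categories and mapping here are hypothetical, a sketch of the pattern rather than a recommended taxonomy:

```python
# Hypothetical task-type -> effort routing table; tune per workload.
EFFORT_BY_TASK = {
    "formatting": "low",
    "qa": "low",
    "coding": "medium",
    "architecture": "high",
}

def pick_effort(task_type: str) -> str:
    """Return an explicit effort level for known workloads, else "auto"."""
    return EFFORT_BY_TASK.get(task_type, "auto")

# The chosen level plugs directly into the thinking parameter:
thinking = {"type": "enabled", "effort": pick_effort("coding")}
```

Starting with everything in the fallback branch and promoting categories as usage data accumulates mirrors the "start with `auto`, then optimize" advice above.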
## Migration
The `budget_tokens` parameter still works but is deprecated. New code should use `effort` instead.
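For codebases still passing `budget_tokens`, a small shim can translate the old budget into the nearest effort tier. The cutoffs below follow the thinking-token ranges in the table above and are illustrative, not an official conversion:

```python
def budget_to_effort(budget_tokens: int) -> str:
    """Map a legacy budget_tokens value to the nearest effort tier.

    Cutoffs follow the thinking-token ranges in the effort table;
    they are illustrative, not an official conversion.
    """
    if budget_tokens <= 500:
        return "low"
    if budget_tokens <= 2000:
        return "medium"
    return "high"

# Old call: thinking={"type": "enabled", "budget_tokens": 4000}
# New call: thinking={"type": "enabled", "effort": budget_to_effort(4000)}
```

A shim like this lets a codebase migrate incrementally: existing call sites keep their budget values while the request layer emits the new parameter.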