# Claude Sonnet 4.6 Adaptive Thinking: Complete Developer Guide

*Master Claude Sonnet 4.6's Adaptive Thinking engine: the effort parameter, dynamic reasoning, cost optimization, and implementation best practices.*
## TL;DR
Adaptive Thinking replaces Claude's binary "extended thinking" mode with dynamic, task-appropriate reasoning. Using the effort parameter (low/medium/high/auto), developers control how much the model reasons before responding—balancing speed, cost, and intelligence per request.
## What is Adaptive Thinking?
Previous Claude models had two modes: standard (fast, cheap) or extended thinking (slow, expensive, thorough). Adaptive Thinking introduces a spectrum, allowing the model to automatically calibrate reasoning depth based on task complexity.
The key insight: not every question needs deep reasoning. "What's 2+2?" shouldn't cost the same as "Design a distributed system architecture."
## The Effort Parameter
Control reasoning depth with the effort parameter:
| Value | Behavior | Use Case |
|---|---|---|
| low | Minimal reasoning, fastest response | Simple Q&A, formatting, basic tasks |
| medium | Balanced reasoning | Most coding tasks, analysis |
| high | Deep reasoning, slower response | Complex problems, architecture |
| auto | Model decides based on query | General-purpose applications |
## Implementation
```python
import anthropic

client = anthropic.Anthropic()

# Simple task - minimal thinking
simple_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[{"role": "user", "content": "Format this JSON"}],
)

# Complex task - deep reasoning
complex_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[{"role": "user", "content": "Design a microservices architecture for..."}],
)

# Let the model decide
auto_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[{"role": "user", "content": user_query}],
)
```
## Cost Implications
| Effort Level | Thinking Tokens | Relative Cost | Latency |
|---|---|---|---|
| low | ~100-500 | 1x | ~1s |
| medium | ~500-2000 | 1.5-2x | ~2-3s |
| high | ~2000-10000 | 3-5x | ~5-15s |
| auto | Varies | 1-5x | Varies |
Thinking tokens are billed at output rates ($15/M for Sonnet 4.6). A high-effort request generating 5,000 thinking tokens adds ~$0.075 to the cost.
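The arithmetic above can be sketched as a small helper. The $15-per-million output rate comes from the text; the helper name and token counts are illustrative:

```python
# Illustrative cost estimator for thinking tokens, assuming the
# $15-per-million output rate quoted above for Sonnet 4.6.
OUTPUT_RATE_PER_MILLION = 15.00

def thinking_cost(thinking_tokens: int,
                  rate_per_million: float = OUTPUT_RATE_PER_MILLION) -> float:
    """Return the dollar cost added by thinking tokens billed at output rates."""
    return thinking_tokens / 1_000_000 * rate_per_million

# A high-effort request generating 5,000 thinking tokens:
print(f"${thinking_cost(5_000):.3f}")  # → $0.075
```

This makes it easy to budget per effort level: multiply the typical thinking-token range from the table by the output rate.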
## Best Practices
### 1. Use Low Effort for Routine Tasks
```python
# Good: Simple extraction doesn't need deep thinking
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[{"role": "user", "content": "Extract the email from: John Smith <john@example.com>"}],
)
```
### 2. Reserve High Effort for Complex Problems
```python
# Good: Security reviews benefit from deep reasoning
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[{"role": "user", "content": "Review this code for security vulnerabilities and suggest fixes"}],
)
```
### 3. Use Auto for User-Facing Applications
When you can't predict query complexity, auto provides good default behavior:
```python
# Good: Chatbot where queries vary in complexity
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[{"role": "user", "content": user_input}],
)
```
### 4. Implement Effort Routing
```python
def get_effort_level(task_type: str) -> str:
    simple_tasks = ["format", "extract", "summarize_short"]
    complex_tasks = ["architecture", "security_review", "debug_complex"]
    if task_type in simple_tasks:
        return "low"
    elif task_type in complex_tasks:
        return "high"
    else:
        return "medium"
```
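A router like this returns a bare effort string; in practice it can be folded into a helper that builds the whole `thinking` payload. A minimal sketch (`build_thinking_config` and the task names are illustrative, not part of the SDK):

```python
# Hypothetical helper: map a task type to the `thinking` parameter
# expected by client.messages.create(...).
def build_thinking_config(task_type: str) -> dict:
    simple_tasks = ["format", "extract", "summarize_short"]
    complex_tasks = ["architecture", "security_review", "debug_complex"]
    if task_type in simple_tasks:
        effort = "low"
    elif task_type in complex_tasks:
        effort = "high"
    else:
        effort = "medium"
    return {"type": "enabled", "effort": effort}

# Usage: client.messages.create(..., thinking=build_thinking_config("security_review"))
```

Centralizing the mapping in one place makes it easy to retune effort levels later as you observe real latency and cost per task type.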
## Accessing Thinking Content
Retrieve the model's reasoning process:
```python
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[...],
)

for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
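For logging or auditing, the same loop can be packaged as a helper that separates the reasoning transcript from the final answer. A sketch: `split_response` is an illustrative name, and it assumes the block shapes shown above (`thinking` blocks with a `.thinking` field, `text` blocks with a `.text` field):

```python
def split_response(content_blocks) -> tuple[str, str]:
    """Separate thinking blocks from text blocks in a response's content list.

    Returns (reasoning_transcript, final_answer) as joined strings.
    """
    thinking_parts, text_parts = [], []
    for block in content_blocks:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
    return "\n".join(thinking_parts), "\n".join(text_parts)

# Usage: reasoning, answer = split_response(response.content)
```

Keeping the reasoning transcript out of the user-facing answer, but available in logs, is useful when debugging why a high-effort request produced a given result.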
## Comparison: Old vs New
| Aspect | Extended Thinking (Old) | Adaptive Thinking (New) |
|---|---|---|
| Control | On/Off binary | Granular effort levels |
| Cost | Always expensive when on | Proportional to complexity |
| Latency | Always slow when on | Scales with effort |
| Optimization | Manual toggle | Auto mode available |
## Conclusion
Adaptive Thinking transforms Claude from a binary tool into a nuanced reasoner. By matching effort to task complexity, you can reduce costs 50-80% on simple tasks while maintaining deep reasoning capability when needed. Start with auto for general applications, then optimize with explicit effort levels as you understand your workload patterns.