Claude Sonnet 4.6 Adaptive Thinking: Complete Developer Guide

Master Claude Sonnet 4.6's Adaptive Thinking engine: the effort parameter, dynamic reasoning, cost optimization, and implementation best practices.

February 2026

TL;DR

Adaptive Thinking replaces Claude's binary "extended thinking" mode with dynamic, task-appropriate reasoning. Using the effort parameter (low/medium/high/auto), developers control how much the model reasons before responding—balancing speed, cost, and intelligence per request.

What is Adaptive Thinking?

Previous Claude models had two modes: standard (fast, cheap) or extended thinking (slow, expensive, thorough). Adaptive Thinking introduces a spectrum, allowing the model to automatically calibrate reasoning depth based on task complexity.

The key insight: not every question needs deep reasoning. "What's 2+2?" shouldn't cost the same as "Design a distributed system architecture."

The Effort Parameter

Control reasoning depth with the effort parameter:

| Value | Behavior | Use Case |
|---|---|---|
| `low` | Minimal reasoning, fastest response | Simple Q&A, formatting, basic tasks |
| `medium` | Balanced reasoning | Most coding tasks, analysis |
| `high` | Deep reasoning, slower response | Complex problems, architecture |
| `auto` | Model decides based on query | General-purpose applications |

Implementation

```python
import anthropic

client = anthropic.Anthropic()

# Simple task - minimal thinking
simple_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[{"role": "user", "content": "Format this JSON"}],
)

# Complex task - deep reasoning
complex_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[{"role": "user", "content": "Design a microservices architecture for..."}],
)

# Let the model decide
auto_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[{"role": "user", "content": user_query}],
)
```

Cost Implications

| Effort Level | Thinking Tokens | Relative Cost | Latency |
|---|---|---|---|
| low | ~100-500 | 1x | ~1s |
| medium | ~500-2000 | 1.5-2x | ~2-3s |
| high | ~2000-10000 | 3-5x | ~5-15s |
| auto | Varies | 1-5x | Varies |

Thinking tokens are billed at output rates ($15/M for Sonnet 4.6). A high-effort request generating 5,000 thinking tokens adds ~$0.075 to the cost.
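As a sanity check on the arithmetic above, here is a small helper (the constant and function names are illustrative, not part of the SDK) that estimates the incremental cost of a request's thinking tokens:

```python
# Sonnet 4.6 output rate quoted above: $15 per million tokens.
OUTPUT_RATE_PER_MTOK = 15.00

def thinking_cost(thinking_tokens: int) -> float:
    """Dollar cost added by thinking tokens, billed at output rates."""
    return thinking_tokens * OUTPUT_RATE_PER_MTOK / 1_000_000

print(f"${thinking_cost(5000):.3f}")  # → $0.075
```

Multiplying this out per effort tier against the table above is a quick way to budget a workload before committing to `high` everywhere.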

Best Practices

1. Use Low Effort for Routine Tasks

```python
# Good: Simple extraction doesn't need deep thinking
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[{"role": "user", "content": "Extract the email from: John Smith <john@example.com>"}],
)
```

2. Reserve High Effort for Complex Problems

```python
# Good: Security reviews benefit from deep reasoning
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[{"role": "user", "content": "Review this code for security vulnerabilities and suggest fixes"}],
)
```

3. Use Auto for User-Facing Applications

When you can't predict query complexity, auto provides good default behavior:

```python
# Good: Chatbot where queries vary in complexity
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[{"role": "user", "content": user_input}],
)
```

4. Implement Effort Routing

```python
def get_effort_level(task_type: str) -> str:
    simple_tasks = ["format", "extract", "summarize_short"]
    complex_tasks = ["architecture", "security_review", "debug_complex"]

    if task_type in simple_tasks:
        return "low"
    elif task_type in complex_tasks:
        return "high"
    else:
        return "medium"
```
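The router can feed directly into a request. A self-contained sketch that assembles the call arguments (the `build_request` helper and the task labels are illustrative; the labels mirror the routing function above):

```python
def build_request(task_type: str, prompt: str) -> dict:
    """Assemble messages.create kwargs with effort chosen by task type."""
    simple_tasks = {"format", "extract", "summarize_short"}
    complex_tasks = {"architecture", "security_review", "debug_complex"}

    if task_type in simple_tasks:
        effort = "low"
    elif task_type in complex_tasks:
        effort = "high"
    else:
        effort = "medium"

    return {
        "model": "claude-sonnet-4-6-20260217",
        "max_tokens": 4096,
        "thinking": {"type": "enabled", "effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage: response = client.messages.create(**build_request("security_review", code))
```

Keeping the routing table in one place makes it easy to tune effort levels later as you observe real cost and latency per task type.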

Accessing Thinking Content

Retrieve the model's reasoning process:

```python
response = client.messages.create(
    thinking={"type": "enabled", "effort": "high"},
    messages=[...],
)

for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```

Comparison: Old vs New

| Aspect | Extended Thinking (Old) | Adaptive Thinking (New) |
|---|---|---|
| Control | On/Off binary | Granular effort levels |
| Cost | Always expensive when on | Proportional to complexity |
| Latency | Always slow when on | Scales with effort |
| Optimization | Manual toggle | Auto mode available |

Conclusion

Adaptive Thinking transforms Claude from a binary tool into a nuanced reasoner. By matching effort to task complexity, you can reduce costs 50-80% on simple tasks while maintaining deep reasoning capability when needed. Start with auto for general applications, then optimize with explicit effort levels as you understand your workload patterns.
