# Claude Sonnet 4.6 Adaptive Thinking: Complete Developer Guide

*Master Claude Sonnet 4.6's Adaptive Thinking engine: the effort parameter, dynamic reasoning, cost optimization, and implementation best practices.*
## TL;DR
Adaptive Thinking replaces Claude's binary "extended thinking" mode with dynamic, task-appropriate reasoning. Using the effort parameter (low/medium/high/auto), developers control how much the model reasons before responding—balancing speed, cost, and intelligence per request.
## What is Adaptive Thinking?
Previous Claude models had two modes: standard (fast, cheap) or extended thinking (slow, expensive, thorough). Adaptive Thinking introduces a spectrum, allowing the model to automatically calibrate reasoning depth based on task complexity.
The key insight: not every question needs deep reasoning. "What's 2+2?" shouldn't cost the same as "Design a distributed system architecture."
## The Effort Parameter
Control reasoning depth with the effort parameter:
| Value | Behavior | Use Case |
|---|---|---|
| low | Minimal reasoning, fastest response | Simple Q&A, formatting, basic tasks |
| medium | Balanced reasoning | Most coding tasks, analysis |
| high | Deep reasoning, slower response | Complex problems, architecture |
| auto | Model decides based on query | General-purpose applications |
## Implementation
```python
import anthropic

client = anthropic.Anthropic()

# Simple task - minimal thinking
simple_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[{"role": "user", "content": "Format this JSON"}],
)

# Complex task - deep reasoning
complex_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[{"role": "user", "content": "Design a microservices architecture for..."}],
)

# Let the model decide
auto_response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[{"role": "user", "content": user_query}],
)
```
## Cost Implications
| Effort Level | Thinking Tokens | Relative Cost | Latency |
|---|---|---|---|
| low | ~100-500 | 1x | ~1s |
| medium | ~500-2000 | 1.5-2x | ~2-3s |
| high | ~2000-10000 | 3-5x | ~5-15s |
| auto | Varies | 1-5x | Varies |
Thinking tokens are billed at output rates ($15/M for Sonnet 4.6). A high-effort request generating 5,000 thinking tokens adds ~$0.075 to the cost.
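The arithmetic above can be sketched as a small helper. The $15-per-million output rate comes from the text; the helper name and token counts are illustrative:

```python
# Illustrative cost estimator for thinking tokens, assuming the
# $15-per-million output rate quoted above for Sonnet 4.6.
OUTPUT_RATE_PER_MILLION = 15.00

def thinking_cost(thinking_tokens: int,
                  rate_per_million: float = OUTPUT_RATE_PER_MILLION) -> float:
    """Return the dollar cost added by thinking tokens billed at output rates."""
    return thinking_tokens / 1_000_000 * rate_per_million

# A high-effort request generating 5,000 thinking tokens:
print(f"${thinking_cost(5_000):.3f}")  # → $0.075
```

This makes it easy to budget per effort level: multiply the typical thinking-token range from the table by the output rate.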
## Best Practices
### 1. Use Low Effort for Routine Tasks
```python
# Good: Simple extraction doesn't need deep thinking
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=1024,
    thinking={"type": "enabled", "effort": "low"},
    messages=[{"role": "user", "content": "Extract the email from: John Smith <john@example.com>"}],
)
```
### 2. Reserve High Effort for Complex Problems
```python
# Good: Security reviews benefit from deep reasoning
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[{"role": "user", "content": "Review this code for security vulnerabilities and suggest fixes"}],
)
```
### 3. Use Auto for User-Facing Applications
When you can't predict query complexity, auto provides good default behavior:
```python
# Good: Chatbot where queries vary in complexity
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    thinking={"type": "enabled", "effort": "auto"},
    messages=[{"role": "user", "content": user_input}],
)
```
### 4. Implement Effort Routing
```python
def get_effort_level(task_type: str) -> str:
    simple_tasks = ["format", "extract", "summarize_short"]
    complex_tasks = ["architecture", "security_review", "debug_complex"]
    if task_type in simple_tasks:
        return "low"
    elif task_type in complex_tasks:
        return "high"
    else:
        return "medium"
```
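A router like this returns a bare effort string; in practice it can be folded into a helper that builds the whole `thinking` payload. A minimal sketch (`build_thinking_config` and the task names are illustrative, not part of the SDK):

```python
# Hypothetical helper: map a task type to the `thinking` parameter
# expected by client.messages.create(...).
def build_thinking_config(task_type: str) -> dict:
    simple_tasks = ["format", "extract", "summarize_short"]
    complex_tasks = ["architecture", "security_review", "debug_complex"]
    if task_type in simple_tasks:
        effort = "low"
    elif task_type in complex_tasks:
        effort = "high"
    else:
        effort = "medium"
    return {"type": "enabled", "effort": effort}

# Usage: client.messages.create(..., thinking=build_thinking_config("security_review"))
```

Centralizing the mapping in one place makes it easy to retune effort levels later as you observe real latency and cost per task type.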
## Accessing Thinking Content
Retrieve the model's reasoning process:
```python
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    thinking={"type": "enabled", "effort": "high"},
    messages=[...],
)

for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
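For logging or auditing, the same loop can be packaged as a helper that separates the reasoning transcript from the final answer. A sketch: `split_response` is an illustrative name, and it assumes the block shapes shown above (`thinking` blocks with a `.thinking` field, `text` blocks with a `.text` field):

```python
def split_response(content_blocks) -> tuple[str, str]:
    """Separate thinking blocks from text blocks in a response's content list.

    Returns (reasoning_transcript, final_answer) as joined strings.
    """
    thinking_parts, text_parts = [], []
    for block in content_blocks:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
    return "\n".join(thinking_parts), "\n".join(text_parts)

# Usage: reasoning, answer = split_response(response.content)
```

Keeping the reasoning transcript out of the user-facing answer, but available in logs, is useful when debugging why a high-effort request produced a given result.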
## Comparison: Old vs New
| Aspect | Extended Thinking (Old) | Adaptive Thinking (New) |
|---|---|---|
| Control | On/Off binary | Granular effort levels |
| Cost | Always expensive when on | Proportional to complexity |
| Latency | Always slow when on | Scales with effort |
| Optimization | Manual toggle | Auto mode available |
## Conclusion
Adaptive Thinking transforms Claude from a binary tool into a nuanced reasoner. By matching effort to task complexity, you can reduce costs 50-80% on simple tasks while maintaining deep reasoning capability when needed. Start with auto for general applications, then optimize with explicit effort levels as you understand your workload patterns.