ProductFebruary 17, 2026

Claude Sonnet 4.6 Introduces Adaptive Thinking, Replacing Extended Mode

New Adaptive Thinking engine allows dynamic reasoning depth via effort parameter, optimizing cost and speed per request.

From Binary to Spectrum: Reasoning Gets Flexible

Claude Sonnet 4.6 introduces Adaptive Thinking, replacing the previous binary "extended thinking" mode with granular control over reasoning depth.

The Problem with Binary Thinking

Previous Claude models had two modes:

  • Standard: Fast and cheap, but shallow reasoning
  • Extended Thinking: Slow and expensive, but thorough

This forced developers to choose between speed and quality with no middle ground. A simple query and a complex architecture question cost the same when extended thinking was enabled.

The Effort Parameter

Adaptive Thinking introduces the `effort` parameter with four levels:

LevelThinking TokensLatencyCostUse Case
low~100-500~1s1xSimple Q&A, formatting
medium~500-2000~2-3s1.5-2xStandard coding tasks
high~2000-10000~5-15s3-5xComplex problems
autoVariesVaries1-5xGeneral applications

Implementation

python

# Simple task - minimal thinking

response = client.messages.create(

model="claude-sonnet-4-6-20260217",

thinking={"type": "enabled", "effort": "low"},

messages=[...]

)

# Complex task - deep reasoning

response = client.messages.create(

model="claude-sonnet-4-6-20260217",

thinking={"type": "enabled", "effort": "high"},

messages=[...]

)

# Let the model decide

response = client.messages.create(

model="claude-sonnet-4-6-20260217",

thinking={"type": "enabled", "effort": "auto"},

messages=[...]

)



Cost Savings

Early adopters report significant savings:

"We were running everything at max thinking. Now simple queries use 'low' effort—our costs dropped 40% with no quality impact on routine tasks." — Lead Developer, FinTech startup

Auto Mode Performance

When set to `auto`, Claude dynamically assesses query complexity and allocates reasoning resources accordingly. Testing shows:

  • 70% of queries classified as low/medium effort
  • Auto typically matches human-selected effort levels
  • Edge cases occasionally under-estimate complexity

Accessing Thinking Content

Developers can inspect the reasoning process:

python

for block in response.content:

if block.type == "thinking":

print(f"Reasoning: {block.thinking}")



Best Practice

Start with `auto` for general applications, then optimize with explicit effort levels as you understand your workload patterns.

Migration

The `budget_tokens` parameter still works but is deprecated. New code should use `effort` instead.

Ready to Experience Claude 5?

Try Now