Claude Sonnet 4.6 1M Token Context: Complete Developer Guide
Master Claude Sonnet 4.6's 1 million token context window: implementation, context compaction, pricing, and best practices for processing massive documents.
TL;DR
Claude Sonnet 4.6's 1M token context window (beta) can process ~750,000 words—equivalent to 5-10 full codebases or several books. Context compaction automatically summarizes older content, enabling effectively unlimited conversations. Premium pricing applies above 200K tokens.
Context Window Specifications
| Metric | Value |
|---|---|
| Maximum Context | 1,000,000 tokens |
| Approximate Words | ~750,000 |
| Equivalent Pages | ~3,000 pages |
| Code Lines | ~150,000 lines |
| Standard Pricing Threshold | 200,000 tokens |
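The figures above imply roughly four characters per token for English text. A quick heuristic along those lines can tell you whether content will fit under the 200K standard-pricing tier before you send it (the helper names here are illustrative; for exact counts, use the API's token-counting endpoint rather than this approximation):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fits_standard_pricing(text: str, threshold: int = 200_000) -> bool:
    """Check whether content stays under the 200K standard-pricing tier."""
    return estimate_tokens(text) <= threshold

doc = "word " * 100_000  # ~100K words, 500K characters
print(estimate_tokens(doc))  # → 125000
```

This is only a sanity check; actual tokenization varies with content (code and non-English text often tokenize less densely).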
Pricing Structure
Requests exceeding 200K input tokens incur premium long-context rates:
| Context Size | Input Price | Output Price |
|---|---|---|
| 0-200K tokens | $3/M | $15/M |
| 200K-1M tokens | $6/M (2x) | $30/M (2x) |
A 500K-token request would cost (200K × $3/M) + (300K × $6/M) = $0.60 + $1.80 = $2.40 for input tokens.
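The tiered arithmetic above is easy to get wrong at a glance, so here is a minimal sketch of the input-cost calculation (function name is illustrative, rates taken from the table above):

```python
def long_context_input_cost(input_tokens: int) -> float:
    """Input cost in dollars under tiered long-context pricing:
    $3/M for the first 200K tokens, $6/M for the portion above 200K."""
    standard = min(input_tokens, 200_000)
    premium = max(input_tokens - 200_000, 0)
    return standard / 1e6 * 3.00 + premium / 1e6 * 6.00

cost = long_context_input_cost(500_000)  # 0.60 + 1.80 = 2.40
```

Output tokens follow the same split at the $15/M and $30/M rates.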
Context Compaction
New to Sonnet 4.6, context compaction automatically manages long conversations:
- When approaching the context limit, older messages are summarized
- Critical information is preserved; verbose details are compressed
- Enables effectively unlimited conversation length
- Transparent to the user—no manual management needed
```python
import anthropic

client = anthropic.Anthropic()

# Context compaction happens automatically.
# No special configuration required.
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    messages=very_long_conversation  # Can exceed 1M over time
)
```
Implementation Patterns
Full Codebase Analysis
```python
import os

def collect_codebase(directory: str) -> str:
    code_content = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(('.py', '.ts', '.js', '.go')):
                path = os.path.join(root, file)
                with open(path, 'r') as f:
                    code_content.append(f"### {path}\n{f.read()}")
    return "\n\n".join(code_content)

codebase = collect_codebase("./src")  # ~500K tokens

response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": f"Analyze this codebase for security vulnerabilities:\n\n{codebase}"
    }]
)
```
Multi-Document RAG
```python
def process_documents(docs: list[str], query: str) -> str:
    # Combine all documents into context
    combined = "\n\n---\n\n".join([
        f"Document {i+1}:\n{doc}" for i, doc in enumerate(docs)
    ])
    response = client.messages.create(
        model="claude-sonnet-4-6-20260217",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Based on these documents:\n\n{combined}\n\nAnswer: {query}"
        }]
    )
    return response.content[0].text
```
Book/Report Analysis
```python
# Load entire book (~300K tokens)
with open("technical_manual.txt", "r") as f:
    book_content = f.read()

response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=16384,
    messages=[{
        "role": "user",
        "content": f"""Here is a technical manual:

{book_content}

Please:
1. Create a comprehensive table of contents
2. Summarize each major section
3. Identify any inconsistencies or errors
4. Generate a quick reference guide"""
    }]
)
```
Optimization Strategies
1. Prompt Caching (90% Savings)
For repeated queries against the same large context:
```python
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    system=[{
        "type": "text",
        "text": large_static_context,
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": varying_query}]
)
```
2. Batch Processing (50% Savings)
```python
batch = client.messages.batches.create(
    requests=[
        {"custom_id": f"doc-{i}", "params": {...}}
        for i in range(100)
    ]
)
```
3. Strategic Context Placement
Place the most important information at the beginning and end of the context—the model attends more strongly to these positions.
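The placement guideline can be sketched as a small prompt builder that puts instructions up front, the bulk context in the middle, and restates the task at the end (the function and tag names here are illustrative, not a prescribed format):

```python
def build_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Place key material at the edges: instructions first,
    bulk documents in the middle, the question restated last."""
    middle = "\n\n---\n\n".join(documents)
    return (
        f"{instructions}\n\n"
        f"<documents>\n{middle}\n</documents>\n\n"
        f"Reminder of the task: {question}"
    )

prompt = build_prompt(
    "Answer using only the documents below.",
    ["Doc A text", "Doc B text"],
    "What does Doc B say?",
)
```

Restating the question after a long middle section costs a handful of tokens and tends to help on very large contexts.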
Quality Considerations
While Sonnet 4.6 handles 1M tokens, quality varies by task:
| Task Type | Quality at 1M | Notes |
|---|---|---|
| Search/Retrieval | Good | May miss deeply buried needles |
| Summarization | Excellent | Handles full books well |
| Code Analysis | Very Good | Architecture understanding strong |
| Specific Q&A | Good | Better with clear context markers |
For needle-in-haystack retrieval at 1M scale, Opus 4.6 (76% accuracy) significantly outperforms Sonnet 4.6 (~18%).
Limitations
- 1M context is beta—expect occasional issues
- Premium pricing above 200K tokens
- Needle retrieval weaker than Opus at extreme lengths
- Latency increases with context size
- Output still limited to max_tokens (typically 8-16K)
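Given the weaker needle retrieval at extreme lengths, one mitigation is to split a very large corpus into chunks and query each separately rather than sending one 1M-token request. A minimal sketch of the chunking step, reusing the rough ~4 characters/token heuristic (names and the default chunk size are illustrative):

```python
def chunk_text(text: str, max_tokens: int = 180_000) -> list[str]:
    """Split text into chunks that stay under the standard-pricing tier,
    using a rough ~4 characters/token heuristic."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each chunk can then be sent as its own request, e.g.:
# for chunk in chunk_text(huge_corpus):
#     client.messages.create(..., messages=[{"role": "user", "content": chunk}])
chunks = chunk_text("x" * 1_500_000)  # ~1.5M characters → 3 chunks
```

As a bonus, keeping each chunk under 200K tokens also avoids the premium long-context rates. A real implementation would split on document or paragraph boundaries rather than raw character offsets.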
Conclusion
The 1M context window transforms what's possible with AI: full codebase analysis, multi-document synthesis, and book-length processing become practical. Combined with context compaction and prompt caching, Sonnet 4.6 makes large-scale AI applications economically viable.