
Claude Sonnet 4.6 1M Token Context: Complete Developer Guide

Master Claude Sonnet 4.6's 1 million token context window: implementation, context compaction, pricing, and best practices for processing massive documents.

February 2026

TL;DR

Claude Sonnet 4.6's 1M token context window (beta) can process ~750,000 words—equivalent to 5-10 full codebases or several books. Context compaction automatically summarizes older content, enabling effectively unlimited conversations. Premium pricing applies above 200K tokens.

Context Window Specifications

| Metric | Value |
|---|---|
| Maximum Context | 1,000,000 tokens |
| Approximate Words | ~750,000 |
| Equivalent Pages | ~3,000 pages |
| Code Lines | ~150,000 lines |
| Standard Pricing Threshold | 200,000 tokens |
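
The equivalences in this table are rules of thumb. As an illustration, a rough estimator using the common ~4-characters-per-token heuristic for English text might look like the sketch below; the helper name and constant are illustrative assumptions, and exact counts should come from the API's token counting endpoint.

```python
# Rough heuristic only (~4 characters per token for English text).
# CHARS_PER_TOKEN and estimate_tokens are illustrative, not official values;
# use the API's token counting endpoint for exact numbers.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Return a ballpark token count for `text`."""
    return max(1, len(text) // CHARS_PER_TOKEN)

# ~750K words of filler text lands on the order of 1M tokens:
manual = "word " * 750_000
print(estimate_tokens(manual))  # 937500
```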

Pricing Structure

Requests exceeding 200K input tokens incur premium long-context rates:

| Context Size | Input Price | Output Price |
|---|---|---|
| 0-200K tokens | $3/M | $15/M |
| 200K-1M tokens | $6/M (2x) | $30/M (2x) |

A 500K-token request would cost (200K × $3/M) + (300K × $6/M) = $0.60 + $1.80 = $2.40 for input tokens.
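
The arithmetic generalizes to any request size. A minimal sketch of the tiered calculation, with rates taken from the pricing table above (the helper and constant names are illustrative):

```python
# Tiered input-cost calculation based on the pricing table above.
STANDARD_RATE = 3.00   # $ per million input tokens (0-200K)
PREMIUM_RATE = 6.00    # $ per million input tokens (200K-1M)
THRESHOLD = 200_000    # tokens billed at the standard rate

def input_cost(tokens: int) -> float:
    """Return input cost in dollars for a single long-context request."""
    standard = min(tokens, THRESHOLD) * STANDARD_RATE / 1_000_000
    premium = max(tokens - THRESHOLD, 0) * PREMIUM_RATE / 1_000_000
    return standard + premium

print(f"${input_cost(500_000):.2f}")  # $2.40, matching the worked example
```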

Context Compaction

New to Sonnet 4.6, context compaction automatically manages long conversations:

• When approaching the context limit, older messages are summarized
• Critical information is preserved; verbose details are compressed
• Enables effectively unlimited conversation length
• Transparent to the user—no manual management needed
```python
# Context compaction happens automatically
# No special configuration required
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    messages=very_long_conversation  # Can exceed 1M over time
)
```

Implementation Patterns

Full Codebase Analysis

```python
import os

def collect_codebase(directory: str) -> str:
    code_content = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(('.py', '.ts', '.js', '.go')):
                path = os.path.join(root, file)
                with open(path, 'r') as f:
                    code_content.append(f"### {path}\n{f.read()}")
    return "\n\n".join(code_content)

codebase = collect_codebase("./src")  # ~500K tokens

response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": f"Analyze this codebase for security vulnerabilities:\n\n{codebase}"
    }]
)
```

Multi-Document RAG

```python
def process_documents(docs: list[str], query: str) -> str:
    # Combine all documents into context
    combined = "\n\n---\n\n".join([
        f"Document {i+1}:\n{doc}" for i, doc in enumerate(docs)
    ])
    response = client.messages.create(
        model="claude-sonnet-4-6-20260217",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Based on these documents:\n\n{combined}\n\nAnswer: {query}"
        }]
    )
    return response.content[0].text
```

Book/Report Analysis

```python
# Load entire book (~300K tokens)
with open("technical_manual.txt", "r") as f:
    book_content = f.read()

response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=16384,
    messages=[{
        "role": "user",
        "content": f"""Here is a technical manual:

{book_content}

Please:
1. Create a comprehensive table of contents
2. Summarize each major section
3. Identify any inconsistencies or errors
4. Generate a quick reference guide"""
    }]
)
```

Optimization Strategies

1. Prompt Caching (90% Savings)

For repeated queries against the same large context:

```python
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    system=[{
        "type": "text",
        "text": large_static_context,
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": varying_query}]
)
```

2. Batch Processing (50% Savings)

```python
batch = client.messages.batches.create(
    requests=[
        {"custom_id": f"doc-{i}", "params": {...}}
        for i in range(100)
    ]
)
```

3. Strategic Context Placement

Place the most important information at the beginning and end of the context; the model attends more strongly to these positions.
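
That layout can be sketched as a "sandwich": critical instructions at both ends, bulk material in the middle. The function and variable names below are illustrative, not an SDK API.

```python
def sandwich_prompt(instructions: str, bulk_context: str, query: str) -> str:
    """Place key instructions at both ends of a large context."""
    return "\n\n".join([
        instructions,   # start: attended to strongly
        bulk_context,   # middle: large supporting material
        instructions,   # end: repeat the critical instructions
        query,          # the actual question comes last
    ])

bulk = "<hundreds of thousands of tokens of source files>"
prompt = sandwich_prompt("Cite file paths in every answer.", bulk,
                         "Where is authentication handled?")
```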

Quality Considerations

While Sonnet 4.6 handles 1M tokens, quality varies by task:

| Task Type | Quality at 1M | Notes |
|---|---|---|
| Search/Retrieval | Good | May miss deeply buried needles |
| Summarization | Excellent | Handles full books well |
| Code Analysis | Very Good | Strong architectural understanding |
| Specific Q&A | Good | Better with clear context markers |

For needle-in-haystack retrieval at 1M scale, Opus 4.6 (76% accuracy) significantly outperforms Sonnet 4.6 (~18%).

Limitations

• 1M context is beta; expect occasional issues
• Premium pricing above 200K tokens
• Needle retrieval weaker than Opus at extreme lengths
• Latency increases with context size
• Output still limited to max_tokens (typically 8-16K)
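
When the latency or premium pricing above is a concern, one common fallback is to split the input so each request stays under the 200K standard-rate threshold. This is a sketch under a rough ~4-characters-per-token assumption; the constant names are illustrative.

```python
# Hypothetical mitigation: split text so each request stays below the
# 200K-token standard-rate threshold. Uses a rough 4-chars/token heuristic.
CHARS_PER_TOKEN = 4
CHUNK_TOKENS = 180_000  # headroom under the 200K threshold

def chunk_text(text: str, chunk_tokens: int = CHUNK_TOKENS) -> list[str]:
    """Split `text` into pieces that each fit one standard-rate request."""
    max_chars = chunk_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

A real implementation would split on natural boundaries (paragraphs, files) rather than raw character offsets, so sentences are not cut mid-word.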

Conclusion

The 1M context window transforms what's possible with AI: full codebase analysis, multi-document synthesis, and book-length processing become practical. Combined with context compaction and prompt caching, Sonnet 4.6 makes large-scale AI applications economically viable.
