Claude Sonnet 4.6 Scores 72.5% on OSWorld-Verified, Nearly Matching Opus on Computer Use
Sonnet 4.6 comes within 0.2 points of Opus 4.6 on the OSWorld-Verified computer use benchmark, bringing sophisticated desktop automation to mid-tier pricing.
Computer Use Democratized
Claude Sonnet 4.6 scores 72.5% on OSWorld-Verified, effectively tying Opus 4.6's 72.7% at one-fifth of the per-token price ($3/$15 versus $15/$75 per million input/output tokens).
What is OSWorld?
OSWorld tests AI models on real computer tasks:
- Web browsing and form filling
- Desktop application use
- File management
- Multi-step workflows
- Cross-application tasks
Performance Comparison
| Model | OSWorld-Verified | Price (input/output, $ per million tokens) |
|---|---|---|
| Opus 4.6 | 72.7% | $15/$75 |
| Sonnet 4.6 | 72.5% | $3/$15 |
| Sonnet 4.5 | 61.4% | $3/$15 |
| GPT-5.2 | ~65% | $1.75/$14 |
Sonnet 4.6 gains 11.1 points over Sonnet 4.5 (61.4% to 72.5%) at the same price, and lands within 0.2 points of Opus 4.6.
Practical Capabilities
Sonnet 4.6 can now reliably:
Web Automation
- Fill complex forms with validation
- Navigate multi-step checkout flows
- Extract data from dynamic websites
Desktop Tasks
- Manipulate spreadsheets
- Process documents across applications
- Manage file systems
Enterprise Workflows
- Expense report submission
- Data entry automation
- Testing and QA scenarios
Enterprise Interest
RPA vendors are taking notice:
"We're evaluating Sonnet 4.6 for our automation platform. At these performance levels and this price point, AI-first RPA becomes viable for mid-market companies." — VP Product, Automation startup
Implementation Example
python
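# Simple form automation with Sonnet 4.6
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# `screenshot` is assumed to be an image source dict prepared by the caller, e.g.
# {"type": "base64", "media_type": "image/png", "data": <base64 string>}
# Computer use has shipped as a beta tool; some SDK versions require
# client.beta.messages.create with a matching betas=[...] entry.
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    # "..." elides the remaining tool fields (for example, the display dimensions)
    tools=[{"type": "computer_20241022", "name": "computer", ...}],
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": screenshot},
            {"type": "text", "text": "Fill out this expense form: Date 2/17, Amount $145.50, Category: Travel"}
        ]
    }]
)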
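A single request like the one above only returns the model's first proposed action. In practice, computer use runs as a loop: the model requests an action, the application executes it and reports the result (often a fresh screenshot), and the cycle repeats until the model stops requesting tools. The sketch below illustrates that loop under stated assumptions: `send` and `execute_action` are hypothetical callables (a wrapper around `client.messages.create` and an action executor, respectively), and responses are represented as plain dicts rather than SDK objects.

```python
from typing import Any, Callable

Message = dict[str, Any]


def run_agent_loop(
    send: Callable[[list[Message]], Message],
    execute_action: Callable[[dict[str, Any]], list[dict[str, Any]]],
    first_message: Message,
    max_steps: int = 10,
) -> list[Message]:
    """Alternate between model turns and tool execution until the model stops.

    `send` (hypothetical) takes the conversation so far and returns the
    assistant turn as a dict with "content" and "stop_reason" keys.
    `execute_action` (hypothetical) performs one computer action, such as a
    click or a screenshot, and returns tool_result content blocks.
    """
    messages = [first_message]
    for _ in range(max_steps):
        reply = send(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply.get("stop_reason") != "tool_use":
            break  # the model finished; there is nothing left to execute
        # Answer every tool_use block with a tool_result carrying the same id.
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": execute_action(block["input"]),
            }
            for block in reply["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return messages
```

The `max_steps` cap is a deliberate design choice: it bounds cost and keeps a confused agent from looping indefinitely.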
Safety Considerations
Anthropic emphasizes safety with computer use:
- Sandboxed execution recommended
- Human approval for sensitive actions
- Audit logging for compliance
- Rate limiting to prevent runaway agents
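The approval, audit-logging, and rate-limiting points above could be combined in a thin wrapper around whatever function actually performs actions. The following is a minimal sketch, not Anthropic's implementation: the `GuardedExecutor` class, the `SENSITIVE_ACTIONS` set, and the 30-actions-per-minute default are all illustrative assumptions. Sandboxing is out of scope here, since it is typically handled at the VM or container level.

```python
import json
import time
from typing import Any, Callable

# Assumption: actions that enter data are treated as sensitive and need sign-off.
SENSITIVE_ACTIONS = {"type", "key"}


class GuardedExecutor:
    """Wraps an action executor with approval, rate limiting, and an audit log."""

    def __init__(
        self,
        execute: Callable[[dict[str, Any]], Any],
        approve: Callable[[dict[str, Any]], bool],
        max_actions_per_minute: int = 30,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._execute = execute
        self._approve = approve
        self._max = max_actions_per_minute
        self._clock = clock
        self._timestamps: list[float] = []
        self.audit_log: list[str] = []  # one JSON line per attempted action

    def __call__(self, action: dict[str, Any]) -> str:
        now = self._clock()
        # Sliding window: keep only executions from the last 60 seconds.
        self._timestamps = [t for t in self._timestamps if now - t < 60.0]
        if len(self._timestamps) >= self._max:
            return self._record(action, "rate_limited", now)
        if action.get("action") in SENSITIVE_ACTIONS and not self._approve(action):
            return self._record(action, "denied", now)
        self._timestamps.append(now)
        self._execute(action)
        return self._record(action, "executed", now)

    def _record(self, action: dict[str, Any], outcome: str, now: float) -> str:
        entry = {"t": now, "action": action, "outcome": outcome}
        self.audit_log.append(json.dumps(entry))
        return outcome
```

Every attempt is logged, including denied and rate-limited ones, so the audit trail shows what the agent tried to do as well as what it actually did.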
The Pricing Impact
Estimated monthly cost for a typical enterprise computer use deployment:
- Opus 4.6: ~$1,500/month for 20K tasks
- Sonnet 4.6: ~$300/month for the same 20K tasks
That is an 80% cost reduction at near-identical benchmark performance.
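The arithmetic behind these figures can be reproduced with a short calculation. The per-task token counts below are assumptions chosen so that the totals match the numbers above; the article does not state the actual usage per task. Prices are the per-million-token rates from the comparison table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


PRICES = {
    "Opus 4.6": Pricing(15.0, 75.0),
    "Sonnet 4.6": Pricing(3.0, 15.0),
}

# Assumed average usage per task (hypothetical; not stated in the article).
INPUT_TOKENS_PER_TASK = 4_000
OUTPUT_TOKENS_PER_TASK = 200


def monthly_cost(pricing: Pricing, tasks: int) -> float:
    """Return the monthly cost in USD for `tasks` tasks at the assumed usage."""
    cost_per_task_micro_usd = (
        INPUT_TOKENS_PER_TASK * pricing.input_per_mtok
        + OUTPUT_TOKENS_PER_TASK * pricing.output_per_mtok
    )
    return tasks * cost_per_task_micro_usd / 1_000_000


if __name__ == "__main__":
    opus = monthly_cost(PRICES["Opus 4.6"], 20_000)
    sonnet = monthly_cost(PRICES["Sonnet 4.6"], 20_000)
    print(f"Opus 4.6:   ${opus:,.0f}/month")
    print(f"Sonnet 4.6: ${sonnet:,.0f}/month")
    print(f"Reduction:  {1 - sonnet / opus:.0%}")
```

Because the input and output prices both differ by the same factor of five, the 80% reduction holds for any mix of input and output tokens; only the absolute dollar amounts depend on the assumed usage.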
What's Next
As computer use matures, expect:
- Integration with enterprise RPA platforms
- Compliance certifications for regulated industries
- More sophisticated multi-step orchestration
- Better handling of dynamic/animated UIs
Conclusion
Sonnet 4.6 has sharply lowered the cost barrier to AI-powered computer automation. What was a premium-tier capability six months ago is now available at the standard tier.