Benchmark · February 17, 2026

Claude Sonnet 4.6 Achieves 72.5% OSWorld, Matching Opus Computer Use

Sonnet 4.6 ties Opus 4.6 on computer use benchmarks, enabling sophisticated desktop automation at mid-tier pricing.

Computer Use Democratized

Claude Sonnet 4.6 scores 72.5% on OSWorld-Verified, effectively tying Opus 4.6's 72.7% and bringing sophisticated desktop automation to mid-tier pricing.

What is OSWorld?

OSWorld tests AI models on real computer tasks:

  • Web browsing and form filling
  • Desktop application use
  • File management
  • Multi-step workflows
  • Cross-application tasks

Performance Comparison

Model        OSWorld-Verified   Price (input/output, USD per 1M tokens)
Opus 4.6     72.7%              $15/$75
Sonnet 4.6   72.5%              $3/$15
Sonnet 4.5   61.4%              $3/$15
GPT-5.2      ~65%               $1.75/$14

Sonnet 4.6 gained 11.1 points over Sonnet 4.5 (61.4% to 72.5%), landing within 0.2 points of Opus 4.6.
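The comparison is simple arithmetic over the table. A minimal sketch that recomputes it, using only the scores listed above (the dictionary name `scores` is illustrative):

```python
# OSWorld-Verified scores copied from the table above (percent).
scores = {
    "Opus 4.6": 72.7,
    "Sonnet 4.6": 72.5,
    "Sonnet 4.5": 61.4,
}

# Percentage-point gain from Sonnet 4.5 to Sonnet 4.6.
gain = round(scores["Sonnet 4.6"] - scores["Sonnet 4.5"], 1)

# Remaining gap between Sonnet 4.6 and Opus 4.6.
gap = round(scores["Opus 4.6"] - scores["Sonnet 4.6"], 1)

print(f"Gain over Sonnet 4.5: {gain} points")
print(f"Gap to Opus 4.6: {gap} points")
```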

Practical Capabilities

Sonnet 4.6 can now reliably:

Web Automation

  • Fill complex forms with validation
  • Navigate multi-step checkout flows
  • Extract data from dynamic websites

Desktop Tasks

  • Manipulate spreadsheets
  • Process documents across applications
  • Manage file systems

Enterprise Workflows

  • Expense report submission
  • Data entry automation
  • Testing and QA scenarios

Enterprise Interest

RPA vendors are taking notice:

"We're evaluating Sonnet 4.6 for our automation platform. At these performance levels and this price point, AI-first RPA becomes viable for mid-market companies." — VP Product, Automation startup

Implementation Example

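```python
import anthropic

# Simple form automation with Sonnet 4.6
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# `screenshot` must be an image source dict, e.g.
# {"type": "base64", "media_type": "image/png", "data": "<base64 PNG>"}
response = client.messages.create(
    model="claude-sonnet-4-6-20260217",
    max_tokens=4096,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        # ... (remaining tool fields elided in the original)
    }],
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": screenshot},
            {"type": "text", "text": "Fill out this expense form: Date 2/17, Amount $145.50, Category: Travel"},
        ],
    }],
)
```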
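A request like this only asks the model for its next step; the calling code still has to carry out whatever action comes back. The sketch below shows one way to route such actions. It assumes each action arrives as a dict with an `action` name plus optional `coordinate` and `text` fields, which is the general shape of computer-use tool inputs. `RecordingDesktop` and `execute_action` are hypothetical names, and the recorder stands in for a real mouse and keyboard backend.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingDesktop:
    """Hypothetical stand-in for a real input backend: it only records calls."""
    log: list = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

    def press_key(self, key: str) -> None:
        self.log.append(("key", key))


def execute_action(desktop: RecordingDesktop, tool_input: dict) -> str:
    """Route one model-requested action to the backend and return a status string."""
    action = tool_input.get("action")
    if action == "left_click":
        x, y = tool_input["coordinate"]
        desktop.click(x, y)
    elif action == "type":
        desktop.type_text(tool_input["text"])
    elif action == "key":
        desktop.press_key(tool_input["text"])
    else:
        # Unknown actions are refused rather than guessed at.
        return f"unsupported action: {action}"
    return "ok"


desktop = RecordingDesktop()
steps = [
    {"action": "left_click", "coordinate": [420, 310]},
    {"action": "type", "text": "145.50"},
    {"action": "key", "text": "Tab"},
    {"action": "scroll_sideways"},  # deliberately unsupported
]
results = [execute_action(desktop, step) for step in steps]
print(results)
print(desktop.log)
```

In a real loop, the returned status (and usually a fresh screenshot) would be sent back to the model as a tool result before requesting the next step.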



Safety Considerations

Anthropic emphasizes safety with computer use:

  • Sandboxed execution recommended
  • Human approval for sensitive actions
  • Audit logging for compliance
  • Rate limiting to prevent runaway agents
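Three of these recommendations (human approval, audit logging, rate limiting) can be combined in a small wrapper around action execution. This is a sketch under stated assumptions, not Anthropic's implementation: `ActionGuard`, the `SENSITIVE` keyword list, and the per-minute limit are all hypothetical choices made for illustration.

```python
import time
from collections import deque


class ActionGuard:
    """Hypothetical guard: approval gate, audit log, and sliding-window rate limit."""

    SENSITIVE = ("submit", "delete", "pay")  # assumed keywords, tune per deployment

    def __init__(self, max_actions_per_minute, approve, clock=time.monotonic):
        self.max_actions = max_actions_per_minute
        self.approve = approve      # callback: description -> bool
        self.clock = clock          # injectable so the limiter can be tested
        self.recent = deque()       # timestamps of recently allowed actions
        self.audit_log = []         # (timestamp, description, decision)

    def check(self, description):
        """Return True if the action may run; record every decision."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()

        sensitive = any(word in description.lower() for word in self.SENSITIVE)
        if len(self.recent) >= self.max_actions:
            decision = "rate_limited"
        elif sensitive and not self.approve(description):
            decision = "denied"
        else:
            decision = "allowed"
            self.recent.append(now)

        self.audit_log.append((now, description, decision))
        return decision == "allowed"


# Usage with a fake clock and an approver that rejects everything sensitive.
fake_time = [0.0]
guard = ActionGuard(2, approve=lambda d: False, clock=lambda: fake_time[0])

outcomes = [
    guard.check("click Amount field"),   # not sensitive, under the limit
    guard.check("click Submit button"),  # sensitive, approver says no
    guard.check("type 145.50"),          # second allowed action in the window
    guard.check("press Tab"),            # over the limit of 2 per minute
]
fake_time[0] = 61.0
outcomes.append(guard.check("press Tab"))  # window has expired
print(outcomes)
print([entry[2] for entry in guard.audit_log])
```

Every decision, including refusals, lands in `audit_log`, so the same record serves both compliance review and debugging of runaway agents.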

The Pricing Impact

A typical enterprise computer use deployment:

  • Opus 4.6: ~$1,500/month for 20K tasks
  • Sonnet 4.6: ~$300/month for same tasks

That is an 80% cost reduction for near-equivalent benchmark performance.
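The 80% figure follows directly from the two monthly estimates. A quick check, using only the numbers quoted above (variable names are illustrative):

```python
TASKS_PER_MONTH = 20_000
monthly_cost = {"Opus 4.6": 1500.0, "Sonnet 4.6": 300.0}  # USD, from the estimates above

# Cost per task for each model.
per_task = {model: cost / TASKS_PER_MONTH for model, cost in monthly_cost.items()}

# Relative saving from switching from Opus 4.6 to Sonnet 4.6.
reduction = 1 - monthly_cost["Sonnet 4.6"] / monthly_cost["Opus 4.6"]

for model, cost in per_task.items():
    print(f"{model}: ${cost:.3f} per task")
print(f"Cost reduction: {reduction:.0%}")
```

The 5x ratio between the monthly totals matches the 5x ratio between the listed prices ($15/$75 vs. $3/$15), which suggests the estimate assumes the same token usage per task for both models.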

What's Next

As computer use matures, expect:

  • Integration with enterprise RPA platforms
  • Compliance certifications for regulated industries
  • More sophisticated multi-step orchestration
  • Better handling of dynamic/animated UIs

Conclusion

Sonnet 4.6 has removed the cost barrier to AI-powered computer automation. What was a premium capability six months ago is now available at the standard tier.
