Claude Sonnet 4.6 Computer Use: Complete Implementation Guide
Build computer-use agents with Claude Sonnet 4.6: 72.5% OSWorld score, implementation patterns, safety considerations, and real-world applications.
TL;DR
Claude Sonnet 4.6 scores 72.5% on OSWorld-Verified, within 0.2 points of Opus 4.6's 72.7% at one-fifth the cost. Computer use lets AI agents control desktops, browse the web, fill forms, and automate multi-step workflows. It is available through the API and should be deployed with the safety controls described below.
What is Computer Use?
Computer use allows Claude to:
- View screenshots and understand UI elements
- Control mouse movements and clicks
- Type keyboard input
- Navigate applications and websites
- Complete multi-step workflows autonomously
Essential safeguards:
- Sandboxing: Run in VM or container to isolate from host system
- Confirmation: Require human approval for sensitive actions
- Blocklists: Prevent access to sensitive URLs, applications, or directories
- Monitoring: Log all actions for audit trails
- Rate Limiting: Prevent runaway agents with action limits
Best practices:
- Clear Instructions: Be specific about UI elements and expected outcomes
- Chunked Tasks: Break complex workflows into discrete steps
- Error Recovery: Include instructions for handling unexpected states
- Screenshot Frequency: Request fresh screenshots after major actions
- Timeout Handling: Implement maximum action counts per task
Limitations:
- No real-time video processing (screenshot-based)
- May struggle with dynamic/animated UI elements
- Requires screen visibility (no headless mode)
- Higher latency than traditional automation
Benchmark Performance
| Model | OSWorld-Verified | Cost per 1M tokens (input/output) |
|---|---|---|
| Sonnet 4.6 | 72.5% | $3/$15 |
| Opus 4.6 | 72.7% | $15/$75 |
| GPT-5.2 | ~65% | $1.75/$14 |
| Gemini 3 Pro | ~60% | $1.25/$5 |
Sonnet 4.6 delivers Opus-tier computer use at Sonnet pricing.
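To make the pricing gap concrete, the sketch below computes the cost of one session from the prices in the table, assuming they are quoted in USD per 1M tokens. The token counts are placeholders rather than measurements, and `session_cost` is a hypothetical helper, not part of any SDK.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (15.00, 75.00),
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session for the given token usage."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Placeholder usage: 400k input tokens (mostly screenshots) and 20k output tokens.
sonnet = session_cost("Sonnet 4.6", 400_000, 20_000)
opus = session_cost("Opus 4.6", 400_000, 20_000)
print(f"Sonnet: ${sonnet:.2f}, Opus: ${opus:.2f}, ratio: {opus / sonnet:.1f}x")
```

Because both the input and output prices differ by the same factor, the ratio stays at 5x regardless of the token mix.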
Implementation
Basic Setup
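import base64
import io

import anthropic
from PIL import ImageGrab

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def capture_screen() -> str:
    """Grab the full screen and return it as a base64-encoded PNG."""
    screenshot = ImageGrab.grab()
    buffer = io.BytesIO()
    screenshot.save(buffer, format="PNG")
    return base64.standard_b64encode(buffer.getvalue()).decode()

def computer_use_request(instruction: str, screenshot_b64: str):
    """Send one screenshot plus an instruction and return the raw API response."""
    # Computer use is a beta tool: the tool type and the beta flag must match
    # each other and the model, so check the current API documentation.
    response = client.beta.messages.create(
        model="claude-sonnet-4-6-20260217",
        max_tokens=4096,
        tools=[{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1920,
            "display_height_px": 1080,
            "display_number": 1,
        }],
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},
                {"type": "text", "text": instruction},
            ],
        }],
        betas=["computer-use-2024-10-22"],
    )
    return response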
Processing Tool Calls
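import pyautogui

def execute_computer_action(tool_use):
    """Run one tool action locally. Returns a base64 screenshot for "screenshot", else None."""
    params = tool_use.input
    action = params.get("action")
    if action == "screenshot":
        return capture_screen()
    elif action == "mouse_move":
        x, y = params["coordinate"]
        pyautogui.moveTo(x, y)
    elif action == "left_click":
        # Newer tool versions send a coordinate with the click; older ones click in place.
        coordinate = params.get("coordinate")
        if coordinate:
            pyautogui.click(*coordinate)
        else:
            pyautogui.click()
    elif action == "type":
        pyautogui.typewrite(params["text"], interval=0.05)
    elif action == "key":
        # The key name or combo (e.g. "ctrl+s") arrives in the "text" field.
        # Names follow xdotool conventions and may need mapping to pyautogui's.
        pyautogui.hotkey(*params["text"].lower().split("+"))
    elif action == "scroll":
        # Scroll is only available in tool versions newer than computer_20241022.
        x, y = params["coordinate"]
        direction = params["scroll_direction"]
        amount = params["scroll_amount"]
        pyautogui.scroll(amount if direction == "up" else -amount, x=x, y=y)
    else:
        raise ValueError(f"Unsupported action: {action}")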
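The handlers above pass the model's coordinates straight to the screen, which only works when the declared `display_width_px` and `display_height_px` equal the physical resolution. If screenshots are downscaled before sending (an assumption, since the setup code sends them at full size), coordinates need to be mapped back. A minimal sketch, where `scale_to_screen` is a hypothetical helper:

```python
def scale_to_screen(x, y, declared, actual):
    """Map a point from the declared tool display size to the physical screen size."""
    declared_w, declared_h = declared
    actual_w, actual_h = actual
    scaled_x = round(x * actual_w / declared_w)
    scaled_y = round(y * actual_h / declared_h)
    # Clamp so a point on the far edge still lands on a valid pixel.
    return (
        min(max(scaled_x, 0), actual_w - 1),
        min(max(scaled_y, 0), actual_h - 1),
    )

# A click at the center of a 1280x800 screenshot on a 2560x1600 screen.
print(scale_to_screen(640, 400, declared=(1280, 800), actual=(2560, 1600)))
```

Calling this on every `coordinate` before handing it to `pyautogui` keeps the action handlers independent of the screenshot size.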
Agentic Loop
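# Tool definition sent with every request; the beta flag below must match its version.
computer_tool = {
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 1,
}

def image_block(screenshot_b64: str) -> dict:
    """Wrap a base64 PNG in an image content block."""
    return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}}

def run_computer_agent(task: str, max_steps: int = 50) -> str:
    """Loop until the model stops requesting tool calls or max_steps is reached."""
    messages = [{
        "role": "user",
        "content": [image_block(capture_screen()), {"type": "text", "text": task}],
    }]
    for _ in range(max_steps):
        response = client.beta.messages.create(
            model="claude-sonnet-4-6-20260217",
            max_tokens=4096,
            tools=[computer_tool],
            messages=messages,
            betas=["computer-use-2024-10-22"],
        )
        if response.stop_reason != "tool_use":
            # Final answer: join the text blocks of the last response.
            return "".join(b.text for b in response.content if b.type == "text")
        # Record the assistant turn once, then answer every tool call in one user turn.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Fall back to a fresh capture when the handler returns nothing.
                screenshot = execute_computer_action(block) or capture_screen()
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [image_block(screenshot)],
                })
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError(f"Task did not finish within {max_steps} steps")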
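The loop above assumes every action succeeds. One way to let the model recover from a failed action, rather than crashing the loop, is to send back a `tool_result` with `is_error` set. The sketch below is one possible shape for that; `build_tool_result` and `safe_execute` are hypothetical helpers, and the action is passed in as a callable so the example runs without a desktop.

```python
def build_tool_result(tool_use_id, screenshot_b64=None, error=None):
    """Build a tool_result block: an error message on failure, else a screenshot."""
    if error is not None:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Action failed: {error}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [{
            "type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64},
        }],
    }

def safe_execute(tool_use_id, run_action):
    """Run a zero-argument action callable and convert exceptions into error results."""
    try:
        return build_tool_result(tool_use_id, screenshot_b64=run_action())
    except Exception as exc:  # broad on purpose: any failure is reported to the model
        return build_tool_result(tool_use_id, error=exc)
```

In the agent loop, `run_action` could be a lambda that executes the tool call and then captures the screen, so the model sees either the new state or the reason the action failed.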
Safety Considerations
Essential Safeguards
Action Filtering
BLOCKED_ACTIONS = ["format", "delete_system", "sudo"]
SENSITIVE_URLS = ["bank", "password", "admin"]

def validate_action(action: dict) -> bool:
    """Return False for blocked action names or typed text containing a sensitive keyword."""
    if action.get("action") in BLOCKED_ACTIONS:
        return False
    if action.get("action") == "type":
        text = action.get("text", "").lower()
        if any(url in text for url in SENSITIVE_URLS):
            return False
    return True
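Filtering covers only one of the safeguards. Human confirmation, an action limit, and an audit trail could be combined in a small wrapper like the one below. This is a sketch under assumptions: `ActionGuard`, `needs_approval`, and the keyword list are hypothetical, and the `approve` callback stands in for whatever prompt or UI a real deployment would use.

```python
import json
import time

SENSITIVE_KEYWORDS = ("password", "bank", "admin")  # illustrative only

def needs_approval(action: dict) -> bool:
    """Flag typed text that mentions a sensitive keyword."""
    text = str(action.get("text", "")).lower()
    return action.get("action") == "type" and any(k in text for k in SENSITIVE_KEYWORDS)

class ActionGuard:
    """Enforce an action budget, ask for approval when needed, and log every decision."""

    def __init__(self, max_actions, approve):
        self.max_actions = max_actions
        self.approve = approve  # callable(action) -> bool, e.g. a human prompt
        self.allowed_count = 0
        self.audit_log = []  # one JSON line per decision

    def allow(self, action: dict) -> bool:
        if self.allowed_count >= self.max_actions:
            decision = "denied:budget"
        elif needs_approval(action) and not self.approve(action):
            decision = "denied:human"
        else:
            decision = "allowed"
            self.allowed_count += 1
        self.audit_log.append(json.dumps({"ts": time.time(), "action": action, "decision": decision}))
        return decision == "allowed"

# Auto-deny stands in for a human reviewer here.
guard = ActionGuard(max_actions=2, approve=lambda action: False)
print(guard.allow({"action": "left_click", "coordinate": [10, 20]}))
print(guard.allow({"action": "type", "text": "my password"}))
```

Calling `guard.allow(block.input)` before executing each tool call keeps the policy in one place and leaves a log that can be written to disk for review.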
Use Cases
1. Form Automation
run_computer_agent(
    "Fill out the expense report form with: "
    "Date: 2026-02-17, Amount: $145.50, Category: Travel, "
    "Description: Client meeting transportation"
)
2. Data Extraction
run_computer_agent(
    "Open the quarterly report PDF, extract the revenue figures "
    "from Q1-Q4, and paste them into the spreadsheet in column B"
)
3. Testing Automation
run_computer_agent(
    "Navigate to the login page, test these credentials: "
    "user: [email protected], pass: Test123. "
    "Verify the dashboard loads correctly and report any errors."
)
Best Practices
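Chunked tasks and error recovery, two of the practices this guide lists, can be combined by running a workflow as a sequence of short steps with a retry on each. The sketch below is hypothetical: `run_chunked` is not part of any SDK, and `run_step` stands in for a function such as `run_computer_agent`, injected so the example runs without a desktop.

```python
from typing import Callable

def run_chunked(steps: list[str], run_step: Callable[[str], str], max_retries: int = 1) -> list[str]:
    """Run steps in order, retrying a failed step, and stop at the first step that keeps failing."""
    results = []
    for index, step in enumerate(steps, start=1):
        for attempt in range(max_retries + 1):
            try:
                results.append(run_step(step))
                break
            except RuntimeError as exc:
                if attempt == max_retries:
                    raise RuntimeError(f"Step {index} failed: {step!r}") from exc
    return results

# Stub runner that simply echoes the instruction back.
steps = [
    "Open the expense report form",
    "Fill in the date and amount fields",
    "Submit the form and confirm the success message",
]
print(run_chunked(steps, run_step=lambda step: f"done: {step}"))
```

Keeping each step small also makes the per-task action limit easier to set, since a step that needs far more actions than expected is a sign the agent is stuck.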
Limitations
Conclusion
Sonnet 4.6's computer use capability enables sophisticated desktop automation at accessible pricing. With proper safety controls, it can transform manual workflows into automated processes—from form filling to data extraction to QA testing.