Claude Sonnet 4.6 Computer Use Guide: Build Desktop Automation Agents

TL;DR

Claude Sonnet 4.6 achieves 72.5% on OSWorld-Verified—matching Opus 4.6's 72.7% at 1/5th the cost. Computer use enables AI agents to control desktops, browse the web, fill forms, and automate complex workflows. Available via API with proper safety controls.

What is Computer Use?

Computer use allows Claude to:

View screenshots and understand UI elements

Control mouse movements and clicks

Type keyboard input

Navigate applications and websites

Complete multi-step workflows autonomously

Benchmark Performance

Model	OSWorld-Verified	Cost (Input/Output)

Sonnet 4.6

72.5%

$3/$15

Opus 4.6

72.7%

$15/$75

GPT-5.2

~65%

$1.75/$14

Gemini 3 Pro

~60%

$1.25/$5

Sonnet 4.6 delivers Opus-tier computer use at Sonnet pricing.

Implementation

Basic Setup

import anthropic
import base64
from PIL import ImageGrab

client = anthropic.Anthropic()

def capture_screen() -> str:
    screenshot = ImageGrab.grab()
    buffer = io.BytesIO()
    screenshot.save(buffer, format="PNG")
    return base64.standard_b64encode(buffer.getvalue()).decode()

def computer_use_request(instruction: str, screenshot_b64: str):
    response = client.messages.create(
        model="claude-sonnet-4-6-20260217",
        max_tokens=4096,
        tools=[{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1920,
            "display_height_px": 1080,
            "display_number": 1
        }],
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},
                {"type": "text", "text": instruction}
            ]
        }]
    )
    return response

Processing Tool Calls

def execute_computer_action(tool_use):
    action = tool_use.input.get("action")

    if action == "screenshot":
        return capture_screen()

    elif action == "mouse_move":
        x, y = tool_use.input["coordinate"]
        pyautogui.moveTo(x, y)

    elif action == "left_click":
        pyautogui.click()

    elif action == "type":
        text = tool_use.input["text"]
        pyautogui.typewrite(text, interval=0.05)

    elif action == "key":
        key = tool_use.input["key"]
        pyautogui.press(key)

    elif action == "scroll":
        x, y = tool_use.input["coordinate"]
        direction = tool_use.input["direction"]
        amount = tool_use.input["amount"]
        pyautogui.scroll(amount if direction == "up" else -amount, x=x, y=y)

Agentic Loop

def run_computer_agent(task: str):
    messages = []
    screenshot = capture_screen()

    messages.append({
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot}},
            {"type": "text", "text": task}
        ]
    })

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6-20260217",
            max_tokens=4096,
            tools=[computer_tool],
            messages=messages
        )

        if response.stop_reason == "end_turn":
            return response.content[-1].text

        # Process tool calls
        for block in response.content:
            if block.type == "tool_use":
                result = execute_computer_action(block)
                messages.append({"role": "assistant", "content": response.content})
                messages.append({
                    "role": "user",
                    "content": [{"type": "tool_result", "tool_use_id": block.id, "content": result}]
                })

Safety Considerations

Essential Safeguards

Sandboxing: Run in VM or container to isolate from host system

Confirmation: Require human approval for sensitive actions

Blocklists: Prevent access to sensitive URLs, applications, or directories

Monitoring: Log all actions for audit trails

Rate Limiting: Prevent runaway agents with action limits

Action Filtering

BLOCKED_ACTIONS = ["format", "delete_system", "sudo"]
SENSITIVE_URLS = ["bank", "password", "admin"]

def validate_action(action: dict) -> bool:
    if action.get("action") in BLOCKED_ACTIONS:
        return False
    if action.get("action") == "type":
        text = action.get("text", "").lower()
        if any(url in text for url in SENSITIVE_URLS):
            return False
    return True

Use Cases

1. Form Automation

run_computer_agent(

"Fill out the expense report form with: "

"Date: 2026-02-17, Amount: $145.50, Category: Travel, "

"Description: Client meeting transportation"

)

2. Data Extraction

run_computer_agent(

"Open the quarterly report PDF, extract the revenue figures "

"from Q1-Q4, and paste them into the spreadsheet in column B"

)

3. Testing Automation

run_computer_agent(

"Navigate to the login page, test these credentials: "

"user: [email protected], pass: Test123. "

"Verify the dashboard loads correctly and report any errors."

)

Best Practices

Clear Instructions: Be specific about UI elements and expected outcomes

Chunked Tasks: Break complex workflows into discrete steps

Error Recovery: Include instructions for handling unexpected states

Screenshot Frequency: Request fresh screenshots after major actions

Timeout Handling: Implement maximum action counts per task

Limitations

No real-time video processing (screenshot-based)

May struggle with dynamic/animated UI elements

Requires screen visibility (no headless mode)

Higher latency than traditional automation

Conclusion

Sonnet 4.6's computer use capability enables sophisticated desktop automation at accessible pricing. With proper safety controls, it can transform manual workflows into automated processes—from form filling to data extraction to QA testing.

Claude Sonnet 4.6 Computer Use: Complete Implementation Guide

TL;DR

What is Computer Use?

Benchmark Performance

Implementation

Basic Setup

Processing Tool Calls

Agentic Loop

Safety Considerations

Essential Safeguards

Action Filtering

Use Cases

1. Form Automation

2. Data Extraction

3. Testing Automation

Best Practices

Limitations

Conclusion

Ready to Experience Claude 5?