Tutorial

Claude Sonnet 4.6 Computer Use: Complete Implementation Guide

Build computer-use agents with Claude Sonnet 4.6: 72.5% OSWorld score, implementation patterns, safety considerations, and real-world applications.

February 2026

TL;DR

Claude Sonnet 4.6 achieves 72.5% on OSWorld-Verified—matching Opus 4.6's 72.7% at 1/5th the cost. Computer use enables AI agents to control desktops, browse the web, fill forms, and automate complex workflows. Available via API with proper safety controls.

What is Computer Use?

Computer use allows Claude to:

    • View screenshots and understand UI elements
      • Control mouse movements and clicks
        • Type keyboard input
          • Navigate applications and websites
            • Complete multi-step workflows autonomously

            Benchmark Performance

            ModelOSWorld-VerifiedCost (Input/Output)
            Sonnet 4.672.5%$3/$15
            Opus 4.672.7%$15/$75
            GPT-5.2~65%$1.75/$14
            Gemini 3 Pro~60%$1.25/$5

            Sonnet 4.6 delivers Opus-tier computer use at Sonnet pricing.

            Implementation

            Basic Setup

            import anthropic
            

            import base64

            from PIL import ImageGrab

            client = anthropic.Anthropic()

            def capture_screen() -> str:

            screenshot = ImageGrab.grab()

            buffer = io.BytesIO()

            screenshot.save(buffer, format="PNG")

            return base64.standard_b64encode(buffer.getvalue()).decode()

            def computer_use_request(instruction: str, screenshot_b64: str):

            response = client.messages.create(

            model="claude-sonnet-4-6-20260217",

            max_tokens=4096,

            tools=[{

            "type": "computer_20241022",

            "name": "computer",

            "display_width_px": 1920,

            "display_height_px": 1080,

            "display_number": 1

            }],

            messages=[{

            "role": "user",

            "content": [

            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},

            {"type": "text", "text": instruction}

            ]

            }]

            )

            return response

            Processing Tool Calls

            def execute_computer_action(tool_use):
            

            action = tool_use.input.get("action")

            if action == "screenshot":

            return capture_screen()

            elif action == "mouse_move":

            x, y = tool_use.input["coordinate"]

            pyautogui.moveTo(x, y)

            elif action == "left_click":

            pyautogui.click()

            elif action == "type":

            text = tool_use.input["text"]

            pyautogui.typewrite(text, interval=0.05)

            elif action == "key":

            key = tool_use.input["key"]

            pyautogui.press(key)

            elif action == "scroll":

            x, y = tool_use.input["coordinate"]

            direction = tool_use.input["direction"]

            amount = tool_use.input["amount"]

            pyautogui.scroll(amount if direction == "up" else -amount, x=x, y=y)

            Agentic Loop

            def run_computer_agent(task: str):
            

            messages = []

            screenshot = capture_screen()

            messages.append({

            "role": "user",

            "content": [

            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": screenshot}},

            {"type": "text", "text": task}

            ]

            })

            while True:

            response = client.messages.create(

            model="claude-sonnet-4-6-20260217",

            max_tokens=4096,

            tools=[computer_tool],

            messages=messages

            )

            if response.stop_reason == "end_turn":

            return response.content[-1].text

            # Process tool calls

            for block in response.content:

            if block.type == "tool_use":

            result = execute_computer_action(block)

            messages.append({"role": "assistant", "content": response.content})

            messages.append({

            "role": "user",

            "content": [{"type": "tool_result", "tool_use_id": block.id, "content": result}]

            })

            Safety Considerations

            Essential Safeguards

              • Sandboxing: Run in VM or container to isolate from host system
                • Confirmation: Require human approval for sensitive actions
                  • Blocklists: Prevent access to sensitive URLs, applications, or directories
                    • Monitoring: Log all actions for audit trails
                      • Rate Limiting: Prevent runaway agents with action limits

                      Action Filtering

                      BLOCKED_ACTIONS = ["format", "delete_system", "sudo"]
                      

                      SENSITIVE_URLS = ["bank", "password", "admin"]

                      def validate_action(action: dict) -> bool:

                      if action.get("action") in BLOCKED_ACTIONS:

                      return False

                      if action.get("action") == "type":

                      text = action.get("text", "").lower()

                      if any(url in text for url in SENSITIVE_URLS):

                      return False

                      return True

                      Use Cases

                      1. Form Automation

                      run_computer_agent(
                      

                      "Fill out the expense report form with: "

                      "Date: 2026-02-17, Amount: $145.50, Category: Travel, "

                      "Description: Client meeting transportation"

                      )

                      2. Data Extraction

                      run_computer_agent(
                      

                      "Open the quarterly report PDF, extract the revenue figures "

                      "from Q1-Q4, and paste them into the spreadsheet in column B"

                      )

                      3. Testing Automation

                      run_computer_agent(
                      

                      "Navigate to the login page, test these credentials: "

                      "user: [email protected], pass: Test123. "

                      "Verify the dashboard loads correctly and report any errors."

                      )

                      Best Practices

                        • Clear Instructions: Be specific about UI elements and expected outcomes
                          • Chunked Tasks: Break complex workflows into discrete steps
                            • Error Recovery: Include instructions for handling unexpected states
                              • Screenshot Frequency: Request fresh screenshots after major actions
                                • Timeout Handling: Implement maximum action counts per task

                      Limitations

                        • No real-time video processing (screenshot-based)
                          • May struggle with dynamic/animated UI elements
                            • Requires screen visibility (no headless mode)
                              • Higher latency than traditional automation

                              Conclusion

                              Sonnet 4.6's computer use capability enables sophisticated desktop automation at accessible pricing. With proper safety controls, it can transform manual workflows into automated processes—from form filling to data extraction to QA testing.

Ready to Experience Claude 5?

Try Now