BreakingFebruary 10, 2026

Codex 5.3 Released: 77.3% Terminal-Bench, 56.8% SWE-Bench Pro

OpenAI releases GPT-5.3-Codex on February 5, 2026 - the most capable agentic coding model to date with breakthrough performance on terminal and coding benchmarks.

OpenAI Launches Most Capable Coding Model

On February 5, 2026, OpenAI released GPT-5.3-Codex, describing it as "the most capable agentic coding model to date." The model advances both frontier coding performance and general reasoning capabilities while being 25% faster than its predecessor.

Benchmark Performance

Terminal-Bench 2.0: 77.3% - Leading all models in terminal-driven tasks SWE-Bench Pro (Public): 56.8% accuracy across four programming languages OSWorld-Verified: 64.7% - Strong computer-use capabilities Speed: 25% faster than GPT-5.2-Codex with improved token efficiency

Technical Innovations

Self-Bootstrapping Development

Remarkably, GPT-5.3-Codex was instrumental in creating itself. The Codex team used early versions to:

  • Debug its own training process
  • Manage deployment infrastructure
  • Diagnose and fix test results
  • Optimize inference performance

Enhanced Capabilities

Agentic Coding: Autonomous multi-step task execution with minimal human intervention Terminal Mastery: Native-level command line proficiency surpassing previous models Multi-Language Support: Production-grade code generation in Python, JavaScript, TypeScript, Java, C++, Go, and Rust Token Efficiency: Uses fewer output tokens while maintaining quality - reducing API costs

Security & Safety

GPT-5.3-Codex is the first OpenAI model treated as "High" under the Preparedness Framework, particularly for Cybersecurity capabilities. Enhanced safeguards prevent malicious code generation while preserving legitimate security research functionality.

Availability & Pricing

ChatGPT Users: Available now with ChatGPT Plus, Team, and Enterprise plans API Access: $10/$30 per million tokens (input/output) Platform Integration: ChatGPT app, CLI, IDE extensions, and web interface Cloud Providers: AWS Bedrock and Azure OpenAI Service (Q1 2026)

Performance Comparison

ModelTerminal-BenchSWE-Bench ProSpeedPrice (Input)
Codex 5.377.3%56.8%1.8s$10/M
Claude Opus 4.668.4%54.2%3.2s$15/M
Gemini 3 Pro64.1%48.3%2.4s$7/M

Developer Reception

Early adopters report Codex 5.3 excels at:

  • Backend service development
  • Terminal automation and DevOps tasks
  • High-volume code generation
  • Bug fixing with rapid iteration

Some developers note Claude Code still leads in:

  • Deep architectural reasoning
  • Long-context codebase understanding
  • UI/UX design suggestions

Use Codex 5.3 If...

  • Speed is critical for your workflow
  • Working primarily with terminal/CLI tools
  • Need cost-effective high-volume generation
  • Building backend services and APIs
  • Require reliable, bug-free code on first attempt

Conclusion

GPT-5.3-Codex represents a significant leap in AI coding capability, particularly for terminal-driven and autonomous agent workflows. Its combination of performance, speed, and competitive pricing makes it a compelling choice for development teams.

The model's ability to help build itself demonstrates we're entering an era where AI systems actively participate in their own development - a paradigm shift with profound implications.

Ready to Experience Claude 5?

Try Now