Codex 5.3 Released: 77.3% Terminal-Bench, 56.8% SWE-Bench Pro
OpenAI released GPT-5.3-Codex on February 5, 2026, billing it as its most capable agentic coding model to date, with leading scores on terminal and coding benchmarks.
OpenAI Launches Most Capable Coding Model
On February 5, 2026, OpenAI released GPT-5.3-Codex, describing it as "the most capable agentic coding model to date." The model advances both frontier coding performance and general reasoning capabilities while being 25% faster than its predecessor.
Benchmark Performance
- Terminal-Bench 2.0: 77.3% - Leading all models in terminal-driven tasks
- SWE-Bench Pro (Public): 56.8% accuracy across four programming languages
- OSWorld-Verified: 64.7% - Strong computer-use capabilities
- Speed: 25% faster than GPT-5.2-Codex with improved token efficiency
Technical Innovations
Self-Bootstrapping Development
Remarkably, GPT-5.3-Codex was instrumental in creating itself. The Codex team used early versions to:
- Debug its own training process
- Manage deployment infrastructure
- Diagnose test results and evaluations
- Optimize inference performance
Enhanced Capabilities
- Agentic Coding: Autonomous multi-step task execution with minimal human intervention
- Terminal Mastery: Native-level command line proficiency surpassing previous models
- Multi-Language Support: Production-grade code generation in Python, JavaScript, TypeScript, Java, C++, Go, and Rust
- Token Efficiency: Uses fewer output tokens while maintaining quality - reducing API costs
Security & Safety
GPT-5.3-Codex is the first OpenAI model treated as "High" capability for cybersecurity under the Preparedness Framework. Enhanced safeguards are intended to block malicious code generation while preserving legitimate security research.
Availability & Pricing
- ChatGPT Users: Available now with ChatGPT Plus, Team, and Enterprise plans
- API Access: $10/$30 per million tokens (input/output)
- Platform Integration: ChatGPT app, CLI, IDE extensions, and web interface
- Cloud Providers: AWS Bedrock and Azure OpenAI Service (Q1 2026)
Performance Comparison
| Model | Terminal-Bench 2.0 | SWE-Bench Pro | Response Time | Input Price (per 1M tokens) |
| Codex 5.3 | 77.3% | 56.8% | 1.8s | $10/M |
| Claude Opus 4.6 | 68.4% | 54.2% | 3.2s | $15/M |
| Gemini 3 Pro | 64.1% | 48.3% | 2.4s | $7/M |
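As a rough illustration of what the quoted prices mean in practice, the sketch below estimates spend from token counts. It uses only figures given in this article ($10/$30 per million input/output tokens for Codex 5.3, plus the input prices from the table). The function names and the example workload of 2 million input tokens and 500,000 output tokens are hypothetical. Output prices for the other two models are not listed here, so those models are compared on input cost only.

```python
# Prices quoted in the article, in USD per one million tokens.
CODEX_INPUT_PRICE = 10.0
CODEX_OUTPUT_PRICE = 30.0

# Input prices from the comparison table. Output prices for the other
# models are not given in the article, so they are left out.
INPUT_PRICE_PER_M = {
    "Codex 5.3": 10.0,
    "Claude Opus 4.6": 15.0,
    "Gemini 3 Pro": 7.0,
}


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Return the USD cost of a workload given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def input_cost_by_model(input_tokens: int) -> dict[str, float]:
    """Return the input-only USD cost per model for the same token count."""
    return {
        model: input_tokens * price / 1_000_000
        for model, price in INPUT_PRICE_PER_M.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens, 500k output tokens.
    total = estimate_cost(2_000_000, 500_000,
                          CODEX_INPUT_PRICE, CODEX_OUTPUT_PRICE)
    print(f"Codex 5.3 estimated total: ${total:.2f}")

    for model, cost in input_cost_by_model(2_000_000).items():
        print(f"{model} input-only cost: ${cost:.2f}")
```

Because output tokens are billed at three times the input rate for Codex 5.3, the output share of a workload tends to dominate the bill, which is where the claimed reduction in output tokens would matter most.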
Developer Reception
Early adopters report Codex 5.3 excels at:
- Backend service development
- Terminal automation and DevOps tasks
- High-volume code generation
- Bug fixing with rapid iteration
Some developers note Claude Code still leads in:
- Deep architectural reasoning
- Long-context codebase understanding
- UI/UX design suggestions
Use Codex 5.3 If...
- Speed is critical to your workflow
- You work primarily with terminal/CLI tools
- You need cost-effective, high-volume generation
- You are building backend services and APIs
- You want code that is more likely to work on the first attempt
Conclusion
GPT-5.3-Codex represents a significant leap in AI coding capability, particularly for terminal-driven and autonomous agent workflows. Its combination of performance, speed, and competitive pricing makes it a compelling choice for development teams.
The model's role in its own development suggests that AI systems are beginning to take an active part in building their successors, a shift with significant implications for how future models are developed.