LabNotes
March 14, 2026 · Visual Analysis · Agent Evaluation
◆ Experimental

Claude Code vs Codex vs Gemini CLI — The Agent Coding Wars (Experimental)

Visual three-way comparison across five performance axes. KPI snapshot, speed bars, and strength mapping for the three dominant terminal coding agents.

KPI Snapshot

Major Agents: 3 (Anthropic · OpenAI · Google)
Tasks Evaluated: 18 (6 per category)
Avg First-Attempt Pass: 78% (across all agents & tasks)
Max Performance Gap: 15% (between best & worst per task)

Five-Axis Comparison

Performance by Category (first-attempt pass rate)

Category                 Claude Code   Codex CLI   Gemini CLI
Multi-file refactor      83%           78%         72%
Bug isolation            72%           83%         71%
Greenfield scaffold      88%           79%         81%
Test-driven dev          80%           82%         69%
Instruction adherence    91%           76%         74%
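
The per-category figures above can be summarized with a little arithmetic. The following is a minimal sketch, not the evaluation's actual tooling: the names `RATES`, `agent_means`, `category_gaps`, and `category_leaders` are hypothetical, and the mean is an unweighted average over the five categories. An unweighted recomputation like this need not reproduce the published aggregate figures exactly, for instance if those were computed per task rather than per category.

```python
# First-attempt pass rates (%) copied from the category figures above.
RATES = {
    "Multi-file refactor":   {"Claude Code": 83, "Codex CLI": 78, "Gemini CLI": 72},
    "Bug isolation":         {"Claude Code": 72, "Codex CLI": 83, "Gemini CLI": 71},
    "Greenfield scaffold":   {"Claude Code": 88, "Codex CLI": 79, "Gemini CLI": 81},
    "Test-driven dev":       {"Claude Code": 80, "Codex CLI": 82, "Gemini CLI": 69},
    "Instruction adherence": {"Claude Code": 91, "Codex CLI": 76, "Gemini CLI": 74},
}


def agent_means(rates):
    """Unweighted mean pass rate per agent across all categories."""
    agents = list(next(iter(rates.values())))
    return {a: sum(row[a] for row in rates.values()) / len(rates) for a in agents}


def category_gaps(rates):
    """Best-minus-worst spread per category, in percentage points."""
    return {cat: max(row.values()) - min(row.values()) for cat, row in rates.items()}


def category_leaders(rates):
    """Agent with the highest pass rate in each category."""
    return {cat: max(row, key=row.get) for cat, row in rates.items()}


if __name__ == "__main__":
    for agent, mean in agent_means(RATES).items():
        print(f"{agent:12s} mean {mean:.1f}%")
    for cat, gap in category_gaps(RATES).items():
        print(f"{cat:22s} gap {gap} pts, leader: {category_leaders(RATES)[cat]}")
```

Keeping the rates in one dictionary keyed by category makes it easy to add a row or an agent without touching the helper functions.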

Capability Speed Bars

Context Window

Claude Code: 200K tokens
Codex CLI: Varies by model
Gemini CLI: 1M tokens

Free Tier Generosity

Claude Code: None
Codex CLI: Needs ChatGPT Plus ($20/mo)
Gemini CLI: 60 req/min · 1K req/day

First-Attempt Pass Rate (Aggregate)

Claude Code: 83%
Codex CLI: 80%
Gemini CLI: 75%

IDE Coverage

Claude Code: VS Code · JetBrains · Terminal
Codex CLI: VS Code · Cursor · Windsurf · Terminal
Gemini CLI: Terminal only

Feature Matrix

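Feature Availability (Claude Code · Codex CLI · Gemini CLI)
MCP support, Search grounding, Plugin system, Cloud offload: per-agent availability marks not preserved
GitHub integration: @claude (Claude Code) · Codex Web (Codex CLI) · GH Action (Gemini CLI)
Open source: Apache 2.0 (Codex CLI) · Apache 2.0 (Gemini CLI)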

Quick Verdict

Claude Code: best for complex refactors, team workflows, instruction-heavy tasks
Codex CLI: best for debug-repair loops, cloud offload, ChatGPT ecosystem users
Gemini CLI: best for budget-conscious devs, massive context tasks, documentation lookups

Data: internal evaluation, 18 tasks across 3 real repositories, March 2026. Bars normalized to best performer per category.
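
The normalization mentioned in the data note can be sketched as follows. This assumes a plain linear scaling in which each value is divided by the best value in its category, so the leader's bar fills the full width; the source does not state the exact formula, and `normalize_to_best` and `bar` are hypothetical helper names.

```python
def normalize_to_best(row):
    """Scale a category's values so the best performer maps to 1.0.

    Assumption: linear scaling (value / best); the source does not spell out
    how its bars were normalized.
    """
    best = max(row.values())
    if best <= 0:
        raise ValueError("best value must be positive to normalize")
    return {agent: value / best for agent, value in row.items()}


def bar(fraction, width=20):
    """Render a fraction in [0, 1] as a fixed-width text bar."""
    filled = round(fraction * width)
    return "#" * filled + "." * (width - filled)


if __name__ == "__main__":
    # Instruction adherence pass rates (%) from the category comparison.
    row = {"Claude Code": 91, "Codex CLI": 76, "Gemini CLI": 74}
    for agent, frac in normalize_to_best(row).items():
        print(f"{agent:12s} {bar(frac)} {frac:.2f}")
```

Note that bars normalized this way show standing relative to the category leader, not the absolute pass rate, so a full bar in one category and a full bar in another can represent different percentages.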