LabNotes
March 14, 2026 · 8 min read · Agent Evaluation

Claude Code vs Codex vs Gemini CLI — The Agent Coding Wars (Agent)

Dense specification format. Minimal narrative. Maximum signal.

Metadata

---
spec: agent-tool-comparison-v1
date: "2026-03-14"
agents_evaluated: 3
tasks_evaluated: 18
repositories: 3 # Node.js, Python, Go
scoring_method: "automated_tests + human_consensus"
---

Agent Specs

[claude_code]
vendor: "anthropic"
models: ["claude-opus-4", "claude-sonnet-4"]
context_window: 200000 # tokens
license: "proprietary"
pricing_model: "api_usage | enterprise"
free_tier: false
auth_methods: ["anthropic_account", "api_key"]
install: "curl -fsSL https://claude.ai/install.sh | bash"
platforms: ["macos", "linux", "windows"]
mcp_support: true
plugins: true
search_grounding: false
github_integration: "@claude mentions"
ide_support: ["vscode", "jetbrains", "terminal"]
cloud_offload: false

[codex_cli]
vendor: "openai"
models: ["o4-mini", "gpt-4.1", "o3"]
context_window: "model_dependent"
license: "apache-2.0"
pricing_model: "chatgpt_plan | api_billing"
free_tier: false
auth_methods: ["chatgpt_oauth", "api_key"]
install: "npm install -g @openai/codex"
platforms: ["macos", "linux", "windows"]
mcp_support: true
plugins: false
search_grounding: false
github_integration: "codex_web"
ide_support: ["vscode", "cursor", "windsurf", "terminal"]
cloud_offload: true

[gemini_cli]
vendor: "google"
models: ["gemini-3-flash", "gemini-3-pro"]
context_window: 1000000 # tokens
license: "apache-2.0"
pricing_model: "free_tier | api_billing | gcp_enterprise"
free_tier: true
free_limits: { rpm: 60, rpd: 1000 }
auth_methods: ["google_oauth", "gemini_api_key", "gcp_service_account"]
install: "npm install -g @google/gemini-cli"
platforms: ["macos", "linux", "windows"]
mcp_support: true
plugins: false
search_grounding: true # google search
github_integration: "gemini_cli_github_action"
ide_support: ["terminal"]
cloud_offload: false
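
Capability lookup, sketched in Python. The boolean flags are copied from the three spec blocks above; the `AgentSpec` class and the `agents_with` helper are hypothetical names introduced for illustration only and are not part of any of the three tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Subset of the boolean flags listed in the spec blocks (illustrative only)."""
    name: str
    mcp_support: bool
    plugins: bool
    search_grounding: bool
    cloud_offload: bool
    free_tier: bool


# Values transcribed from the [claude_code], [codex_cli], and [gemini_cli] blocks.
AGENTS = [
    AgentSpec("claude_code", mcp_support=True, plugins=True,
              search_grounding=False, cloud_offload=False, free_tier=False),
    AgentSpec("codex_cli", mcp_support=True, plugins=False,
              search_grounding=False, cloud_offload=True, free_tier=False),
    AgentSpec("gemini_cli", mcp_support=True, plugins=False,
              search_grounding=True, cloud_offload=False, free_tier=True),
]


def agents_with(**required: bool) -> list[str]:
    """Return the names of agents whose flags equal every required value."""
    return [
        agent.name
        for agent in AGENTS
        if all(getattr(agent, flag) == value for flag, value in required.items())
    ]
```

Under these assumptions, `agents_with(search_grounding=True)` yields only `gemini_cli`, and `agents_with(mcp_support=True)` yields all three.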

Performance Metrics

[task_performance]
evaluation_date: "2026-03-14"
scoring: "first_attempt_pass_rate"

[[multi_file_refactor]]
description: "Changes spanning 5+ files with constraint preservation"
claude_code: 0.83
codex_cli: 0.78
gemini_cli: 0.72
winner: "claude_code"

[[bug_isolation]]
description: "Identify root cause from error output + codebase"
claude_code: 0.72
codex_cli: 0.83 # best recovery behavior
gemini_cli: 0.71
winner: "codex_cli"

[[greenfield_scaffold]]
description: "Generate new project from specification"
claude_code: 0.88
codex_cli: 0.79
gemini_cli: 0.81
winner: "claude_code"

[[test_driven_development]]
description: "Iterative test-feedback cycles"
claude_code: 0.80
codex_cli: 0.82
gemini_cli: 0.69 # weakest at iteration
winner: "codex_cli"

[[instruction_adherence]]
description: "Constraint compliance + style preservation"
claude_code: 0.91 # clear leader
codex_cli: 0.76
gemini_cli: 0.74
winner: "claude_code"
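
Winner derivation, sketched in Python. The scores are transcribed from the five blocks above; the `winner_and_margin` helper is a hypothetical name, and the margin over the runner-up is a derived figure that the evaluation itself does not report.

```python
# First-attempt pass rates, copied from the [[...]] blocks above.
SCORES = {
    "multi_file_refactor": {"claude_code": 0.83, "codex_cli": 0.78, "gemini_cli": 0.72},
    "bug_isolation": {"claude_code": 0.72, "codex_cli": 0.83, "gemini_cli": 0.71},
    "greenfield_scaffold": {"claude_code": 0.88, "codex_cli": 0.79, "gemini_cli": 0.81},
    "test_driven_development": {"claude_code": 0.80, "codex_cli": 0.82, "gemini_cli": 0.69},
    "instruction_adherence": {"claude_code": 0.91, "codex_cli": 0.76, "gemini_cli": 0.74},
}


def winner_and_margin(scores: dict[str, float]) -> tuple[str, float]:
    """Return the top-scoring agent and its lead over the runner-up."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    # Round to two decimals to hide floating-point noise in the subtraction.
    return best, round(best_score - runner_up_score, 2)


# Map each task to (winner, margin); the winners match the `winner` fields above.
WINNERS = {task: winner_and_margin(scores) for task, scores in SCORES.items()}
```

The margins make the spread visible: the lead is widest on instruction_adherence (0.15) and narrowest on test_driven_development (0.02).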

Differentiators

[unique_strengths]
claude_code:  instruction_fidelity | multi_file_reasoning | plugin_system | github_mentions
codex_cli:    recovery_behavior | cloud_offload | chatgpt_distribution | repair_locality
gemini_cli:   free_tier | 1M_context | search_grounding | price_accessibility

[unique_weaknesses]
claude_code:  no_free_tier | closed_source | smaller_context_vs_gemini
codex_cli:    no_free_tier | no_plugins | requires_chatgpt_plan_or_api_billing
gemini_cli:   weak_tdd_iteration | terminal_only | no_ide_native

Ecosystem Positioning

[strategic_analysis]
[anthropic]
thesis: "premium_tool → enterprise_api_adoption"
moat: plugin_ecosystem + instruction_quality
risk: pricing_excludes_individuals

[openai]
thesis: "chatgpt_distribution → zero_friction_adoption"
moat: user_base + cloud_offload
risk: gated_behind_subscription

[google]
thesis: "free_land_grab → habit_formation → paid_conversion"
moat: price + context_window + search
risk: quality_gap_complex_tasks

Recommendations

[selection_logic]
[constraint = instruction_heavy OR team_workflow]
  → claude_code

[constraint = debug_repair OR cloud_compute OR chatgpt_user]
  → codex_cli

[constraint = budget_limited OR student OR massive_context]
  → gemini_cli

[constraint = mcp_required]
  → any # all three support MCP

[constraint = search_grounding_required]
  → gemini_cli # only agent with search_grounding: true

[constraint = plugin_system_required]
  → claude_code # only agent with plugins: true
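
Selection logic as a lookup, sketched in Python. The constraint names and targets are transcribed from the rules above; the `RULES` table and `recommend` function are hypothetical. The rules do not say how to combine several constraints, so this sketch reports one recommendation per constraint and does not merge them.

```python
ALL_AGENTS = ["claude_code", "codex_cli", "gemini_cli"]

# One entry per constraint named in [selection_logic].
RULES = {
    "instruction_heavy": ["claude_code"],
    "team_workflow": ["claude_code"],
    "debug_repair": ["codex_cli"],
    "cloud_compute": ["codex_cli"],
    "chatgpt_user": ["codex_cli"],
    "budget_limited": ["gemini_cli"],
    "student": ["gemini_cli"],
    "massive_context": ["gemini_cli"],
    "mcp_required": ALL_AGENTS,  # all three support MCP
    "search_grounding_required": ["gemini_cli"],
    "plugin_system_required": ["claude_code"],
}


def recommend(constraints: list[str]) -> dict[str, list[str]]:
    """Map each given constraint to the agent(s) the rules point to."""
    unknown = [c for c in constraints if c not in RULES]
    if unknown:
        raise KeyError(f"no rule for constraint(s): {unknown}")
    # Copy each list so callers cannot mutate the rule table.
    return {c: list(RULES[c]) for c in constraints}
```

For example, `recommend(["student", "plugin_system_required"])` returns one entry pointing to `gemini_cli` and one pointing to `claude_code`; resolving that conflict is left to the reader, as in the rules themselves.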

References

claude_code_repo: "https://github.com/anthropics/claude-code"
codex_cli_repo: "https://github.com/openai/codex"
gemini_cli_repo: "https://github.com/google-gemini/gemini-cli"
mcp_spec: "https://modelcontextprotocol.io"