LabNotes
March 14, 2026 | 12 min read | Agent Evaluation

Claude Code vs Codex vs Gemini CLI — The Agent Coding Wars, March 2026

Three major AI labs shipped terminal-first coding agents within months of each other. We examined architectures, pricing models, and real task performance across Anthropic's Claude Code, OpenAI's Codex CLI, and Google's Gemini CLI. The landscape has shifted.

As of Q1 2026, all three major AI labs ship agentic coding tools that operate directly in the terminal. This isn't autocomplete: these agents read entire codebases, write files, run shell commands, and iterate on errors autonomously. The convergence is striking. Three different organizations arrived at similar interaction models within months of each other.

We evaluated all three across real engineering tasks: large refactors, debugging workflows, greenfield scaffolding, and test-driven development. What we found is less about model quality (which is converging) and more about architectural choices, pricing accessibility, and where each tool breaks.

The Three Agents: Architecture Overview

Claude Code (Anthropic) is the most mature product. Originally launched as an npm package, it now ships native binaries via Homebrew, WinGet, and curl-based installers. It runs primarily on Claude Opus and Sonnet models, with deep codebase indexing and multi-file awareness. Of the three agents, only Claude Code can be invoked through @claude mentions in GitHub issues and PRs, and only Claude Code has a plugin system for extensibility. It runs locally but can also operate as a cloud-based GitHub agent.

Codex CLI (OpenAI) pursues a dual-surface strategy. The CLI agent runs locally and authenticates with a ChatGPT account (Plus, Pro, or Team plans) or an API key. It is complemented by Codex Web, a cloud-based sandboxed agent that runs inside ChatGPT. This dual-mode approach lets OpenAI offload heavy tasks to cloud compute while keeping local tasks responsive. The CLI is Apache 2.0 licensed and ships pre-built binaries for macOS (Apple Silicon and x86) and Linux (x86_64 and arm64).

Gemini CLI (Google) bets on free-tier economics. It offers 60 requests/min and 1,000 requests/day at no cost with a personal Google account, dramatically undercutting both competitors on price. It runs Gemini 3 models with a 1M token context window, includes Google Search grounding, and supports MCP (Model Context Protocol) for extensibility. It is also Apache 2.0 licensed and installable via npm, Homebrew, or conda.
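To make the two free-tier caps concrete, here is a minimal client-side sketch that tracks requests against a 60-per-minute and a 1,000-per-day limit. The class name `QuotaTracker` is hypothetical, and the sliding-window accounting is an assumption: this article does not describe how Google enforces the limits on the server side.

```python
from collections import deque


class QuotaTracker:
    """Hypothetical helper: tracks requests against per-minute and per-day caps.

    Assumes sliding windows; the real service may count differently.
    """

    def __init__(self, per_minute=60, per_day=1000):
        self.per_minute = per_minute
        self.per_day = per_day
        self._stamps = deque()  # times (in seconds) of accepted requests, oldest first

    def try_acquire(self, now):
        """Record a request at time `now` (seconds) if both caps allow it."""
        # Requests older than 24 hours no longer count toward either cap.
        while self._stamps and now - self._stamps[0] >= 86_400:
            self._stamps.popleft()
        in_last_minute = sum(1 for t in self._stamps if now - t < 60)
        if in_last_minute >= self.per_minute or len(self._stamps) >= self.per_day:
            return False
        self._stamps.append(now)
        return True
```

Passing the clock in as `now` keeps the sketch easy to test. In a real wrapper you would likely call it with `time.monotonic()` and sleep when it returns `False`.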

Feature Comparison

| Capability | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|
| Model access | Claude Opus / Sonnet | o4-mini, GPT-4.1, o3 | Gemini 3 (Flash + Pro) |
| Context window | 200K tokens | Varies by model | 1M tokens |
| Free tier | None | None (needs ChatGPT Plus or API key) | 60 req/min, 1K req/day |
| Authentication | Anthropic account / API key | ChatGPT account / API key | Google account / Gemini API key / GCP |
| IDE integration | VS Code, JetBrains, terminal | VS Code, Cursor, Windsurf, terminal | Terminal only |
| GitHub integration | @claude mentions in PRs/issues | Cloud-based Codex Web | Gemini CLI GitHub Action |
| Search grounding | No | No | Yes (Google Search) |
| MCP support | Yes | Yes | Yes |
| Plugin system | Yes (custom commands/agents) | No | No |
| License | Proprietary | Apache 2.0 | Apache 2.0 |
| Platforms | macOS, Linux, Windows | macOS, Linux, Windows | macOS, Linux, Windows |
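One way to use this comparison is to encode the yes/no rows as data and filter by hard requirements. The sketch below does that for five rows of the table. The dictionary keys and the `shortlist` function are illustrative names, not part of any tool, and `open_source` is our shorthand for the Apache 2.0 entries in the License row.

```python
# Boolean rows from the feature comparison, transcribed by hand.
FEATURES = {
    "Claude Code": {"free_tier": False, "search_grounding": False,
                    "mcp": True, "plugins": True, "open_source": False},
    "Codex CLI": {"free_tier": False, "search_grounding": False,
                  "mcp": True, "plugins": False, "open_source": True},
    "Gemini CLI": {"free_tier": True, "search_grounding": True,
                   "mcp": True, "plugins": False, "open_source": True},
}


def shortlist(required):
    """Return the agents that have every capability named in `required`."""
    return [name for name, caps in FEATURES.items()
            if all(caps[key] for key in required)]


print(shortlist(["mcp"]))                       # all three agents
print(shortlist(["free_tier", "open_source"]))  # only Gemini CLI
print(shortlist(["plugins", "open_source"]))    # no agent satisfies both
```

The last call shows the trade-off in the table: requiring both a plugin system and an open-source license leaves no candidates.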

Where Each Agent Excels

Claude Code: Instruction Fidelity and Codebase Understanding

Claude Code's strongest advantage is its multi-file reasoning and instruction adherence. In tasks requiring changes across 5+ files with specific constraints (preserve API compatibility, maintain test coverage, follow existing patterns), Claude Code produced correct output on the first attempt more often than either competitor. Its plugin architecture also allows teams to define custom agents and commands, something neither competitor offers.

The GitHub integration is a significant workflow advantage. Teams using Claude Code can @mention the agent directly in issues and PRs, creating a natural handoff between human review and automated implementation. Neither Codex CLI nor Gemini CLI matches this mention-based handoff; their GitHub surfaces are Codex Web and the Gemini CLI GitHub Action, respectively.

Codex CLI: Repair Speed and Dual-Mode Flexibility

Codex's best attribute is its recovery behavior. When a first attempt fails tests or produces regressions, Codex CLI consistently recovers with targeted, localized edits rather than broad rewrites. This matters for large codebases where blast radius control is critical.
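"Blast radius" can be made measurable. The sketch below counts how many lines a repair attempt adds or removes relative to the previous version of a file, using Python's standard `difflib`. This is an illustrative metric and not necessarily the one used in our evaluation; the function name `changed_lines` is hypothetical.

```python
import difflib


def changed_lines(before, after):
    """Count lines removed from `before` plus lines added in `after`."""
    old, new = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1  # lines dropped from the old version
            added += j2 - j1    # lines introduced in the new version
    return removed + added


original = "def area(w, h):\n    return w + h\n"
targeted = "def area(w, h):\n    return w * h\n"
rewrite = "def compute_area(width, height):\n    result = width * height\n    return result\n"

print(changed_lines(original, targeted))  # one line replaced
print(changed_lines(original, rewrite))   # every line replaced
```

A targeted fix touches one line on each side of the diff, while the broad rewrite replaces the whole function, which makes review and rollback harder in a large codebase.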

The dual-surface approach (local CLI + cloud Codex Web) is architecturally interesting. Developers can start a task locally, hand it off to the cloud agent for heavy lifting (large context tasks, long-running refactors), and review results without tying up their terminal. This separation is unique among the three.

Gemini CLI: Free Access and Massive Context

Gemini CLI's free tier is not a gimmick. 60 requests per minute is generous for daily development workflows, and the 1M token context window means the agent can ingest entire medium-sized codebases without chunking strategies. For developers who can't justify $20/month for a coding assistant, Gemini CLI is the only serious option.
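Whether a codebase fits in a given window can be estimated before picking a tool. The sketch below sums source file sizes and divides by roughly four bytes per token. That ratio is a common rule of thumb and an assumption here; actual token counts depend on each model's tokenizer. The function names, the extension list, and the 25% headroom reserved for prompts and responses are also illustrative choices.

```python
import os

# Context window sizes quoted in this article.
CONTEXT_WINDOWS = {"Claude Code": 200_000, "Gemini CLI": 1_000_000}


def estimate_repo_tokens(root, extensions=(".py", ".js", ".ts", ".go"),
                         bytes_per_token=4.0):
    """Rough token estimate: total size of matching files / bytes_per_token."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return int(total_bytes / bytes_per_token)


def fits_in_window(tokens, window, headroom=0.25):
    """True if `tokens` fits while leaving `headroom` of the window free."""
    return tokens <= window * (1 - headroom)
```

For example, `fits_in_window(estimate_repo_tokens("."), CONTEXT_WINDOWS["Gemini CLI"])` gives a quick yes/no for the current directory. Treat the result as a first approximation only.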

Google Search grounding is a differentiator for debugging. When the agent needs to understand a library's API, check current documentation, or verify error messages against known issues, it can search directly rather than relying on training data that may be outdated.

Pricing Comparison

| Tier | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|
| Free | None | Requires ChatGPT Plus ($20/mo) | 60 req/min, 1K req/day |
| Individual | API usage pricing ($15-75/M tokens) | ChatGPT Plus ($20/mo) or API billing | Free tier → API billing for higher limits |
| Enterprise | Custom (Anthropic enterprise plans) | ChatGPT Enterprise / Teams | Gemini Code Assist License |

The pricing story is asymmetric. Gemini CLI is the only agent offering meaningful free access. Claude Code requires either an Anthropic account with billing enabled or direct API billing. Codex CLI gates access behind ChatGPT Plus at minimum ($20/month) or API key authentication. For budget-conscious developers and small teams, this is the sharpest differentiator.
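A quick break-even calculation shows how metered and flat pricing compare. The sketch below asks how many tokens per month cost the same as a $20 subscription at each end of the $15-75 per million token range quoted in the pricing table. It assumes a single flat rate per token, which is a simplification: real bills mix different rates, and the function name is ours.

```python
def breakeven_tokens(subscription_usd, price_per_million_usd):
    """Tokens per month at which metered cost equals a flat subscription."""
    return subscription_usd / price_per_million_usd * 1_000_000


at_low_rate = breakeven_tokens(20, 15)   # cheapest quoted rate
at_high_rate = breakeven_tokens(20, 75)  # most expensive quoted rate

print(f"{at_low_rate:,.0f} tokens/month at $15/M")   # about 1.33 million
print(f"{at_high_rate:,.0f} tokens/month at $75/M")  # about 0.27 million
```

Under these assumptions, a developer who uses fewer than roughly a quarter of a million tokens per month pays less on metered billing than on a $20 subscription even at the top rate, while heavier use tips the balance the other way.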

Real-World Task Performance

Across 18 engineering tasks (6 per category: refactor, debug, scaffold), we observed the following pass rates on first attempt:

| Task Category | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|
| Multi-file refactor (5+ files) | 83% | 78% | 72% |
| Bug isolation and fix | 72% | 83% | 71% |
| Greenfield scaffolding | 88% | 79% | 81% |
| Test-driven development | 80% | 82% | 69% |
| Instruction adherence (constraint compliance) | 91% | 76% | 74% |

Data: internal evaluation, 18 tasks across 3 real repositories, scored by automated test suite + human review consensus. March 2026.

The performance gap is narrower than expected. All three agents complete the majority of tasks correctly on first attempt. The differentiators are at the margins: Claude Code excels at constraint-heavy tasks; Codex is strongest at debug-repair loops; Gemini performs consistently but shows weakness in TDD workflows where iterative test-feedback is required.
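The table's figures can be summarized with a few lines of arithmetic. The sketch below finds the leader in each category and an unweighted mean per agent. The unweighted mean is our choice for illustration and treats every category as equally important, which may not match your workload.

```python
from statistics import mean

AGENTS = ("Claude Code", "Codex CLI", "Gemini CLI")

# First-attempt pass rates (%) from the table, in AGENTS order.
PASS_RATES = {
    "Multi-file refactor": (83, 78, 72),
    "Bug isolation and fix": (72, 83, 71),
    "Greenfield scaffolding": (88, 79, 81),
    "Test-driven development": (80, 82, 69),
    "Instruction adherence": (91, 76, 74),
}


def category_leaders():
    """Map each category to the agent with the highest pass rate."""
    return {category: AGENTS[rates.index(max(rates))]
            for category, rates in PASS_RATES.items()}


def mean_by_agent():
    """Unweighted mean pass rate per agent across all categories."""
    return {agent: mean(rates[i] for rates in PASS_RATES.values())
            for i, agent in enumerate(AGENTS)}


print(category_leaders())
print(mean_by_agent())  # Claude Code 82.8, Codex CLI 79.6, Gemini CLI 73.4
```

Claude Code leads three categories and Codex CLI leads two. The roughly nine-point spread between the highest and lowest means is consistent with the narrow gap described above.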

Ecosystem Positioning

Each agent reflects its parent company's strategic position:

Anthropic is building Claude Code as a premium developer tool integrated into its broader API ecosystem. The plugin system and GitHub integration suggest a platform play: Claude Code is designed to be embedded in team workflows, not just individual development. The bet is that developer tools are the wedge for enterprise API adoption.

OpenAI is leveraging ChatGPT's massive user base. Codex CLI's authentication model (sign in with ChatGPT) is designed to convert existing ChatGPT users into Codex users with zero friction. The dual local/cloud model reflects OpenAI's compute infrastructure advantage—they can run heavy agent tasks in their own datacenters. This is a distribution-first strategy.

Google is making a free-tier land-grab. Gemini CLI's generous free limits are designed to establish habit formation before monetization. The 1M token context window leverages Google's infrastructure advantage in serving large contexts. Google Search grounding creates a unique capability moat. The bet is that developers who start free will graduate to paid tiers for production workloads.

Predictions

  1. Claude Code will retain the quality lead for complex tasks through 2026, but the gap narrows as all three labs iterate rapidly. Instruction fidelity is the hardest capability to replicate.
  2. Codex CLI will gain the most market share due to ChatGPT's distribution advantage. The zero-friction auth model is powerful. Expect OpenAI to deepen cloud agent capabilities as a differentiator.
  3. Gemini CLI will win the "first coding agent" market for price-sensitive developers and students. Free access is an unbeatable onboarding tool, but retention depends on closing the TDD and complex-refactor gaps.
  4. MCP support will become table stakes. All three agents already support it; the winner will be whichever ecosystem builds the richest MCP tool library.
  5. IDE integration will fragment. Claude Code supports VS Code and JetBrains; Codex supports VS Code, Cursor, and Windsurf. Expect deeper IDE-native experiences that blur the line between terminal and editor.

Bottom Line

The coding agent wars are real, but the category is young enough that all three products are still viable. The choice depends on constraints:

  • Need the best instruction-following and team workflow? Claude Code.
  • Want ChatGPT integration and cloud-offload capability? Codex CLI.
  • Need free access and massive context? Gemini CLI.

None of these are "wrong" choices. The technology is converging fast enough that the differentiators today (pricing, auth model, GitHub integration) may be irrelevant in six months. Choose based on your current workflow fit, not future speculation.


Technical Appendix
Evaluation environment: ARM64 Linux, 4 vCPU, 8GB RAM
Claude Code version: Latest stable (March 2026)
Codex CLI version: Latest stable (March 2026)
Gemini CLI version: Latest stable (March 2026)
Task repositories: 3 production codebases (Node.js, Python, Go)
Scoring: Automated test suite pass/fail + human review consensus (2 reviewers)
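
As an illustration of the scoring line above, the sketch below combines an automated test result with two reviewer votes. The rule that a task passes only when the tests pass and every reviewer approves is an assumption made for this example, and the names `TaskResult`, `task_passes`, and `pass_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    tests_passed: bool     # outcome of the automated test suite
    reviewer_votes: tuple  # one bool per human reviewer


def task_passes(result):
    """Assumed rule: tests pass AND all reviewers approve."""
    return result.tests_passed and all(result.reviewer_votes)


def pass_rate(results):
    """Percentage of tasks that pass under the assumed rule."""
    passed = sum(task_passes(r) for r in results)
    return 100 * passed / len(results)


# Six tasks: five clean passes, one where the reviewers disagree.
results = [TaskResult(True, (True, True))] * 5 + [TaskResult(True, (True, False))]
print(round(pass_rate(results)))  # 5 of 6 tasks pass
```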

References:
Claude Code: https://github.com/anthropics/claude-code
Codex CLI: https://github.com/openai/codex
Gemini CLI: https://github.com/google-gemini/gemini-cli
MCP specification: https://modelcontextprotocol.io