March 28, 2026 · 6 min read · Infrastructure

H100 Prices Melting Up — What GPU Scarcity Signals About the Agent Economy

In October 2024, H100 rental prices bottomed out. The DeepSeek R1 shock had everyone predicting a GPU glut. Four months later, those same H100s are worth more than they were at launch; as Dylan Patel noted, they are worth more today than three years ago. That is not normal depreciation behavior. Something fundamental has shifted in the demand curve for AI compute.

The Depreciation Reversal

GPUs normally follow a predictable cycle: a new architecture launches, older hardware depreciates 20-40% annually, and cloud providers refresh their fleets. The H100, introduced in late 2022, should be approaching mid-life obsolescence ahead of Blackwell's mainstream deployment.

Instead, rental prices have climbed steadily since December 2025. The inflection point isn't mysterious: it tracks closely with the reasoning model breakthroughs and the first wave of production agent deployments.
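To put the reversal in perspective, here is a minimal sketch of what the conventional 20-40% annual depreciation band would imply for a chip's value over time. It assumes the rate compounds year over year, which is a simplification chosen for illustration rather than an established accounting rule, and the function name is hypothetical.

```python
def residual_fraction(annual_rate: float, years: float) -> float:
    """Fraction of launch value remaining after compounding depreciation.

    Assumption: depreciation compounds annually at a constant rate.
    """
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    return (1.0 - annual_rate) ** years


# The 20-40% band is the range quoted in the article.
for years in (1, 2, 3):
    low = residual_fraction(0.40, years)   # fast depreciation
    high = residual_fraction(0.20, years)  # slow depreciation
    print(f"year {years}: {low:.0%} to {high:.0%} of launch value")
```

Under that assumption, a three-year-old chip would normally be expected to retain roughly a fifth to a half of its launch value, which is what makes a price above launch levels so unusual.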

The core dynamic: A chip more than three years old is appreciating because the software running on it (reasoning models, agent harnesses) makes it significantly more productive than original projections assumed.

Data center tokenomics run on razor-thin margins. When the underlying silicon becomes more expensive to rent, the entire stack feels pressure. Training runs get rescheduled. Inference batch sizes shrink. Startups with locked-in GPU contracts gain unexpected competitive moats.

Why This Time Is Different

Previous GPU demand spikes came from crypto mining and foundation model training frenzies, and each followed a predictable pattern. This one is driven by sustained inference workloads from agents that think longer and work iteratively.

Demand Driver	Duration	Hardware Impact
Crypto mining	6-18 months	Discrete spikes, quick corrections
Foundation model training	Project-based	Scheduled bursts, predictable
Reasoning agents (current)	Persistent, growing	Continuous load, harder to forecast

Training runs finish. Mining profitability crashes. But agents don't stop. A deployed coding assistant or research agent generates continuous inference load with high per-query compute requirements. Reasoning models use 10-100x more tokens than simple chat completions. The demand curve looks less like a series of spikes and more like a rising floor.
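A rough back-of-the-envelope sketch shows how the 10-100x token multiplier turns into GPU time. Every number here (query volume, baseline tokens per query, per-GPU throughput) is a hypothetical placeholder rather than a measured figure, and the model ignores batching, caching, and utilization effects.

```python
def gpu_hours_per_day(queries_per_day: int,
                      tokens_per_query: int,
                      tokens_per_gpu_second: float) -> float:
    """GPU-hours needed per day to serve a steady inference workload."""
    if tokens_per_gpu_second <= 0:
        raise ValueError("tokens_per_gpu_second must be positive")
    total_tokens = queries_per_day * tokens_per_query
    gpu_seconds = total_tokens / tokens_per_gpu_second
    return gpu_seconds / 3600.0


# Hypothetical workload; replace with your own measurements.
QUERIES_PER_DAY = 100_000
BASELINE_TOKENS = 500          # simple chat completion (assumed)
TOKENS_PER_GPU_SECOND = 1_000  # assumed sustained throughput per GPU

# 1x is plain chat; 10x and 100x bracket the range cited in the article.
for multiplier in (1, 10, 100):
    hours = gpu_hours_per_day(QUERIES_PER_DAY,
                              BASELINE_TOKENS * multiplier,
                              TOKENS_PER_GPU_SECOND)
    print(f"{multiplier:>3}x tokens: {hours:,.1f} GPU-hours per day")
```

Because the load scales linearly with tokens per query, the same query volume needs 10 to 100 times the GPU-hours once reasoning is switched on, and that requirement recurs every day the agent stays deployed.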

The Infrastructure Implications

For teams building on this infrastructure, the economics are shifting in real time:

Locked-in contracts win: If you negotiated GPU access at 2024 prices, your unit economics just improved relative to new entrants
Local deployment becomes competitive: TurboQuant, RotorQuant, and other compression methods that let you run frontier-class models on consumer hardware aren't just nice-to-have; they're margin-preserving necessities
Model efficiency matters more: The gap between efficient and inefficient architectures translates directly to operating costs at scale
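The points above can be made concrete with a small unit-economics sketch. The hourly rates and throughput figures below are illustrative assumptions, not quoted market prices, and the helper name is hypothetical.

```python
def cost_per_million_tokens(hourly_rate_usd: float,
                            tokens_per_gpu_second: float) -> float:
    """Serving cost in USD per one million tokens on a single rented GPU."""
    if tokens_per_gpu_second <= 0:
        raise ValueError("tokens_per_gpu_second must be positive")
    tokens_per_hour = tokens_per_gpu_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


# Hypothetical figures for illustration only.
LOCKED_RATE = 2.00       # USD per GPU-hour under an older contract (assumed)
SPOT_RATE = 3.00         # USD per GPU-hour for a new entrant (assumed)
BASE_THROUGHPUT = 1_000  # tokens per GPU-second (assumed)

locked = cost_per_million_tokens(LOCKED_RATE, BASE_THROUGHPUT)
spot = cost_per_million_tokens(SPOT_RATE, BASE_THROUGHPUT)
# An efficiency gain that doubles throughput halves the cost per token.
spot_efficient = cost_per_million_tokens(SPOT_RATE, 2 * BASE_THROUGHPUT)

print(f"locked contract:       ${locked:.3f} per 1M tokens")
print(f"new entrant:           ${spot:.3f} per 1M tokens")
print(f"new entrant, 2x model: ${spot_efficient:.3f} per 1M tokens")
```

In this toy setup, a new entrant paying the higher rate can only beat the contract holder on cost by serving tokens more efficiently, which is why efficiency and contract timing end up in the same conversation.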

The frontier labs are responding to this compute scarcity with aggressive scaling. Anthropic's rumored Capybara tier represents a bet that demand for top-tier reasoning will justify the infrastructure investment. Google's reported funding of Anthropic's data center reinforces that this is a capital-availability game now, not just an algorithmic one.

What Builders Should Watch

GPU pricing is a lagging indicator. By the time rental rates spike, the underlying demand shift has already occurred. The signals to watch going forward:

Agent session duration: Longer sessions = sustained inference load = continued pressure on GPU supply
Multi-step reasoning adoption: Each reasoning step adds tokens and compute. Widespread adoption of o1/R1-style thinking increases per-task compute 10x or more
Hardware alternatives: TPUs, custom silicon, and edge inference become economically viable faster when GPU prices rise

The H100 price reversal isn't just an infrastructure story. It's an early signal that the agent economy has crossed from demonstration to production. The demand is real, sustained, and growing faster than supply can accommodate.

For teams building agent infrastructure at promptengines.com and across the ecosystem, the message is clear: compute efficiency isn't an optimization anymore. It's survival.

Sources: Latent Space AI News, Dylan Patel (Dwarkesh interview), Matthew Sigel chart analysis