The Falling Price of Intelligence: Token Costs and the AI Expansion - Part 1

Written by Arbitrage • 2026-06-22 00:00:00

Here's a puzzle worth sitting with. The cost of a single AI token, the basic unit of text a model reads and writes, has fallen so far it's almost a rounding error next to where it started. And yet the bills landing on enterprise desks keep getting bigger, not smaller. Cheaper intelligence hasn't produced cheaper AI. If anything, it's done the opposite.

That gap, between a unit price in free fall and a total spend that keeps climbing, is the whole story of this cycle. Reading it correctly is the difference between seeing the AI buildout as a bubble waiting to pop and seeing it as something with real economic logic underneath. So let's walk the curve: where it started, where it sits now, and the conditions that look likely to shape where it goes.

How a Token Got Cheap

Start with the magnitude, because it's hard to overstate. Measured against early large-model rates, per-token prices have fallen on the order of 99.7%. That isn't a typo and it isn't a discount. It's a structural collapse in the cost of running a query, and it happened in a handful of years.

The cleanest way to feel it is to track a single capability tier over time. Output pricing for GPT-4-level performance ran around 60 dollars per million tokens in early 2023. By early 2025 the same tier of capability had dropped to under 1.50 dollars per million, a fall of more than 97% on that tier alone, and it's kept sliding since. Same class of intelligence, a fraction of the price, inside two years.

Three forces did the work. First, competition. With several frontier labs pushing models out at pace, pricing turned into a battleground rather than a posted rate. Second, efficiency. Better architectures, quantization, distillation, and inference tricks meant the same output took less compute to produce. Third, hardware. Each accelerator generation pushed more throughput through the same power envelope, dragging the cost floor lower.
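To make the arithmetic behind the GPT-4-tier example concrete, here is a minimal Python sketch. It uses only the two price points quoted above (60 dollars and 1.50 dollars per million tokens, two years apart). The function names are illustrative, and spreading the decline evenly across the two years is an assumption made purely for the calculation.

```python
def percent_decline(start_price: float, end_price: float) -> float:
    """Percentage drop from start_price to end_price."""
    return (1 - end_price / start_price) * 100


def annual_reduction_factor(start_price: float, end_price: float, years: float) -> float:
    """Constant per-year factor that would take start_price to end_price."""
    return (start_price / end_price) ** (1 / years)


# Figures from the text: dollars per million output tokens, early 2023 vs early 2025.
start, end, years = 60.0, 1.50, 2.0

decline = percent_decline(start, end)
factor = annual_reduction_factor(start, end, years)

print(f"Total decline: {decline:.1f}%")
print(f"Implied annual reduction: {factor:.2f}x per year")
```

On these two data points the tier fell about 97.5%, which works out to roughly a 6.3x reduction per year if the decline were smooth. The 99.7% figure is measured against early large-model rates rather than this single tier, so it describes a different baseline.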

Investors have started giving this its own name. One framing borrowed from the venture world is LLMflation, a deliberate echo of Moore's Law, the idea that the price of a unit of machine intelligence keeps halving on a predictable cadence. Some research puts the combined effect of pricing and efficiency gains at something close to a 200x reduction per year. Whatever the exact figure, the shape is the point. This is a steep, sustained deflation curve, not a one-off price war that burns out. That's the shape that drives everything downstream.
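"Halving on a predictable cadence" and "so many times cheaper per year" are two ways of describing the same exponential curve. The sketch below converts one into the other. It assumes a constant exponential decline, which is a simplification, and the helper name is hypothetical.

```python
import math


def halving_time_months(annual_factor: float) -> float:
    """Months for the price to halve, given a constant per-year reduction factor."""
    if annual_factor <= 1:
        raise ValueError("annual_factor must be greater than 1 for a declining price")
    return 12 * math.log(2) / math.log(annual_factor)


# 200x is the figure cited in the text; 2x (halving once a year) is for comparison only.
for factor in (2, 200):
    months = halving_time_months(factor)
    print(f"{factor}x per year -> price halves every {months:.1f} months")
```

Taken at face value, a 200x annual reduction would mean the price halves roughly every month and a half, which gives a sense of how steep the cited curve is.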

The Cost Illusion

The decline didn't stop with the frontier tier. Across major providers, the average cost per million tokens fell from roughly 10 dollars to around 2.50 dollars in a single year, an industry-wide cut of roughly 75 to 80% from 2025 into 2026. By any normal reading of a market, falling unit costs should mean falling spend. Cheaper inputs, smaller bills.

Except that's not what happened. Over the same window, enterprise AI bills roughly tripled. Here's the part worth tattooing somewhere visible: the price of a token fell while the total tab rose. That's the illusion. The headline number, cost per token, looks like deflation. The number that hits the budget tells the opposite story. Two mechanisms explain the gap:

Consumption outran price. When inference gets cheap, people don't do the same work for less money. They do far more work. The median developer now runs through something like 51 million tokens a month, and a single agentic session can burn 1 to 3.5 million tokens chewing through a task. Cut the price by 80%, then multiply the usage by far more than five, and the bill goes up.
The hidden cost stack. The model invoice is only part of the picture. A large share of what it actually costs to run AI in production sits outside the token bill entirely, in orchestration, retrieval, retries, vector databases, observability, and the engineering time to hold it all together. The token line gets cheaper while the system around it gets heavier.
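The first mechanism is plain multiplication: total spend is unit price times volume. The back-of-the-envelope sketch below uses the figures from the text (an 80% price cut and a bill that roughly tripled) to back out how much usage must have grown. The function names are hypothetical, and attributing the whole increase to token volume is an assumption, since it ignores the hidden cost stack described in the second mechanism.

```python
def break_even_usage_multiple(price_cut: float) -> float:
    """Usage growth at which the bill is unchanged after a fractional price cut."""
    return 1 / (1 - price_cut)


def implied_usage_multiple(price_cut: float, bill_multiple: float) -> float:
    """Usage growth implied by a price cut and an observed change in the total bill."""
    return bill_multiple / (1 - price_cut)


price_cut = 0.80      # tokens 80% cheaper (from the text)
bill_multiple = 3.0   # bills roughly tripled (from the text)

break_even = break_even_usage_multiple(price_cut)
implied = implied_usage_multiple(price_cut, bill_multiple)

print(f"Break-even usage growth: {break_even:.0f}x")
print(f"Usage growth implied by a {bill_multiple:.0f}x bill: {implied:.0f}x")
```

With an 80% cut, usage has to grow fivefold just to keep the bill flat. A tripled bill implies something closer to fifteen times the token volume, if tokens alone were responsible.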

There's a market signal hiding in the pricing models, too. Providers with deep infrastructure and real margins have been shifting from flat-rate, all-you-can-eat plans toward metered consumption. That's a telling shift. When even the best-capitalized players start putting a meter on usage, it suggests unlimited token consumption is hard to sustain economically, even for them.
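A rough sketch shows why a flat plan strains under heavy use. It takes two figures from earlier in the article, the 2.50 dollars per million tokens average and the 51 million tokens a month median developer, and compares them with a flat monthly fee. The 20-dollar fee is a made-up number for illustration, not any provider's actual plan, and using the average list price as a stand-in for what heavy usage costs to serve is also an assumption.

```python
PRICE_PER_MILLION = 2.50        # dollars per million tokens, average quoted in the text
MEDIAN_MONTHLY_TOKENS = 51e6    # median developer usage quoted in the text
HYPOTHETICAL_FLAT_FEE = 20.0    # dollars per month; assumed for illustration only


def metered_cost(tokens: float, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Dollar cost of a token volume at a per-million-token rate."""
    return tokens / 1e6 * price_per_million


def tokens_covered(flat_fee: float, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Token volume at which the metered cost equals the flat fee."""
    return flat_fee / price_per_million * 1e6


monthly = metered_cost(MEDIAN_MONTHLY_TOKENS)
covered = tokens_covered(HYPOTHETICAL_FLAT_FEE)

print(f"Median developer at metered rates: ${monthly:.2f} per month")
print(f"A ${HYPOTHETICAL_FLAT_FEE:.0f} flat fee covers {covered / 1e6:.0f} million tokens")
```

Under these assumptions the median developer consumes about 127.50 dollars of tokens a month, while the hypothetical flat fee pays for only 8 million tokens, a volume that a few agentic sessions at 1 to 3.5 million tokens each would exhaust.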

Come back tomorrow for Part 2 of this topic!
