Arbitrage Blog



The Falling Price of Intelligence: Token Costs and the AI Expansion - Part 2

Written by Arbitrage, 2026-06-23


If you haven't yet read yesterday's blog post, please do so before continuing here.

Jevons and the Expansion

To connect the falling price to the rising spend, it helps to reach back to a nineteenth-century observation. Studying coal, the economist William Stanley Jevons noticed something counterintuitive: as engines got more efficient and coal effectively got cheaper to use, total coal consumption didn't fall. It rose. Cheaper use unlocked more uses. Apply that to inference and the present makes sense. When a token gets cheaper, applications that weren't worth building at the old price suddenly pencil out. Lower the marginal cost and you don't get the same demand at a discount; you get a wave of new demand that wasn't viable before. The cheaper inference gets, the more places it shows up, and as long as usage grows faster than the price falls, total spend climbs even as each unit costs less.
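The arithmetic behind that claim can be sketched with a toy demand curve. The snippet below is an illustration, not a model of any real market: it assumes constant-elasticity demand (a textbook simplification that Jevons himself did not use), and the function name, the baseline values, and the elasticities are all hypothetical. It shows that a price cut raises total spend only when demand is elastic, meaning usage grows proportionally faster than the price falls.

```python
def total_spend(price, elasticity, base_price=1.0, base_quantity=1.0):
    """Total spend under an assumed constant-elasticity demand curve.

    Quantity demanded follows Q = Q0 * (P / P0) ** (-elasticity).
    All inputs are illustrative; none come from real token pricing.
    """
    quantity = base_quantity * (price / base_price) ** (-elasticity)
    return price * quantity


if __name__ == "__main__":
    # Hypothetical 90% price cut, evaluated under three demand regimes.
    for elasticity in (0.5, 1.0, 1.5):
        before = total_spend(1.0, elasticity)
        after = total_spend(0.1, elasticity)
        print(f"elasticity={elasticity}: spend {before:.2f} -> {after:.2f}")
```

With an elasticity above 1, the cheaper price produces a larger bill, which is the Jevons pattern. With an elasticity below 1, the same price cut shrinks total spend, which is why demand durability matters to the argument.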


The capital cycle is the clearest evidence that the people writing the checks believe this. Hyperscaler capital expenditure guidance for 2026 has clustered around $700 to $725 billion, a buildout that ranks among the largest in modern infrastructure history. And when a cheaper-training claim briefly spooked the market, the response from the spenders was telling. None of the major hyperscalers cut their plans. Every one of them reaffirmed or raised guidance within weeks. The logic was pure Jevons: if intelligence gets cheaper to run, more of it gets built, and the demand for compute expands rather than contracts.


Conditions, Not Forecasts

Nobody gets to call the exact path from here, and this piece won't pretend to. What's more useful is to lay out the conditions that look likely to govern the next leg, and the ones that could break the pattern. The conditions pointing toward continued deflation:

  • Hardware aimed squarely at inference. Next-generation accelerators are being designed with token cost as the headline target. One forthcoming platform is positioned around roughly a 10x reduction in inference cost versus the current generation. That's a deliberate push on the same cost floor that's been falling all along.
  • Custom silicon. In-house chips from the large cloud players are taking a growing share of inference workloads, and they tend to carry a meaningful total-cost advantage over merchant GPUs for the right jobs. More competition at the silicon layer keeps downward pressure on the per-token rate.

The conditions that could bend or break the pattern:

  • Demand durability. Some current usage may be propped up by consumer pricing held below true inference cost while providers chase scale. If that support fades, a slice of today's demand could prove softer than the run-rate suggests.
  • Physical limits. The binding constraint increasingly looks less like chips and more like power, grid queues, transformer lead times, and permits. Compute you can't energize doesn't serve a token.
  • Utilization risk. Infrastructure gets built well ahead of the revenue meant to justify it. If deployment lags the buildout, idle capacity and the impairment questions that follow become a real pattern to watch rather than a tail worry.

Hold those two lists side by side and the takeaway isn't a prediction; it's a tension. The cost curve has strong reasons to keep falling. The expansion has strong reasons to keep compounding. And the counterweights (demand quality, physical capacity, and utilization) are exactly the variables that decide whether this stays a virtuous loop or runs into a wall.


Cheaper Tokens, Bigger Bills

Come back to the puzzle we opened with. The price of intelligence keeps falling, and that's precisely what keeps driving total AI spend higher rather than lower. It's not a contradiction once you see the mechanism. Cheaper tokens don't shrink the market; they widen it, and a market that widens faster than the unit price falls adds up to a bigger bill. So the useful lens isn't the token price on its own, and it isn't the capex headline on its own. It's the spread between them. The gap between a unit cost in decline and a total consumption that keeps climbing is where this entire expansion actually lives. Watch that spread, and the rest of the story tends to follow.


A few things worth tracking from here: the ratio of hyperscaler capex to the revenue meant to justify it, inference cost per completed task rather than per raw token, and capacity utilization as the buildout matures. Those three, read together, say more about where this goes than any single price point ever will.
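For readers who want to track those three measures in a spreadsheet or script, here is one way they might be computed. This is a minimal sketch under stated assumptions: the `Snapshot` fields, the function names, and the definition of cost per completed task (token cost per attempt divided by the share of attempts that succeed) are hypothetical choices made for illustration, not an established reporting standard.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical inputs for one reporting period (any consistent units)."""
    capex: float                     # capital spent on the buildout
    revenue: float                   # revenue attributed to that capacity
    price_per_million_tokens: float  # blended price per million tokens
    tokens_per_attempt: float        # average tokens consumed per task attempt
    success_rate: float              # share of attempts that complete the task
    capacity_used: float             # compute actually serving workloads
    capacity_installed: float        # compute energized and available


def capex_to_revenue(s: Snapshot) -> float:
    """Capital spent per unit of the revenue meant to justify it."""
    return s.capex / s.revenue


def cost_per_completed_task(s: Snapshot) -> float:
    """Token cost per attempt, scaled up to account for failed attempts."""
    cost_per_attempt = s.tokens_per_attempt / 1_000_000 * s.price_per_million_tokens
    return cost_per_attempt / s.success_rate


def utilization(s: Snapshot) -> float:
    """Fraction of installed capacity that is actually in use."""
    return s.capacity_used / s.capacity_installed
```

Measuring cost per completed task instead of per raw token matters because a cheaper token does not lower the bill if each task now takes more tokens or more retries to finish.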
