The cost of AI is a design question, not a budget line

Rising AI bills look like a budget problem, but they are really a design problem. Here, we cover what the organisations getting more from their AI spend are doing differently, and the questions boards should be asking.

It has been widely reported that Uber exhausted its 2026 AI budget by April and is now capping how much individual employees can spend on AI tools each month. Other large enterprises are doing the same: capping budgets, directing people toward cheaper models, slowing adoption. The reflex is rational. It is also the wrong reflex, because it treats a design problem as a budget problem.

The blended cost of enterprise inference, averaged across model tiers and providers, has fallen by roughly two-thirds in a year, and yet bills keep climbing. What has changed is not the price of AI but how work moves through the organisation. Tokens are a unit of billing, not a unit of value. The number that matters is cost per outcome, the cost per piece of work that was useful, deployed, and value-generating, and it appears on no invoice.
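To make the arithmetic concrete, here is a minimal sketch in Python of how cost per outcome could be computed. The workload name, token volumes, prices, and outcome counts are all invented for illustration and are not measured figures; the only detail borrowed from the text is a unit price that falls by two-thirds.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One piece of recurring work. All fields are hypothetical inputs."""
    name: str
    tokens: int               # total tokens billed over the period
    price_per_million: float  # blended price per million tokens
    useful_outcomes: int      # outputs that were actually used or deployed

    @property
    def spend(self) -> float:
        # What appears on the invoice: volume times unit price.
        return self.tokens / 1_000_000 * self.price_per_million

    @property
    def cost_per_outcome(self) -> float:
        # What does not appear on the invoice: spend per useful result.
        if self.useful_outcomes == 0:
            return float("inf")
        return self.spend / self.useful_outcomes


# Illustrative numbers only: the unit price drops by two-thirds,
# token volume grows sixfold, useful outcomes grow by half.
last_year = Workload("support-drafts", 100_000_000, 9.0, 1_000)
this_year = Workload("support-drafts", 600_000_000, 3.0, 1_500)

for period, w in (("last year", last_year), ("this year", this_year)):
    print(f"{period}: spend={w.spend:.2f}, cost per outcome={w.cost_per_outcome:.2f}")
```

With these assumed inputs, the bill rises even though the unit price fell, and cost per outcome rises with it. The point of the sketch is that neither figure can be read off the token price alone.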

The questions boards should be asking

When AI costs reach the leadership table, three questions tend to follow. None of them leads to the right answer.

  • What does this cost per user? The sharper question is what each piece of work produces, and at what cost.
  • How do we centralise procurement? The cost is not in the contract. It is generated downstream, in the workflows employees build and the agents they run. The sharper question is whether each piece of work is using the right technology.
  • What is the right cap on the budget? Caps cut the same way across every piece of work. The sharper question is which work needs which model and where, then sizing the budget around that.

What the leading operators are doing differently

Across organisations running the same AI platforms, the spread between the most and least disciplined operators is as wide as twentyfold. The organisations doing this well are doing three things consistently:

  • Make spend visible at the workload level, not just the individual level. Leadership can then see what is being spent and on what outcome.
  • Match the work to the right tool. The most capable models are reserved for the genuinely hard problems. Routine work is routed to smaller, faster models and optimised workflows.
  • Treat AI access as a portfolio. Capability is allocated where it produces value, rather than handed out by default.
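The first two practices above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular platform: the tier names, the prices, the difficulty labels, and the workload names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tiers and prices (per million tokens); not real vendor pricing.
PRICE_PER_MILLION = {"small": 0.5, "frontier": 10.0}


def choose_tier(difficulty: str) -> str:
    """Reserve the frontier tier for hard work; route everything else to small."""
    return "frontier" if difficulty == "hard" else "small"


def spend_by_workload(requests):
    """Aggregate spend per workload rather than per user."""
    totals = defaultdict(float)
    for workload, difficulty, tokens in requests:
        tier = choose_tier(difficulty)
        totals[workload] += tokens / 1_000_000 * PRICE_PER_MILLION[tier]
    return dict(totals)


# Each request: (workload, difficulty label, tokens). Invented sample data.
requests = [
    ("invoice-triage", "routine", 2_000_000),
    ("invoice-triage", "routine", 4_000_000),
    ("contract-review", "hard", 1_000_000),
]

routed = spend_by_workload(requests)

# Baseline for comparison: the same requests all sent to the frontier tier.
all_frontier = sum(
    tokens / 1_000_000 * PRICE_PER_MILLION["frontier"] for _, _, tokens in requests
)

print(routed)
print(sum(routed.values()), "routed vs", all_frontier, "all on frontier")
```

In practice the difficulty label is the hard part: it has to come from someone who understands the work, which is why this is a design decision rather than a procurement one.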

The most consequential move sits furthest from procurement: building smaller, specialised models for the repetitive, high-volume work where frontier capability is unnecessary. We have built such models ourselves: small models trained for specific tasks that match the quality of the most capable frontier models on those tasks at a fraction of the cost. The saving compounds every time the task runs.

The conversation worth having

The consumption explosion is rational, not reckless. Cheaper intelligence per task expands what gets attempted. The answer to rising bills is not less usage. It is better-directed usage.

The cost of intelligence is not going down. The cost of useful output is. The organisations closing the distance between those two are redesigning how work moves, so the right capability reaches the right task. The question for leadership is not what AI costs the organisation. It is what the organisation has redesigned around it.

If you are thinking about optimising how work moves through your organisation to get more from your AI, reach out to our team.