Uber gave Claude Code to 5,000 engineers earlier this year. By April, they’d burned through their entire 2026 AI coding budget. Four months in. Individual engineers were racking up $500 to $2,000 a month in usage, and the company is now publicly saying it’s “back to the drawing board” on how to budget for AI.

Uber has serious engineering leadership and real budgeting infrastructure. If they can blow a year’s AI budget in four months, it’s worth asking: how will you budget for your AI projects? And do you know what you’re getting for that spend?

How you’re charged

Most AI tools charge by the token. Tokens are units of text: roughly three-quarters of a word each, or about four characters. Every prompt you send and every response you get back is measured in tokens, and you pay for each one. Input and output tokens are usually priced separately, with output costing more.
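To see how that adds up, here's a rough back-of-envelope in Python. The prices and usage figures are illustrative assumptions, not any provider's actual rates:

```python
# Back-of-envelope cost of token-based pricing.
# PRICE_* values are illustrative assumptions, not real provider rates.

PRICE_PER_M_INPUT = 3.00    # dollars per million input tokens
PRICE_PER_M_OUTPUT = 15.00  # dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single prompt/response exchange."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# One coding-assistant exchange: a few thousand tokens of context in,
# a long answer out. Cheap on its own.
per_request = request_cost(4_000, 1_500)
print(f"${per_request:.4f} per request")  # $0.0345

# At 200 requests a day, 22 working days a month, it stops being cheap.
print(f"${per_request * 200 * 22:.2f} per engineer per month")  # $151.80
```

Heavier usage, bigger contexts or a pricier model pushes that into the $500 to $2,000 range Uber was seeing.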

The part that catches people out is the spread between models.

The pricing gap is enormous. Anthropic’s flagship Claude Opus currently costs around $25 per million output tokens. DeepSeek R1 delivers broadly comparable reasoning capability for around $2. That’s more than a 10x pricing spread for frontier-class models doing similar work.

Drop to the mid-tier and the picture shifts again. Anthropic’s own Sonnet model scores almost identically to Opus on capability benchmarks, at 60% of the output cost. Google’s Gemini Flash sits at $2.50. For most tasks, this is where you should be looking.

Then there are the budget models, which come in below $1.25 per million tokens, with DeepSeek V3 at just $0.28. That’s 89 times cheaper than Opus. These models won’t top the capability charts, but they’ll eat up your summarisation, classification, formatting and routine data work no problem.

Capability and cost don’t move in lockstep. You can get 90% of the performance for 10% of the price if you pick the right model for the task. But most tools default to their most capable model regardless. Nobody’s stopping to ask whether a given task needs that much horsepower. So what starts as a reasonable per-task cost multiplies into something nobody budgeted for.
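Here's what that multiplication looks like using the prices above: Opus at $25, Sonnet at $15 (60% of Opus), DeepSeek V3 at $0.28. The monthly token volume and the workload mix are assumptions for illustration:

```python
# Flagship-everywhere vs routed-by-task, using the per-million output
# prices quoted in this article. Volume and mix are assumptions.

OPUS = 25.00        # flagship
SONNET = 15.00      # mid-tier (60% of Opus)
DEEPSEEK_V3 = 0.28  # budget tier

monthly_m_tokens = 2_000  # hypothetical: 2 billion output tokens org-wide

# Everything defaults to the flagship:
flagship_only = monthly_m_tokens * OPUS

# Routed: 50% routine, 35% standard, 15% genuinely hard (assumed mix)
routed = monthly_m_tokens * (0.50 * DEEPSEEK_V3 + 0.35 * SONNET + 0.15 * OPUS)

print(f"Flagship only: ${flagship_only:>9,.0f}/month")     # $50,000
print(f"Routed:        ${routed:>9,.0f}/month")            # $18,280
print(f"Saving:        {1 - routed / flagship_only:.0%}")  # 63%
```

The exact saving depends entirely on the mix, which is the point: nobody gets it by default.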

It’s already getting cheaper

AI model pricing has been dropping fast. Venture capital firm a16z has tracked this: GPT-4-level capability that cost around $30 per million tokens in early 2023 is available for under $1 today. That’s roughly a 10x reduction per year for equivalent performance.

Every time a new frontier model launches, the previous generation drops in price. And that previous generation is still perfectly capable of doing most of the work.

There’s also a less obvious trend. Not all cost reduction comes from bigger hardware. Chinese AI labs, DeepSeek being the clearest example, have been pushing efficiency through software engineering rather than brute compute. DeepSeek V3 is competitive with models that cost far more to run because the engineering underneath is tighter, not because it’s sitting on more expensive infrastructure. The cost floor hasn’t been reached yet.

The AI you’re paying top rate for today will be available at a fraction of the cost within a year or two. The pricing problem is real right now, but it’s heading in the right direction.

The fix is boring (and that’s the point)

This is the same pattern that shows up every time a new tool arrives faster than the management practice around it. Everyone gets access, but nobody triages which tasks need the expensive version and which don’t. Spend balloons, and then someone pulls the budget and the whole thing stalls.

The companies getting this right aren’t doing anything revolutionary, they’re doing the tedious work of matching the model to the task.

Routine work goes cheap. Reformatting data, sorting emails, categorising tickets, tagging documents, summarising meeting notes. Budget-tier models handle all of this without breaking a sweat or the budget.

Everything else is a judgement call. Standard business tasks like template emails and routine reports sit comfortably in the mid-tier. Complex reasoning, ambiguous data interpretation and client-facing proposals are where the expensive models justify their cost, because that’s where the quality gap is actually visible. The hard part is getting someone to actually make the split, rather than letting every task default to the top tier.
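As a sketch of what making that split looks like in practice (the task categories, tier assignments and model names here are illustrative assumptions, not a prescription):

```python
# A minimal routing sketch: map task categories to model tiers.
# Categories, tiers and model names are illustrative assumptions.

ROUTES = {
    # routine work -> budget tier
    "summarise": "budget",
    "classify": "budget",
    "reformat": "budget",
    "tag": "budget",
    # standard business tasks -> mid tier
    "template_email": "mid",
    "routine_report": "mid",
    # visible quality gap -> flagship
    "complex_reasoning": "flagship",
    "ambiguous_data": "flagship",
    "client_proposal": "flagship",
}

MODELS = {  # hypothetical model names per tier
    "budget": "deepseek-v3",
    "mid": "claude-sonnet",
    "flagship": "claude-opus",
}

def pick_model(task_type: str) -> str:
    """Route a task to a model tier; default to mid, not flagship."""
    tier = ROUTES.get(task_type, "mid")
    return MODELS[tier]

print(pick_model("summarise"))        # deepseek-v3
print(pick_model("client_proposal"))  # claude-opus
print(pick_model("something_new"))    # claude-sonnet
```

The detail worth copying is the default: unknown tasks fall to the mid-tier, not the flagship.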

Companies that have deployed this kind of routing report 30-70% cost reductions without losing output quality.

How we think about it

When Practical Intelligence deploy AI for a client, model selection is one of the first conversations. We work through what each part of the workflow actually needs and match the model to that. Our own internal systems run five different model tiers routed by task complexity. Routine retrieval goes to the cheapest model available. Complex analysis and anything client-facing goes to Sonnet or Opus. It requires someone to sit down and think about the work before throwing compute at it, and that’s the step most companies skip.

The uncomfortable question

AI spend is growing faster than most companies’ understanding of what they’re paying for. The tools and capabilities are real, but the default for most organisations is every task going to the most expensive model, with no visibility into what’s driving the bill and no framework for deciding what actually needs that level of capability.

Uber had a team of engineers and a dedicated budget, and they still got caught out. The question isn’t whether your AI costs will become a problem. It’s whether you’ll have the visibility to see it coming and the structure to do something about it before someone pulls the plug.

The companies that work this out now will be the ones that scale AI into something that pays for itself. Everyone else will keep finding out what it costs the hard way.