When AI Tokens Cost More Than Engineers

Editor J

Microsoft canceled its internal Claude Code licenses, and Uber burned a year's AI budget in four months. The token costs generated by coding agents have caught up with the payroll they were meant to replace.

Companies that promised artificial intelligence would shrink labor costs are now confronting the opposite: token bills that rival payroll. Microsoft canceled most of the Claude Code licenses used by its engineers, while Uber burned a full year's AI budget in four months.

The numbers Fortune compiled on May 22 point to a single conclusion: the token costs generated by a coding agent have caught up with the salary of the developer it was meant to replace. These bills surged first at the firms pushing AI the hardest.

Why Microsoft Pulled the Plug on Claude Code

Microsoft CEO Satya Nadella

It is telling that Microsoft was the first to retreat. In mid-May, the company began retiring most of its internal Claude Code licenses. Its Experiences and Devices division—which oversees Windows, Microsoft 365, Outlook, Teams, and Surface—loses access on June 30.

The reason is strictly financial. Over nearly six months of internal use, token-based charges far exceeded internal forecasts. BeInCrypto reported that company leadership deemed that billing model unsustainable at deployment scale.

Microsoft is instead steering engineers toward the GitHub Copilot CLI, which it owns. The calculus is straightforward: rather than wiring large monthly checks to rival Anthropic, the company will absorb the compute costs in-house. Engineers at a company with $13 billion invested in OpenAI had been opting for a competitor's coding agent, so the move is about more than cost: it also pulls them back onto Microsoft's own tooling.

Uber Burned a Year's AI Budget in Four Months

While Microsoft pulled back, Uber demonstrated in real time how quickly these costs compound. Having rolled out Claude Code last December, the company spent its entire 2026 AI budget by April—a span of just four months.

Adoption simply outran the budget. Across a 5,000-engineer organization, Claude Code uptake jumped from 32% to 84%, with roughly 70% of all committed code written by AI. Heavy users generated $500 to $2,000 a month in API costs per head, a surge accelerated by an internal leaderboard that turned usage into a competition.
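A back-of-the-envelope sketch shows how an adoption jump alone can shorten a budget's life. This is not Uber's actual accounting: it assumes, hypothetically, that the annual budget was sized for the earlier 32% uptake and that average spend per active user stayed flat. The function name and parameters are illustrative.

```python
def months_until_budget_exhausted(
    planned_adoption: float,
    actual_adoption: float,
    planned_months: float = 12.0,
    per_user_cost_ratio: float = 1.0,
) -> float:
    """Estimate how long a budget lasts when adoption outruns the plan.

    Assumptions (hypothetical, not from the reporting):
    - the budget was sized for `planned_adoption` over `planned_months`;
    - spend scales linearly with the share of engineers using the tool;
    - `per_user_cost_ratio` is actual / planned spend per active user.
    Headcount cancels out, so it is not a parameter.
    """
    if not (0 < planned_adoption <= 1 and 0 < actual_adoption <= 1):
        raise ValueError("adoption rates must be in (0, 1]")
    if planned_months <= 0 or per_user_cost_ratio <= 0:
        raise ValueError("planned_months and per_user_cost_ratio must be positive")
    burn_multiplier = (actual_adoption / planned_adoption) * per_user_cost_ratio
    return planned_months / burn_multiplier


# Uptake figures from the article: 32% rising to 84%.
runway = months_until_budget_exhausted(planned_adoption=0.32, actual_adoption=0.84)
print(f"Budget lasts about {runway:.1f} months")
```

Under these assumptions a twelve-month budget runs out in well under half a year, before any increase in per-user spend from heavy users is counted. Raising `per_user_cost_ratio` above 1.0 shortens the runway further.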

Uber Chief Technology Officer Praveen Neppalli Naga was blunt about the overrun: 'I'm back to the drawing board, because the budget I thought I would need is blown away already.' The better the tool performed, the more engineers relied on it, and the faster the bills climbed.

When Compute Costs More Than the Engineer

Nvidia's Voyager and Endeavor headquarters campus

If Uber appeared to be a solitary budgeting accident, recent comments from Nvidia suggest a broader trend. Bryan Catanzaro, Nvidia's vice president of applied deep learning, stated plainly: 'For my team, the cost of compute is far beyond the costs of the employees.' Set against Nvidia senior software engineer salaries of $192,000 to $243,000, the claim carries significant weight.

The paradox is striking. Firms integrated AI to cut headcount and save money, only to watch the resulting token costs exceed the payroll they eliminated. Cybernews framed the situation bluntly: companies that fired workers to save money now spend more on tokens than they did on salaries.

Yet the momentum has not reversed. Nvidia CEO Jensen Huang envisions a future where 100 AI agents work alongside every employee. More agents inherently require more tokens. Even as costs spiral, the industry's focus remains fixed on deploying more autonomous systems.

Inference Is Getting Cheaper, So Why Is the Bill Bigger?

A logical question follows: if the per-token price keeps falling, why do overall bills keep climbing? Gartner expects inference costs for sophisticated models to drop roughly 90% by 2030. However, the same outlook projects total enterprise AI spending to rise.

Cheaper tokens simply drive higher usage, and as autonomous agentic workloads spread, raw token consumption explodes. Goldman Sachs projects token use will grow 24-fold by 2030, reaching 120 quadrillion a month, a volume that more than swallows the price cuts. Investing.com calls this dynamic a 'token pricing crisis' that currently props up the revenue race between OpenAI and Anthropic.
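The arithmetic behind that claim can be sketched in a few lines. Combining the two forecasts is an illustrative assumption: Gartner's 90% figure covers inference for sophisticated models and Goldman Sachs' 24-fold figure covers overall token use, so the result is a rough order of magnitude, not a forecast. The function names are hypothetical.

```python
def net_bill_multiplier(price_drop: float, usage_growth: float) -> float:
    """Return new_bill / old_bill for a fractional price drop and a usage multiple.

    bill = price_per_token * tokens, so the ratio of bills is
    (1 - price_drop) * usage_growth.
    """
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be in [0, 1)")
    if usage_growth < 0:
        raise ValueError("usage_growth must be non-negative")
    return (1.0 - price_drop) * usage_growth


def break_even_usage_growth(price_drop: float) -> float:
    """Usage multiple at which the bill is unchanged despite the price drop."""
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be in [0, 1)")
    return 1.0 / (1.0 - price_drop)


# Figures cited in the article: ~90% cheaper tokens, 24x more of them.
multiplier = net_bill_multiplier(price_drop=0.90, usage_growth=24.0)
break_even = break_even_usage_growth(price_drop=0.90)
print(f"Net bill: {multiplier:.1f}x today's; break-even usage growth: {break_even:.0f}x")
```

With these inputs the total bill comes out at roughly 2.4 times today's level: a 90% price cut is cancelled by any usage growth beyond about tenfold, and the projected growth is more than twice that.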

The strain is already visible in corporate financials. Currently, 85% of companies have missed their AI cost forecasts by more than 10%, while 84% report that AI spending has reduced gross margins by over six percentage points. The share of firms operating a dedicated FinOps team for AI expenditures doubled in a year, jumping from 31% to 63%. Gartner's Will Sommer warned that chief product officers 'should not confuse the deflation of commodity tokens with the democratization of frontier reasoning.'

Companies Are Trading Down to Cheaper Models

The practical exit companies have chosen is not to use less, but to use cheaper alternatives. Microsoft pushing the GitHub Copilot CLI in place of Claude Code fits this exact pattern. The GitHub Copilot route lets Microsoft absorb compute on its own infrastructure, sidestepping metered token bills. Rather than deploying the most expensive model for every task, teams increasingly split workloads by difficulty and assign a different model to each tier.

The price gaps between tiers are massive. For the same prompt, costs between Opus and Haiku can differ by up to 50 times. Reserving the expensive model for complex design and debugging, and handing routine edits, tests, and documentation to a lighter tier, cuts session costs by 60% to 80%. Many teams now split their stack by tool: Copilot handles inline completion, Cursor manages IDE work, and Claude Code is reserved exclusively for heavy agentic jobs.
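A minimal sketch of tier-based routing shows where savings of that size can come from. Everything here is illustrative: the tier names, the task-to-tier mapping, and the token counts are hypothetical, and the 50:1 price ratio uses the upper bound quoted above, not any vendor's published rate.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical relative price per token, using the "up to 50 times" gap.
RELATIVE_PRICE = {"premium": 50.0, "light": 1.0}

# Hypothetical mapping that follows the split described in the text.
TIER_BY_TASK = {
    "design": "premium",
    "debugging": "premium",
    "routine_edit": "light",
    "tests": "light",
    "documentation": "light",
}


@dataclass(frozen=True)
class Task:
    kind: str
    tokens: int


def route(task: Task) -> str:
    """Pick a tier for a task; unknown kinds fall back to the premium tier."""
    return TIER_BY_TASK.get(task.kind, "premium")


def session_cost(tasks: list[Task], force_tier: Optional[str] = None) -> float:
    """Relative cost of a session, optionally forcing every task onto one tier."""
    return sum(t.tokens * RELATIVE_PRICE[force_tier or route(t)] for t in tasks)


def routing_savings(tasks: list[Task]) -> float:
    """Fraction saved by routing versus sending everything to the premium tier."""
    baseline = session_cost(tasks, force_tier="premium")
    return 1.0 - session_cost(tasks) / baseline


# A made-up session in which 70% of tokens are routine work.
session = [
    Task("design", 20_000),
    Task("debugging", 10_000),
    Task("routine_edit", 40_000),
    Task("tests", 20_000),
    Task("documentation", 10_000),
]
print(f"Savings from routing: {routing_savings(session):.1%}")
```

In this made-up session the routed cost is about 69% below the all-premium baseline. With a 50:1 price gap the saving is close to the share of tokens moved to the light tier, so landing in the 60% to 80% range implies that roughly that share of a session's tokens is routine work.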

The methods used to police spending are just as blunt. Meta built a 'Claudeonomics' leaderboard to track internal AI use, while Amazon pushes 'tokenmaxxing' to conserve tokens. The real battleground for enterprise AI adoption in 2026 is shifting rapidly—moving away from how smart a model is, and toward how cheaply it can get the same job done.
