Tokens are the basic unit of AI usage: the chunks of text, roughly words or word fragments, that make up both the queries users send and the output models generate.
Chatting with an AI consumes a couple of hundred tokens per paragraph. Agentic AI, where models write code, browse the web, and execute multi-step workflows, can burn through millions per session.
Using the rates of Anthropic’s latest model, one million tokens of input (prompts) costs $5, and one million tokens of output (the model’s responses) costs $25.
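At those published rates, the cost of any request is simple arithmetic. A minimal sketch (the token counts below are illustrative assumptions, not figures from this article):

```python
# Per-million-token rates for Anthropic's latest model, as cited above.
INPUT_RATE = 5.00    # dollars per 1M input tokens (prompts)
OUTPUT_RATE = 25.00  # dollars per 1M output tokens (responses)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the rates above."""
    return (input_tokens / 1_000_000) * INPUT_RATE \
         + (output_tokens / 1_000_000) * OUTPUT_RATE

# Illustrative numbers: a short chat turn vs. a long agentic session.
chat_turn = request_cost(input_tokens=2_000, output_tokens=500)
agent_run = request_cost(input_tokens=800_000, output_tokens=200_000)
```

The chat turn comes to about two cents; the agentic run to about nine dollars. Scale that second number across thousands of sessions and the gap between conversational and agentic usage becomes clear.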
AI companies cite the boom in token consumption to justify the hundreds of billions of dollars being spent on infrastructure to serve it.
But token consumption is becoming a distorted metric.
Meta and Shopify say they have created internal leaderboards that track how many tokens employees use. Nvidia CEO Jensen Huang has said he’d be “deeply alarmed” if an engineer earning $500,000 a year wasn’t using at least $250,000 worth of compute — measuring what an engineer spends on AI instead of what they produce with it.
Once companies start measuring AI adoption by volume, employees optimize for the metric instead of the outcome.
“If your goal is to just burn a lot of money, there are easy ways to do that,” said Ali Ghodsi, CEO of Databricks, which processes AI workloads for thousands of enterprises. “Resubmit the query to ten places. Put up a loop that just does it again and again. It’s going to cost a lot of money and not lead to anything.”
Jen Stave, executive director of the Harvard Business School AI Institute, hears the same from enterprise leaders.
“I’ve talked to a dozen CTOs or CIOs who are all saying, ‘Actually I’m having a really hard time finding an ROI framework for this,'” she said.
Anthropic is planning for the possibility that the demand projections are wrong.
CEO Dario Amodei has described what he calls a “cone of uncertainty”: data centers take one to two years to build, so companies are committing billions now for demand they can’t yet verify. Buy too little and you lose customers when capacity runs out. Buy too much and, if revenue doesn’t arrive on schedule, the math stops working.
“If you’re off by a couple years, that can be ruinous,” Amodei said on the Dwarkesh Patel podcast in February. “I get the impression that some of the other companies have not written down the spreadsheet. They’re just doing stuff because it sounds cool.”
Anthropic’s response has been to move away from flat-rate enterprise pricing and toward per-token billing, so the revenue it collects reflects actual usage. It has also cut off some third-party tools that were large consumers of tokens, while OpenAI has been making AI cheaper and easier to consume at scale.
Flat-rate pricing has dominated the early years of AI adoption, with fixed monthly fees for generous or unlimited AI access. That model worked when people were chatting with AI. But agentic usage turned what cost thousands of tokens per session into millions, and broke the economics.
Anthropic’s most generous consumer offering, its $200-a-month Max plan, became a case study.
Developers had been routing that subscription through third-party agentic tools like OpenClaw, running AI agents around the clock on a plan designed for conversation. Based on Anthropic’s published rates for its latest model, a heavy Claude Code Max user could be paying as little as $200 a month for usage that would’ve cost the user up to $5,000 without a subscription.
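The gap Anthropic closed is easy to size. A hedged sketch, using the rates cited earlier; the monthly token volume here is an assumed workload chosen to illustrate the article’s $5,000 figure, not a number reported by Anthropic:

```python
MAX_PLAN_FEE = 200.00  # $/month, Anthropic's Max plan per the article
INPUT_RATE = 5.00      # $ per 1M input tokens at the cited API rates
OUTPUT_RATE = 25.00    # $ per 1M output tokens

def metered_cost(input_millions: float, output_millions: float) -> float:
    """What a month of usage would cost if billed at API rates."""
    return input_millions * INPUT_RATE + output_millions * OUTPUT_RATE

# Assumed round-the-clock agent workload: 700M input + 60M output tokens/month.
api_equivalent = metered_cost(700, 60)        # $3,500 + $1,500 = $5,000
subsidy = api_equivalent - MAX_PLAN_FEE       # usage the flat fee absorbed
```

Under that assumed workload, a $200 subscription was covering $5,000 of metered usage, leaving $4,800 a month effectively subsidized by the flat plan.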
On April 4, Anthropic cut off those tools. Boris Cherny, head of Claude Code, wrote on X that the subscriptions “weren’t built for the usage patterns of these third-party tools.”
The same recalibration is happening in enterprise.
Older Anthropic contracts included standard and premium seats — flat monthly fees with a baked-in usage allowance. Those are now labeled “legacy seat types that are no longer available for new Enterprise contracts,” according to the company’s support page. New enterprise plans charge per seat, with token consumption billed at API rates on top.
Anthropic was first to move, but the pressure is building across the industry.
OpenAI’s Nick Turley, head of ChatGPT, acknowledged on a BG2 podcast that “it’s possible that in the current era, having an unlimited plan is like having an unlimited electricity plan. It just doesn’t make sense.”
If every token now carries a price, companies and consumers that budgeted for flat-rate AI are going to start asking what they actually got for it.
Ramp CEO Eric Glyman, who recently launched a token-tracking tool, sees the dynamic from the finance side.
AI spending across Ramp’s customer base has grown 13x over the past year, and no one knows how to budget for it. He pointed to Anthropic’s approach as the more prudent long-term strategy, and raised a question that should concern OpenAI’s investors: if your business model depends on extracting maximum token spend, what incentive do you have to help customers use AI more efficiently?
Salesforce is making a similar bet, rolling out a new metric it calls “agentic work units” that tracks the work AI completes rather than the tokens it burns.
Both Anthropic and OpenAI are expected to pursue IPOs this year. When they do, the demand question will be the first thing public market investors try to answer.
Anthropic, by moving to per-token billing, will have cleaner data on what its customers actually value. OpenAI will have bigger numbers but a harder time proving how much of that demand is real.
If even a meaningful fraction of today’s AI demand is inflated, the company that priced for reality will be the one still standing when the correction arrives.