Token Economics Just Joined Your P&L — What That Does to SaaS Margins
When your product calls an LLM on every user interaction, inference cost stops being an R&D rounding error and becomes a recurring cost-of-goods-sold line that scales with usage. Most AI-native SaaS pricing pages haven't priced for the customer they're about to attract, and the gap is starting to show in margins.
A founder I work with sent me her latest unit-economics slide last week. The product is doing well: 60% gross margin on contract revenue, net new ARR growing 12% month over month, healthy retention. The slide had a footnote that read: "excludes inference cost variance." When she included it, gross margin in her largest cohort fell to 41%. The variance was driven by a handful of power users who had figured out a workflow that called the underlying model thirty times per session. She was paying for every call. They were paying her a flat per-seat fee.
This is not a tail-risk scenario. It is the new median condition for any SaaS product where the LLM call sits in the hot path of user value, which describes roughly every interesting product launched since 2024. CFOs at these companies are starting to ask uncomfortable questions about why the cost-of-goods-sold line keeps growing faster than revenue. The answer is the same every time: inference is a variable cost that scales with usage, and most pricing pages have not been updated to reflect it.
Token economics is now a P&L item, not a research note. The companies that internalize that early and reprice around it will compound a margin advantage. The companies that don't will discover, somewhere between $20M and $50M in ARR, that they built a SaaS shell on top of a usage-based cost structure and forgot to charge for the usage.
Why this snuck up on so many teams
The mismatch isn't accidental. There are structural reasons the cost line lagged the pricing line, and understanding them is how you avoid repeating them in your next pricing iteration.
Inference cost looked like an R&D problem in 2023. When your product called GPT-4 a few times per user per week, the cost was real but invisible: it disappeared into the AI/ML budget line and nobody had reason to question it. Pricing was designed on the assumption that this was a fixed-overhead bucket, not a variable cost that scaled with usage. That assumption was always going to break the moment users started actually using the feature.
Per-seat pricing absorbed early usage variance. The economics worked at small scale because power users were rare: one user calling the model 30× per session was offset by ten users calling it twice. As the product proves out and customers ramp adoption, the distribution shifts. More users approach the power-user pattern, the blended cost per seat rises while the seat price stays flat, and the absorption math breaks.
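A quick way to see the absorption math is to compute blended inference cost per seat for two usage mixes. This is a minimal sketch: the per-call cost and sessions per month are hypothetical placeholders, and only the 30-calls versus 2-calls pattern comes from the text above.

```python
# Hypothetical inputs for illustration; not measured figures.
COST_PER_CALL = 0.01      # assumed blended inference cost per model call, USD
SESSIONS_PER_MONTH = 40   # assumed sessions per seat per month


def blended_cost_per_seat(mix):
    """mix: list of (share_of_seats, calls_per_session) pairs whose shares sum to 1."""
    calls_per_session = sum(share * calls for share, calls in mix)
    return calls_per_session * SESSIONS_PER_MONTH * COST_PER_CALL


# Early: one power user (30 calls/session) per ten light users (2 calls/session).
early = [(1 / 11, 30), (10 / 11, 2)]
# Later: adoption ramps and five of every eleven seats behave like power users.
mature = [(5 / 11, 30), (6 / 11, 2)]

for label, mix in (("early", early), ("mature", mature)):
    print(f"{label}: ${blended_cost_per_seat(mix):.2f} per seat per month")
```

Under these assumptions the blended cost moves from about $1.82 to about $5.89 per seat, more than 3×, while the seat price does not move at all.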
Model price drops trained finance to assume costs would keep falling. From 2023 to 2025, token prices for frontier-class capability dropped roughly 95%. Finance teams baked continued declines into their gross-margin projections. Those projections worked for two cycles. In 2026 the price-drop curve flattened for the highest-capability tier and inverted slightly for long-context use, while customer usage patterns kept shifting toward exactly those use cases. The model-cost tailwind became a model-cost headwind.
Context window costs are the silent killer. Input tokens are billed per token, so a 200K-token context call costs roughly 50× as much as a 4K-token call at most pricing tiers. Products that started simple and added agentic, retrieval-augmented, or multi-turn workflows quietly drifted into a context-cost regime that nobody planned for. Cost variance per user widened, and unit economics began swinging with feature usage in ways the pricing model couldn't track.
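The 50× figure falls out of linear per-token billing. A minimal sketch, assuming flat per-million-token prices that are placeholders rather than any provider's published rates, and ignoring prompt caching:

```python
# Assumed prices per million tokens; placeholders, not any provider's published rates.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def call_cost(input_tokens, output_tokens):
    """Cost of one call under flat per-token billing (no caching, no long-context surcharge)."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


input_ratio = call_cost(200_000, 0) / call_cost(4_000, 0)       # input side scales linearly
total_ratio = call_cost(200_000, 500) / call_cost(4_000, 500)   # same 500-token answer in both calls
print(f"input-only ratio: {input_ratio:.1f}x, with output: {total_ratio:.1f}x")
```

On the input side the ratio is exactly 200K / 4K = 50. Identical output tokens narrow it somewhat, while the higher per-token rates that some pricing schedules apply above a context-length threshold would widen it.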
How the cost structure actually scales
Classical SaaS unit economics assumed cost-to-serve was roughly fixed per account, dominated by hosting and a thin support overhead. Token economics breaks that assumption in specific, predictable ways.
Cost scales with engagement, not just headcount. A 50-seat customer that loves the product costs more to serve than a 500-seat customer that uses it lightly. This inverts the heuristic most CSMs and account managers use — "big account good, small account expendable." Under token economics, engaged small accounts can be your worst margin contributors.
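To make the inversion concrete, the sketch below compares monthly inference cost for an engaged 50-seat account and a light 500-seat account. Every number in it (cost per call, seat price, activity rates) is a hypothetical placeholder chosen to illustrate the shape, not a benchmark.

```python
from dataclasses import dataclass

COST_PER_CALL = 0.01   # hypothetical blended inference cost per call, USD
PRICE_PER_SEAT = 40.0  # hypothetical flat monthly seat price, USD


@dataclass
class Account:
    name: str
    seats: int
    active_share: float          # fraction of seats using the AI feature this month
    calls_per_active_seat: int   # model calls per active seat per month

    def inference_cost(self) -> float:
        return self.seats * self.active_share * self.calls_per_active_seat * COST_PER_CALL

    def revenue(self) -> float:
        return self.seats * PRICE_PER_SEAT


accounts = [
    Account("engaged-50-seat", seats=50, active_share=0.9, calls_per_active_seat=1_200),
    Account("light-500-seat", seats=500, active_share=0.2, calls_per_active_seat=60),
]
for a in accounts:
    print(f"{a.name}: cost ${a.inference_cost():,.0f}, "
          f"{a.inference_cost() / a.revenue():.1%} of revenue")
```

The smaller account brings in a tenth of the revenue but, under these assumptions, costs nine times as much to serve.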
Power users are now expensive users. In classical SaaS, your power users were your champions — high retention, expansion candidates, the people your case studies featured. Under token economics, they're often your worst-margin accounts. The case study still works. The unit economics don't.
Feature mix changes per-account cost dramatically. A user who lives in the dashboard costs almost nothing. A user who runs the agentic workflow that fires twelve tool calls per task costs 40× as much. Same SKU, same seat, same revenue. The P&L impact differs by more than an order of magnitude.
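One plausible mechanism behind a multiple like 40× is that agentic loops resend a growing transcript on every step. The sketch below assumes a stateless API where each tool call carries the full prior context and no prompt caching applies; the 2,000-token prompt and 1,000 tokens added per step are hypothetical.

```python
def agentic_input_tokens(base_prompt: int, added_per_step: int, steps: int) -> int:
    """Total input tokens billed when every step resends the whole, growing transcript."""
    total, context = 0, base_prompt
    for _ in range(steps):
        total += context            # this step's request includes everything so far
        context += added_per_step   # the tool call and its result join the transcript
    return total


single_call = agentic_input_tokens(2_000, 1_000, steps=1)
twelve_steps = agentic_input_tokens(2_000, 1_000, steps=12)
print(single_call, twelve_steps, twelve_steps / single_call)  # 2000 90000 45.0
```

Because each request is larger than the last, total input grows roughly quadratically with the number of steps. Prompt caching, where a provider offers it, can reduce the cost of the repeated prefix.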
Geographic price variance compounds. US enterprise customers paying $200/seat absorb token costs invisibly. SMB customers in price-sensitive markets paying $30/seat for the same product can be negative-margin on token cost alone. Single-tier global pricing was already a stretch; under token economics it's a slow-bleeding mistake.
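The arithmetic is simple once token cost is a per-seat input rather than overhead. A minimal sketch with hypothetical cost figures; only the $200 and $30 seat prices come from the paragraph above.

```python
def gross_margin(price_per_seat: float, token_cost_per_seat: float, other_cogs_per_seat: float) -> float:
    """Per-seat gross margin as a fraction of price; negative means the seat loses money."""
    return (price_per_seat - token_cost_per_seat - other_cogs_per_seat) / price_per_seat


TOKEN_COST = 32.0  # hypothetical monthly inference cost for one heavily used seat, USD
OTHER_COGS = 4.0   # hypothetical hosting and support cost per seat, USD

for market, price in (("US enterprise", 200.0), ("price-sensitive SMB", 30.0)):
    print(f"{market}: {gross_margin(price, TOKEN_COST, OTHER_COGS):.0%}")
```

The same usage pattern yields roughly 82% margin at $200 and roughly -20% at $30; the token cost does not care which market the seat was sold in.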
Latency optimization fights margin. Faster responses generally require either better models (more expensive) or more context priming (more tokens). The product team optimizes for the user experience metric they're measured on. The CFO optimizes for the cost line. Without a shared framework, these two pull the company in opposite directions and the user wins the short-term fight every time.
Where this is starting to show up in the wild
The companies that are already feeling this are mostly not talking about it publicly, which makes the pattern harder to learn from. What's visible in pricing pages, hiring posts, and earnings transcripts paints a clear picture if you know where to look.
Pricing page rewrites that introduce usage caps. A growing list of AI-native SaaS vendors quietly added "fair use limits" or "monthly action limits" to their plans in late 2025 and early 2026. These were not customer-friendly UX moves. They were margin-protection moves disguised as policy.
The reappearance of "credits." The credit-based pricing model that died with PaaS-era developer tools is back, now branded as "intelligent actions" or "agent runs." The credit unit obscures the underlying token math but lets the vendor reprice unilaterally when costs shift. Credits are a hedge against the volatility of inference cost.
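The hedge works because the vendor controls the conversion from inference cost to credits. A minimal sketch, assuming a fixed customer-facing credit price and a target gross margin on inference (both hypothetical): when model costs move, the vendor changes how many credits an action consumes rather than the headline price.

```python
from decimal import Decimal, ROUND_CEILING

CREDIT_PRICE = Decimal("0.10")   # hypothetical price the customer pays per credit, USD
TARGET_MARGIN = Decimal("0.70")  # hypothetical gross-margin target on inference


def credits_for_action(inference_cost: Decimal) -> int:
    """Credits to charge so one action's revenue covers its cost at the target margin."""
    revenue_needed = inference_cost / (1 - TARGET_MARGIN)
    credits = (revenue_needed / CREDIT_PRICE).to_integral_value(rounding=ROUND_CEILING)
    return max(1, int(credits))  # every action costs at least one credit


print(credits_for_action(Decimal("0.45")))  # 15 credits for an agent run costing $0.45
print(credits_for_action(Decimal("0.54")))  # 18 after a 20% model-cost increase
```

Decimal arithmetic avoids float rounding pushing a borderline action into an extra credit.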
Tiering by model class. Lower-priced plans get smaller, faster, cheaper models; higher-priced plans get frontier models. This used to be feature differentiation. It's now a margin-engineering choice: the cost-to-serve gap between tiers is much larger than the value gap customers perceive, so the lower tiers deliver most of the perceived value at a fraction of the cost. That is the right shape for healthy unit economics, if you can hold it.
Quietly cutting unlimited. "Unlimited" was the marketing default for AI features through 2024. By 2026 it's nearly extinct on new pricing pages, surviving only at vendors whose use cases are inherently low-volume or whose pricing power is so high that they can absorb the variance.
What to actually do about it
There is no clean playbook, because the underlying cost structure is still moving. But some moves compound regardless of which way the model-cost curve goes next.
Instrument cost-to-serve per account, weekly, from day one. The single most common pattern at companies that hit a margin wall is that nobody was watching cost-to-serve at account granularity until the wall hit. Build the dashboard now. Watch the P75 and P95 accounts, not just the average. The variance is the story.
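A weekly roll-up that surfaces the tail can be a few lines over whatever per-account cost table you already have. A minimal sketch using Python's standard `statistics` module; the account names and costs are hypothetical.

```python
import statistics


def cost_to_serve_summary(weekly_cost_by_account: dict[str, float]) -> dict:
    """Mean and tail percentiles of weekly cost-to-serve, plus accounts above P95."""
    costs = list(weekly_cost_by_account.values())
    # With n=100, quantiles returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(costs, n=100, method="inclusive")
    p95 = cuts[94]
    return {
        "mean": statistics.fmean(costs),
        "p50": cuts[49],
        "p75": cuts[74],
        "p95": p95,
        "above_p95": sorted(a for a, c in weekly_cost_by_account.items() if c > p95),
    }


# Hypothetical week: eighteen modest accounts and two heavy ones.
week = {f"acct-{i:02d}": 10.0 + i for i in range(18)}
week.update({"acct-18": 400.0, "acct-19": 900.0})
print(cost_to_serve_summary(week))
```

In this made-up week the mean lands around $82 while the median is under $20. An average-only dashboard would report a moderate number and hide the one account driving most of the cost.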
Decouple your pricing unit from your cost unit deliberately. Per-seat pricing with token-cost backing is a structural mismatch. You can keep per-seat pricing — many customers strongly prefer it — but only if you've explicitly engineered a cost cap, action limit, or tier-down mechanism that absorbs the variance. Without that, you are running a flat-rate buffet against a metered kitchen.
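One way to keep per-seat pricing while absorbing variance is an included inference budget with a tier-down band and a hard cap. The sketch below is one hypothetical design; the budget, the 1.5× cap, and the "frontier"/"economy"/"paused" levels are illustrative names, not a standard.

```python
from dataclasses import dataclass


@dataclass
class SeatPlan:
    seats: int
    budget_per_seat: float          # hypothetical inference budget included per seat per month, USD
    hard_cap_multiple: float = 1.5  # hypothetical: past this multiple of budget, AI actions pause


def service_level(plan: SeatPlan, spent_this_month: float) -> str:
    """Decide how to serve the next request given the account's month-to-date inference spend."""
    budget = plan.seats * plan.budget_per_seat
    if spent_this_month < budget:
        return "frontier"   # within the included budget
    if spent_this_month < budget * plan.hard_cap_multiple:
        return "economy"    # tier-down: keep serving on a cheaper model
    return "paused"         # action limit reached: prompt an upgrade or overage purchase


plan = SeatPlan(seats=10, budget_per_seat=5.0)   # $50 included per month
print([service_level(plan, s) for s in (20.0, 60.0, 80.0)])
```

The customer still sees a per-seat price; the cap turns the buffet into a buffet with a known maximum cost to the kitchen.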
Start the conversation about usage tiers before customers force it. Customers do not enjoy retroactive pricing changes. They tolerate proactive tiering ("here's what's in your plan, here's what triggers the next tier") if it's clear and consistent. The vendors who introduce structured tiers early avoid the painful PR cycle of "we have to charge more now."
Treat model routing as a margin lever, not just a UX one. Most products call the same model on every request. The frontier model is overkill for perhaps 70% of requests, and the cost difference is large. A small investment in model-routing logic (easy queries to cheap models, hard queries to frontier) can typically recover 30–50% of inference cost with no perceived quality drop. Most teams haven't done it because nobody owned it.
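A routing layer can start as simple as a heuristic in front of two models. The sketch below is hypothetical throughout: the model names, per-request costs, and keyword list are placeholders. A real router would more likely use a learned classifier and would need validating against quality evals on your own traffic before it touches production.

```python
# Hypothetical per-request costs for two placeholder models, USD.
MODEL_COST = {"small-model": 0.01, "frontier-model": 0.03}
HARD_MARKERS = ("analyze", "compare", "refactor", "debug", "step by step")


def route(prompt: str) -> str:
    """Crude heuristic: long prompts or ones containing 'hard' markers go to the frontier model."""
    text = prompt.lower()
    if len(text) > 2_000 or any(marker in text for marker in HARD_MARKERS):
        return "frontier-model"
    return "small-model"


def savings_vs_frontier_only(prompts: list[str]) -> float:
    """Fraction of inference cost saved compared with sending everything to the frontier model."""
    baseline = len(prompts) * MODEL_COST["frontier-model"]
    routed = sum(MODEL_COST[route(p)] for p in prompts)
    return 1 - routed / baseline


traffic = ["Summarize this email in one line"] * 7 + ["Analyze these contracts and compare renewal terms"] * 3
print(f"{savings_vs_frontier_only(traffic):.0%} saved")
```

With 70% of this toy traffic routed to a model one third the price, savings land near 47%. The real figure depends on both the price gap and how many requests the cheaper model can actually handle well.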
Renegotiate volume discounts at the inflection points. Anthropic, OpenAI, and the open-model providers all offer volume-based pricing. Most companies negotiate once and forget. The price curves shift every quarter. A standing quarterly review of model spend versus available discount tiers is now a finance function, not an R&D one.
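Part of the quarterly review can be mechanical: compare what you would pay at your current tier with what you would pay by committing to the next one. The tier schedule below is hypothetical, and real agreements differ in structure (annual commitments, prepaid credits, model-specific rates), so treat this as the shape of the check rather than any provider's terms.

```python
# Hypothetical committed-spend schedule: (minimum monthly commitment in USD, discount).
DISCOUNT_TIERS = [(0, 0.00), (50_000, 0.05), (150_000, 0.10), (500_000, 0.15)]


def effective_cost(forecast_spend: float, tier: tuple[int, float]) -> float:
    """Monthly bill at a tier: a commitment bills at least its minimum, then applies the discount."""
    minimum, discount = tier
    return max(forecast_spend, minimum) * (1 - discount)


def should_commit_up(forecast_spend: float) -> bool:
    """True if committing to the next tier is cheaper than staying at the current one."""
    current = max(t for t in DISCOUNT_TIERS if forecast_spend >= t[0])
    higher = [t for t in DISCOUNT_TIERS if t[0] > forecast_spend]
    if not higher:
        return False
    return effective_cost(forecast_spend, min(higher)) < effective_cost(forecast_spend, current)


for spend in (140_000, 145_000):
    print(spend, should_commit_up(spend))
```

In this schedule the break-even sits where forecast × 0.95 equals 150,000 × 0.90, about $142,100 a month; a usage forecast that crosses that line between reviews is money left on the table.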
The stakes — what changes if you get this wrong
The downside scenario isn't bankruptcy. It is something more annoying: a multi-year period of having to apologize to customers for repeated pricing changes that erode trust and slow expansion. Customers will tolerate one repricing if it's well-handled. They will not tolerate three. The companies that get token economics right early get to skip the painful conversations entirely.
Organizations that handle this well tend to have a single dashboard the CEO and CFO look at weekly that ties revenue, model cost, and account-level margin together. Organizations that handle it poorly tend to have three separate dashboards owned by three separate teams that disagree about the numbers, and they discover their margin problem in a board meeting six months after it started.
The deeper structural question is what token economics does to the long-run shape of SaaS as a category. Per-seat SaaS at high gross margins was a defining business model of the 2010s. It depended on a cost structure that was nearly flat with usage. AI-native products do not have that cost structure. Either pricing changes — toward usage, outcomes, or tiered consumption — or margins compress permanently to a software-services hybrid level. Both outcomes are happening simultaneously in different segments.
Reprice for the cost structure you actually have, not the one your pitch deck described last year. The teams that do this before the margin pressure becomes a board topic will look prescient. The teams that wait will spend 2027 explaining to investors why "transient inference cost variance" became a permanent line item.