You’re Measuring the Wrong Thing.
And Your AI Vendor Knows It.
I’ve sat in enough executive briefings to recognize the pattern. The slides go up, the containment numbers look strong, and the room feels good about the AI investment. What nobody mentions, and what I’ve learned to look for, is what’s happening to NPS on the same timeline. Or why escalation volume keeps climbing even as containment does. Or why the customers who “got resolved” aren’t coming back.
Containment rate has become the north star metric for AI in customer service. It’s easy to measure, it maps cleanly to cost reduction, and it gives leadership a number to present to the board. The problem is it doesn’t measure what it claims to measure. Containment rate tells you whether a customer was kept away from a human agent. It tells you nothing about whether their problem was actually solved.
That distinction sounds minor. It isn’t.
When the Metric Becomes the Target
There’s a principle in management theory, usually known as Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. It’s been true in manufacturing, in healthcare, in financial services. And it is absolutely true in AI-powered customer service right now.
When your AI deployment is evaluated, internally or by a vendor, on the basis of containment, the system optimizes to prevent escalation. Not to resolve issues. To prevent escalation. Those are not the same thing.
What that looks like in practice: customers who abandon a chat out of frustration get logged as successful containments. Phone numbers get harder to find. IVR loops get longer. A customer fails to get help from the chatbot, calls the support line anyway, and the chatbot interaction still counts as contained. The metric looks great. The customer is gone.
This isn’t a hypothetical. Data suggests that up to 20% of “contained” interactions are cases where the customer simply gave up. The system logged a win. The customer had a different experience.
The Outcome-Based Pricing Problem Nobody Is Talking About
As the industry started recognizing the flaws in traditional containment measurement, AI vendors moved toward outcome-based or per-resolution pricing. The pitch is intuitive: you only pay when the AI actually resolves something. Costs align with value. Everyone wins.
Here’s what actually happens.
When a vendor’s revenue depends entirely on maximizing billable resolutions, they have a structural incentive to define “resolution” as broadly as possible. And without an industry standard for what a resolution actually is, that definition tends to drift in the vendor’s favor.
Silence as satisfaction: counting a conversation as resolved because the customer stopped responding.
No escalation equals success: billing for a resolution anytime a human agent didn’t officially take over, regardless of what the customer experienced.
Click-based billing: charging for a resolution every time the AI serves an article link and the customer clicks it, whether or not the article helped.
These aren’t edge cases. They’re models in active use.
There’s another dynamic that doesn’t get enough attention. Outcome-based pricing creates an all-or-nothing structure. If your AI handles 90% of a complex interaction (gathering order details, verifying identity, collecting documentation) but a human agent has to handle final authorization, the vendor often receives nothing. A partial win becomes a total financial loss for them.
That dynamic actively discourages the kind of human-AI collaboration that actually works well in complex service environments. Vendors are pushed toward full autonomous containment even when it’s the wrong call for the customer, because anything less doesn’t get paid.
What you end up with is resolution theater. The appearance of efficiency at the cost of the customer relationships you’re trying to protect.
The Cost Math Is About to Change
The economics of AI deflection have been built on a foundation that’s already shifting.
LLM vendors have been subsidizing their services, in some cases by as much as 90%, to capture market share. That window is closing. As these companies move toward profitability, compute costs will rise, and those costs will be passed through to enterprise customers.
Gartner is projecting that by 2030, the cost per resolution for generative AI will exceed $3, making it more expensive than many offshore human agents doing the same work today. Frontier models are consuming significantly more tokens per interaction than earlier generations, and the infrastructure costs behind them are real and growing.
The business case for AI in customer service was built on the assumption that deflection is cheap and getting cheaper. Neither of those things is going to be true for much longer, at least not if you keep building the way most teams are building today.
That last part matters. The Gartner projection assumes current architecture patterns: every engagement calling a frontier LLM, every interaction tokenized at full cost. It doesn’t have to work that way. RAG-based knowledge bases, smaller purpose-built SLMs capable of handling the majority of routine interactions, smarter routing that reserves LLM calls for the engagements that actually need them: these approaches can change the cost curve significantly. But that’s a blog for another day. For now, the point stands that organizations building AI strategy around the assumption of permanent cost deflection, without thinking carefully about architecture, are taking on more risk than they realize.
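To make the arithmetic concrete, here’s a minimal cost-model sketch in Python. Every number in it except the $3 frontier figure is an illustrative assumption, not a vendor quote; the point is how quickly the blended cost moves once routing keeps frontier calls rare.

```python
# Illustrative cost model. All figures are assumptions for demonstration,
# not vendor pricing; only the $3 frontier number echoes the Gartner
# projection discussed above.

FRONTIER_COST = 3.00   # assumed all-in cost per frontier-LLM interaction
SLM_COST = 0.15        # assumed cost per interaction on a small, purpose-built model

def blended_cost(slm_share: float) -> float:
    """Average cost per interaction when a router sends only the hard
    cases to a frontier model and everything else to an SLM."""
    return slm_share * SLM_COST + (1 - slm_share) * FRONTIER_COST

if __name__ == "__main__":
    for share in (0.0, 0.5, 0.8, 0.9):
        print(f"SLM handles {share:.0%} of traffic -> "
              f"${blended_cost(share):.2f} per interaction")
```

Even at a conservative 50% routing share, the blended figure lands at roughly half the frontier-only number, which is why architecture choices belong in the business case, not just in the engineering backlog.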
And separately: despite several years of aggressive AI deployment, only about 20% of customer service leaders have actually reduced agent headcount as a result. Gartner projects that by 2027, half of the companies that did cut staff due to AI will be forced to rehire to perform similar functions. The full-automation thesis is proving harder to execute than the pitch decks suggested.
This isn’t an argument against AI in customer service. It’s an argument against building your entire strategy around a cost assumption that has an expiration date.
The Part You Don’t See Until It’s Too Late
There’s a diagnostic test most teams never run, and it exposes the containment trap faster than anything else. Correlate your containment rate against your inbound human contact volume over the same period. If containment is climbing but inbound contacts aren’t dropping, the bot isn’t resolving anything. Customers are just routing around it. That’s detectable, and some organizations are starting to catch it.
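For teams that want to run the test rather than take my word for it, here’s a minimal sketch in Python. The file and column names are assumptions standing in for whatever your reporting stack actually exports; the logic is what matters.

```python
# A minimal sketch of the diagnostic, assuming a monthly ops export with
# hypothetical columns: 'month', 'containment_rate', 'inbound_human_contacts'.
import pandas as pd

ops = pd.read_csv("monthly_ops.csv", parse_dates=["month"]).sort_values("month")

# If containment is genuinely resolving issues, rising containment should
# correlate with falling inbound human contact volume (strongly negative r).
r = ops["containment_rate"].corr(ops["inbound_human_contacts"])

containment_up = ops["containment_rate"].iloc[-1] > ops["containment_rate"].iloc[0]
# Average month-over-month change in human contacts is ~flat or rising.
contacts_flat = ops["inbound_human_contacts"].pct_change().mean() > -0.01

if containment_up and (contacts_flat or r > -0.3):
    print(f"Flag: containment is rising but human contacts aren't dropping (r = {r:.2f}).")
    print("Customers may be routing around the bot rather than getting resolved.")
else:
    print(f"No containment-trap signal in this window (r = {r:.2f}).")
```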
But here’s the part that should keep CX leaders up at night: passing that test doesn’t mean you’re winning. It just means you’re not losing visibly yet.
If your containment rate goes up and your inbound human contacts do drop, the metrics look clean. Leadership feels good. The AI investment is working. What the dashboard doesn’t show you is the customers who got “resolved,” had a friction-filled experience, quietly decided something about your brand, and didn’t come back. No escalation. No complaint ticket. No signal. Just a gradual erosion that shows up in churn data or renewal rates months later and gets blamed on pricing or the competitive environment.
This is the nature of CX signals. By the time they appear in your data, the decisions that caused them were made weeks or months ago, in interactions your system logged as successful. Churn, NPS decline, reduced repeat purchase rates: these are all lagging indicators of something that already happened. The customer didn’t file a complaint. They just left.
Gartner’s research on customer effort puts a number to this. 96% of customers who experience high-effort interactions become disloyal, compared to 9% of those with low-effort experiences. Contained does not mean low-effort. A customer can be fully contained and have a miserable time, and your metrics will never reflect it.
That is the blind spot. And it compounds quietly until the signals arrive and it’s too late to recover the relationship.
So What Should You Actually Do?
The answer isn’t to abandon AI or throw out containment data entirely. It’s to stop treating any single metric as a proxy for customer success, and build the measurement architecture that catches what containment misses.
Start with multi-dimensional KPIs, not a single north star. Containment has a place in the picture, but only when it’s correlated with metrics that reveal what’s happening downstream. If containment is an outcome you’re tracking or paying for, tie it to inbound human contact volume. If containment is rising and human contacts aren’t falling proportionally, you have bad containment on your hands. If both are falling, go deeper: tie it to NPS, to CSAT, to repeat contact rate, to renewal and retention data. A metric that looks healthy in isolation can look very different when you put it next to the customer behavior it was supposed to influence.
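As a sketch of what that correlation work looks like in practice (again with assumed column names, since every reporting stack differs):

```python
# Hypothetical monthly scorecard: the schema is an assumption, not a
# standard. The point is correlation, not any single number in isolation.
import pandas as pd

scorecard = pd.read_csv("monthly_scorecard.csv", parse_dates=["month"])

downstream = ["inbound_human_contacts", "nps", "csat",
              "repeat_contact_rate", "retention_rate"]

# How does containment move against each downstream metric?
corr = scorecard[["containment_rate"] + downstream].corr()["containment_rate"]
print(corr.drop("containment_rate").round(2))

# Healthy containment: negative vs. contacts and repeat contacts,
# flat-to-positive vs. NPS, CSAT, and retention. Anything else is a flag.
```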
If you’ve structured outcome-based pricing with your AI vendor, get the protections right before the contract is signed, not after. Agree in writing on how a resolution is defined. Insist on independent measurement rather than the vendor’s own reporting. Build in an attribution methodology that accounts for channel switching: if a customer is “resolved” by the bot and calls the support line within 24 hours, that is not a successful resolution, and your contract should say so explicitly. Require your vendor to share the data that would expose gaming, not just the data that makes them look good.
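That 24-hour channel-switch rule is also auditable on your own data, so you don’t have to take the vendor’s reporting on faith. A minimal sketch, assuming hypothetical exports of bot resolutions and phone contacts:

```python
# Sketch of the 24-hour channel-switch check described above.
# Table and column names are illustrative assumptions.
import pandas as pd

bot = pd.read_csv("bot_resolutions.csv", parse_dates=["resolved_at"])
calls = pd.read_csv("phone_contacts.csv", parse_dates=["called_at"])

# Join each bot "resolution" to every call from the same customer,
# then keep calls that landed within 24 hours of the resolution.
merged = bot.merge(calls, on="customer_id")
gap = merged["called_at"] - merged["resolved_at"]
switched = merged[(gap > pd.Timedelta(0)) & (gap <= pd.Timedelta(hours=24))]

failed = switched["resolution_id"].nunique()
total = bot["resolution_id"].nunique()
print(f"{failed}/{total} billed resolutions ({failed / total:.1%}) were "
      "followed by a phone call within 24h and should not be billable.")
```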
Consider what it would mean to add customer effort to your scorecard alongside containment. Customer Effort Score measures something containment completely ignores: how hard the customer had to work to get help. You can have a high containment rate and a terrible effort score simultaneously, and the effort score is the one that predicts churn.
Think about what you’re measuring on the human side of the equation as well. AI’s strongest current use cases are often in augmentation: triage, summarization, intent classification, pre-interaction context assembly. These make human agents faster and more effective. Measuring how well AI performs these functions, not just how often it replaces a human entirely, gives you a much clearer picture of where your AI investment is actually delivering value.
And build a forward-looking view into your measurement model. Lagging indicators will always tell you what happened. What you need is a set of early signals: repeat contact rate within a resolution window, deflection-to-call correlation, post-interaction survey completion rates, NPS trend by channel. These won’t eliminate the blind spot entirely, but they shrink the gap between when the customer experience degrades and when you find out about it.
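To make one of those early signals concrete, here’s a sketch of repeat contact rate within a resolution window, trended monthly. The seven-day window and the column names are assumptions; tune them to your own resolution cycle.

```python
# One early signal from the list above: repeat contact rate within a
# resolution window, trended monthly. Column names are assumptions.
import pandas as pd

RESOLUTION_WINDOW = pd.Timedelta(days=7)  # assumed window; tune to your cycle

contacts = (pd.read_csv("contacts.csv", parse_dates=["contacted_at"])
              .sort_values(["customer_id", "contacted_at"]))

# A contact counts as a "repeat" if the same customer came back, on any
# channel, within the window after their previous (supposedly resolved) contact.
gap = contacts.groupby("customer_id")["contacted_at"].diff()
contacts["is_repeat"] = gap.notna() & (gap <= RESOLUTION_WINDOW)

monthly = contacts.set_index("contacted_at")["is_repeat"].resample("MS").mean()
print(monthly.round(3))
```

A rising line in that output tends to show up months before the same problem surfaces in churn or renewal data, which is exactly the gap this kind of signal is meant to close.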
The Reframe
The contact center isn’t a wall between customers and answers. It never should have been. The companies still building strategy around containment numbers are optimizing for the wrong outcome, and the compounding costs of that are real: eroded loyalty, customers who go silent rather than complain, and the growing gap between what the metrics show and what’s actually happening.
The AI vendors who benefit from the current model are not going to tell you this. That’s worth keeping in mind when you’re reviewing their reporting.
There is a version of AI-powered customer service that drives revenue, deepens loyalty, and earns the kind of preference that actually shows up in lifetime value. It requires measuring what customers experience, not just what the system logged.
The companies that make that shift now are going to look very different from the ones celebrating containment numbers in 2027.
Gabe Rivero is the founder of EvolveCX, a vendor-neutral CX transformation consultancy. He brings 25 years of enterprise CX operator experience in VP- to C-level roles.