Practical guide · ROI

The AI works and the team uses it. But is it making you money?

It’s the final question, the CFO’s, and the most uncomfortable one. You picked the process, standardized it, built the agent, and got it adopted. Now the point: in euros, are you making money? Yes, there’s a serious way to answer. And almost nobody actually applies it.

Raffaele Zarrelli · Founder, Yempik · May 31, 2026 · 13 min read
In summary
  • Value moves along 3 axes: efficiency, quality, capacity. Measuring time alone sees only one.
  • Quality isn’t a bonus: it’s the gate. Time saved with quality dropping is not a gain.
  • Formula: ROI = (net benefit ÷ total cost of ownership) × 100. The hard part is what you put inside.
  • What counts is the total cost of ownership, not the license price: hidden costs decide the result.
  • To know whether the AI deserves the credit, you need a baseline and a control group, like an experiment.
The final question

"Nice. But are we making money?"

You’ve followed the whole path: you picked the right process, made it repeatable, built the agent on top of it, and made sure the team actually uses it. At this point comes the question that decides whether the project lives or dies: in euros, is it paying off?

And yet it’s the part almost nobody measures seriously. In 2026 only about a third of AI initiatives reach the expected return[4], and part of that failure isn’t economic but a matter of measurement: companies that are making money and can’t prove it, and companies burning cash while convinced of the opposite. Measuring well is already half the advantage.

"It seems to work" is not a number. And an investment you can’t measure is an investment you can’t defend.

Before you measure

Time saved? That’s only a third of the story

When most people think about the return on AI, they think of one thing only: "how much time does it save me." It’s a real measure, but a partial one. The value of an AI system moves along three different axes, and before you pull out the calculator you need to know which one you’re after.

Efficiency

Same output, less time or cost?

How you measure it

Hours saved, cost per case, cycle time.

The agent prepares quotes in 5 minutes instead of 30, at the same quality.

Quality

Same time, better output?

How you measure it

Error rate, rework, complaints, consistency.

Two faces: internal (a CRM or an MVP you need to take to the next level) and market-facing (raising the service you offer to stay competitive).

Capacity

Things you simply couldn’t do before?

How you measure it

Volume handled, hours of coverage, peaks absorbed.

You answer 24/7 and handle the peaks without hiring anyone else.

The first question to ask is: with this project, do I want to do the same thing for less (efficiency), do it better (quality), or do new things (capacity)? If the goal was quality and you measure only the hours saved, you risk concluding that the project "doesn’t pay off" when it’s actually doing exactly what you wanted it for.

And "raising quality" itself has two faces, which show up constantly in real projects. There’s internal quality: you already have a system or a first draft and you need to take it to a higher level ("the CRM I use isn’t enough anymore," "this MVP needs to be made solid for production"). And there’s market quality: you have to raise the level of the service you offer to stay appealing; after years of offering it the same way, you either update it or add a new one. In both cases the return isn’t read in the hours saved, but in the step-change in the output.

The condition that decides everything

Quality isn’t a bonus. It’s the gate.

Even when the goal is pure efficiency (same thing, less time), there’s a condition that decides whether that saving is real: quality has to stay at least the same. If you automate a process, save 200 hours a month, but the output gets worse, you haven’t saved anything. You’ve just pushed the cost further down the chain, where you don’t see it right away: rework, complaints, lost customers, reputational damage.

That’s why quality has to be treated as a gate, not as an extra: time saved only counts if you pass through it at equal (or improved) quality. And "quality" isn’t a feeling, it’s measured with precise numbers.

How you measure output quality
  • Rework rate: how often the output needs correcting before you can use it.
  • Error rate on a sample: take N random cases and have a person review them.
  • Escalation rate: how often the system has to hand off to a human.
  • Consistency: do two similar cases get similar answers, or does it depend?
  • Customer complaints and satisfaction, before and after.

The reference point is the starting quality, the level people reached before the AI. You set it as the baseline (it’s the same one that emerges when you standardize the process) and then check that the AI maintains or exceeds it. Only at that point does the time saved become a real gain, one you can put into the formula.
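A minimal sketch of that gate in Python, assuming that for each period you log how many cases there were, how many needed rework, and how many errors a person found in a random review sample. The names (`QualityMetrics`, `passes_quality_gate`, `draw_review_sample`, `escalation_rate`) and all figures are hypothetical. Escalation is tracked on its own, because before the AI there was no hand-off to compare against; consistency and complaints need their own measurement and are left out here.

```python
import random
from dataclasses import dataclass


@dataclass
class QualityMetrics:
    """Counts collected over one period (hypothetical schema)."""
    cases: int          # cases handled in the period
    reworked: int       # outputs that needed correcting before use
    sample_size: int    # random cases reviewed by a person
    sample_errors: int  # errors that person found in the sample

    @property
    def rework_rate(self) -> float:
        return self.reworked / self.cases

    @property
    def sample_error_rate(self) -> float:
        return self.sample_errors / self.sample_size


def draw_review_sample(case_ids: list[str], n: int, seed: int = 0) -> list[str]:
    """Pick N random cases for human review; a fixed seed makes the draw repeatable."""
    return random.Random(seed).sample(case_ids, min(n, len(case_ids)))


def escalation_rate(escalated: int, cases: int) -> float:
    """Share of cases the AI handed off to a human (no pre-AI equivalent, so tracked alone)."""
    return escalated / cases


def passes_quality_gate(baseline: QualityMetrics, with_ai: QualityMetrics,
                        tolerance: float = 0.0) -> bool:
    """Lower rates are better: the AI passes only if no rate is worse than the baseline."""
    return (with_ai.rework_rate <= baseline.rework_rate + tolerance
            and with_ai.sample_error_rate <= baseline.sample_error_rate + tolerance)


# Hypothetical numbers: 12% rework and 8% sample errors before, 10% and 6% with the AI
baseline = QualityMetrics(cases=400, reworked=48, sample_size=50, sample_errors=4)
with_ai = QualityMetrics(cases=420, reworked=42, sample_size=50, sample_errors=3)
print(passes_quality_gate(baseline, with_ai))  # True
```

With `tolerance` at zero the gate is strict: any rate worse than the human baseline keeps the time saved out of the formula.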

Time saved with quality dropping isn’t efficiency. It’s a debt you pay later, with interest.

The starting point

The formula is simple. The hard part is what you put inside

There’s a standard formula, the one CFOs have always used, applied to AI:[2]

ROI = (net benefit ÷ total cost of ownership) × 100

Net benefit = value generated − all costs. If the result is +120%, every euro invested has returned 1.20 to you on top of paying itself back.
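The formula fits in a few lines. This is a minimal sketch with hypothetical figures chosen to reproduce the +120% above: €55,000 of value generated against €25,000 of total cost of ownership leaves €30,000 of net benefit, or €1.20 for every euro invested.

```python
def roi_percent(value_generated: float, total_cost_of_ownership: float) -> float:
    """ROI = (net benefit / total cost of ownership) x 100."""
    if total_cost_of_ownership <= 0:
        raise ValueError("total cost of ownership must be positive")
    net_benefit = value_generated - total_cost_of_ownership
    # Multiply before dividing so round figures stay exact in floating point
    return net_benefit * 100 / total_cost_of_ownership


# Hypothetical figures: €55,000 of value on €25,000 of total cost
print(roi_percent(55_000, 25_000))  # 120.0
```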

It looks trivial, and indeed the problem isn’t the formula: it’s that most companies get the two ingredients wrong. They underestimate the costs (looking only at the license) and overestimate the benefits (taking credit that isn’t theirs). Let’s fix both, one at a time.

Mistake number one

The real cost isn’t the license price

It’s called the total cost of ownership (TCO): the sum of everything you spend over time to keep the project running, not just to build it. This is where many ROI calculations die: you count the initial development and forget everything else, the so-called "hidden technical debt."[2]

Initial development

Designing and building the agent.

Licenses and usage (hidden)

Subscriptions, tokens, API calls: they grow with use.

Integration (hidden)

Connecting to your systems and existing data.

Training and adoption (hidden)

The time it takes to get the team to actually use the tool.

Maintenance (hidden)

Updates, fixes, re-testing when something changes.

Initial productivity dip (hidden)

The first few weeks are slower while everyone learns.

Almost everyone calculates the return by looking only at the first line. The five marked "hidden" are the costs that decide whether the project actually pays off. Honest ROI puts them all on the books.

A concrete example: an agent built for €25,000 that then consumes €9,000 a year in licenses and APIs, plus the time of a person who maintains it, has a real first-year cost well above "25,000." Ignoring that means telling yourself about an ROI that doesn’t exist.
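Putting all six cost lines on the books can be sketched like this. The class and field names are illustrative, the split between one-off and recurring costs is assumed here, and the maintenance figure is hypothetical: roughly 4 hours a week of one person’s time at €35/hour over 52 weeks, i.e. €7,280 a year.

```python
from dataclasses import dataclass


@dataclass
class TotalCostOfOwnership:
    """The cost lines listed above; field names are illustrative."""
    development: float                  # one-off: designing and building the agent
    licenses_and_usage_per_year: float  # recurring: subscriptions, tokens, API calls
    integration: float = 0.0            # one-off: connecting to existing systems
    training_and_adoption: float = 0.0  # one-off: getting the team to use it
    maintenance_per_year: float = 0.0   # recurring: updates, fixes, re-testing
    productivity_dip: float = 0.0       # one-off: slower first weeks, valued in euros

    def total(self, years: int = 1) -> float:
        one_off = (self.development + self.integration
                   + self.training_and_adoption + self.productivity_dip)
        recurring = (self.licenses_and_usage_per_year + self.maintenance_per_year) * years
        return one_off + recurring


# The €25,000 agent with €9,000/year of licenses and APIs; maintenance is hypothetical
# (about 4 hours a week at €35/hour over 52 weeks = €7,280 a year).
agent = TotalCostOfOwnership(development=25_000, licenses_and_usage_per_year=9_000,
                             maintenance_per_year=7_280)
print(agent.total(years=1))  # 41280.0
print(agent.total(years=3))  # 73840.0
```

Even with integration, training, and the productivity dip left at zero, the first-year cost is €41,280 rather than €25,000, and every further year adds €16,280.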

Mistake number two

"Is it thanks to the AI?" You need to be able to prove it

Sales went up. But is it because of the AI, the new salesperson, a recovering market, or the campaign that launched the same month? This is the attribution problem, and it’s why so many AI ROI figures are, in practice, opinions.[1]

The serious standard borrows from science, and you only need to understand two terms.

Baseline

What it was before, in numbers

A snapshot of the process before the AI: time, errors, costs. It’s the same baseline you set when you standardize the process. Without it, you have no "before" to compare the "after" against.

Control group

Those who keep using the old method

A team or a segment (a "holdout," usually 10%) that keeps working as before. You compare it with those using the AI over the same period: the difference is the real effect, net of market and seasonal swings.

Example: two similar sales teams. You give one the agent, the other stays as is. After three months, if the one with the AI closes 18% more and the other 4% (market effect), the AI’s contribution is the difference, 14 points, not the full 18. It’s a number you can take to a board without getting torn apart.
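The arithmetic behind the example is a simple difference between the two groups’ changes over the same period (a basic difference-in-differences), and it only holds if the two groups are genuinely comparable. The hash-based holdout assignment is a hypothetical scheme for when the unit is a customer or an account rather than a whole team: it puts about 10% of IDs in the control group and keeps each ID in the same group across runs. Function names are illustrative.

```python
import hashlib


def in_holdout(unit_id: str, holdout_share: float = 0.10) -> bool:
    """Deterministically place about `holdout_share` of units in the control group.
    Hashing the ID keeps each unit's assignment stable across runs (hypothetical scheme)."""
    bucket = int(hashlib.sha256(unit_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < holdout_share * 100


def pct_change(before: float, after: float) -> float:
    return (after - before) * 100 / before


def ai_effect_points(treated_before: float, treated_after: float,
                     control_before: float, control_after: float) -> float:
    """Change in the AI group minus change in the control group, in percentage points."""
    return (pct_change(treated_before, treated_after)
            - pct_change(control_before, control_after))


# Numbers from the example: +18% with the agent, +4% without (market effect)
print(ai_effect_points(100, 118, 100, 104))  # 14.0
```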

Without a control group, you aren’t measuring the return on the AI. You’re measuring how the company is doing and giving the AI the credit.

Try it now

An estimate of your return

To get a sense of the order of magnitude, start from the easiest axis to quantify: efficiency, that is, time saved. But note the quality factor in the tool: that’s where the estimate becomes honest. Move the values to fit your case and look at savings, return, and months to break even.

Tool · return estimate

How much it pays, in euros (at equal quality)

A starting estimate, to be confirmed with real measurement. Note the quality factor: it’s the one most calculations forget.

People on the process: 5

How many run it

Hours saved per week, each: 4 h

Time taken off manual work

Fully loaded hourly cost: €35

Gross salary + payroll costs, per hour

Project cost (one-off): €25,000

Development, integration, training

Annual running cost: €9,000

Licenses, tokens, maintenance

And output quality, how did it change?

Same quality as before: the savings count in full

Annual savings (before running costs): €36,400
First-year return: +7%
You break even in: 11 months

Time saved only counts at equal quality. If quality drops, the savings must be cut by the cost of downstream errors (simplified here into a factor). This is a surface-level estimate: the real number comes from a measured baseline and a control group.
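The tool’s arithmetic can be reconstructed as follows. This is a sketch that reproduces the numbers shown above under assumptions made here (52 working weeks a year, the quality factor as a simple multiplier on savings, break-even rounded up to whole months); the actual tool may compute them differently. Note that the €36,400 figure is the value of the time saved before the €9,000 running cost: the +7% comes from subtracting the full first-year cost (€25,000 + €9,000 = €34,000).

```python
import math


def estimate_return(people: int, hours_saved_per_week: float, hourly_cost: float,
                    project_cost: float, annual_running_cost: float,
                    quality_factor: float = 1.0, weeks_per_year: int = 52):
    """Rough efficiency-only estimate. quality_factor < 1 discounts the savings
    for downstream errors when quality drops (1.0 = same quality as before)."""
    annual_savings = (people * hours_saved_per_week * hourly_cost
                      * weeks_per_year * quality_factor)
    first_year_cost = project_cost + annual_running_cost
    first_year_return_pct = (annual_savings - first_year_cost) * 100 / first_year_cost
    monthly_net = (annual_savings - annual_running_cost) / 12
    # Months until cumulative net savings cover the one-off project cost
    breakeven_months = math.ceil(project_cost / monthly_net) if monthly_net > 0 else None
    return annual_savings, first_year_return_pct, breakeven_months


savings, first_year, months = estimate_return(5, 4, 35, 25_000, 9_000)
print(savings, round(first_year), months)  # 36400.0 7 11
```

Passing, say, `quality_factor=0.8` shows how quickly a quality drop erodes both the return and the break-even point.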

Expectations

How long before you judge

A mistake as common as the previous two: measuring too soon. Estimates point to an average payback of two to four years, and only about 6% of projects pay off in under twelve months.[3] In the first few months you have the productivity dip of the transition, adoption that still has to climb, and the adjustments.

This doesn’t mean waiting two years in the dark. It means choosing the right metrics for each phase: in the first 90 days you watch adoption and the operational indicators (time, errors), and only afterward, once usage is stable, do you read the full economic return. Judging the final ROI at three months is like weighing a plant a week after sowing it.
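To see why a three-month verdict misleads, here is a sketch that reuses the tool’s default figures (€36,400 a year of time saved, €9,000 a year of running cost, €25,000 one-off) but assumes, hypothetically, that adoption climbs linearly to 100% over six months instead of being complete from day one. The linear ramp is a stand-in for both the productivity dip and slow adoption; it illustrates the shape of the curve, not a forecast.

```python
def cumulative_position(project_cost: float, monthly_full_savings: float,
                        monthly_running_cost: float, ramp_months: int = 6,
                        horizon_months: int = 60) -> list[float]:
    """Cumulative net position month by month, assuming adoption rises
    linearly to 100% over `ramp_months` (hypothetical ramp)."""
    cumulative = -project_cost
    positions = []
    for month in range(1, horizon_months + 1):
        adoption = min(1.0, month / ramp_months)
        cumulative += adoption * monthly_full_savings - monthly_running_cost
        positions.append(cumulative)
    return positions


def payback_month(positions: list[float]):
    """First month in which the cumulative position is no longer negative."""
    return next((i + 1 for i, value in enumerate(positions) if value >= 0), None)


positions = cumulative_position(25_000, 36_400 / 12, 9_000 / 12)
print(round(positions[2]))       # -24217 (still deep in the red at month 3)
print(payback_month(positions))  # 15
```

With adoption complete from the first month (`ramp_months=1`) the same figures pay back in 11 months, as in the tool; the ramp alone pushes it to 15.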

ROI, after all, is the last link in a chain: it comes after picking the process well, standardizing it, and taking care of adoption. If one of these links breaks, the return doesn’t arrive, no matter how well you know how to measure it.

Want a ROI calculation done for your case?

In a call we define the baseline, the real costs, and how to isolate the AI’s contribution, so you bring a defensible number back to your company.

FAQ

The questions we get asked most

How do you calculate the ROI of an AI project?

The base formula is: ROI = (net benefit ÷ total cost of ownership) × 100. The net benefit is the value generated (time saved, costs cut, extra revenue) minus all costs. The critical point isn’t the formula, but including the total cost of ownership (not just the initial development) and isolating how much of that value the AI is genuinely responsible for.

What is the total cost of ownership (TCO) of an AI project?

It’s the sum of all costs over time, not just the initial one. It covers development, licenses and usage (tokens, APIs, cloud), integration with existing systems, training and adoption, maintenance, security, and the productivity dip in the first few weeks. Calculating ROI on development cost alone is the most common mistake and inflates the result.

How do I know whether the gain is thanks to the AI and not something else?

You need two things: a baseline (what the process was like before, measured in numbers) and a control group or holdout (a team or segment that keeps using the old method). By comparing those who use the AI with those who don’t, over the same period, you isolate the real effect. Without a baseline and a control group, attribution is an opinion, not a measurement.

Is AI ROI measured only as time saved?

No, and that’s the most widespread mistake. The value of AI moves along three axes: efficiency (same output, less time or cost), quality (same time, better output), and capacity (things you couldn’t do before, like covering all 24 hours). Measuring only time saved sees one axis out of three, and on top of that it assumes quality stays the same. If quality drops, the time saved isn’t a gain: it’s a cost pushed further down the chain.

How do you measure the output quality of an AI system?

With concrete indicators, not gut feeling: rework rate (how often the output needs correcting), error rate on a sample reviewed by a person, escalation rate to a human, consistency across similar cases, and customer complaints. You set the starting quality (the human baseline) and check that the AI maintains or exceeds it, not just that it goes faster.

How long does an AI investment take to pay off?

Longer than people think. Estimates point to an average payback of two to four years, and only about 6% of projects pay off in under a year.[3] Measuring ROI at three months and declaring failure is a common mistake: you have to give the process time to be adopted and to produce stable effects.

Transparency note

I wrote this article myself. The method and the examples come from my work and from Yempik’s real projects. For the writing I had Claude Opus 4.8 help me with editing, clarity, and layout. The substance is mine; the tool is disclosed.

Transparency

Sources

  1. CIO: measuring the true value of AI (baseline, control group, and attribution). www.cio.com
  2. Workmate: the four components of AI ROI (value, TCO, calculation, benchmark). www.workmate.com
  3. Shopify: how to calculate AI ROI; average payback of 2–4 years, only 6% under one year. www.shopify.com
  4. Master of Code: in 2026 only about a third of AI initiatives reach the expected ROI. masterofcode.com