Finding the Link Uber Missed: How to Connect AI Code Spend to Business Value

Jirka Bachel | May 20, 2026 | 3 min read

Uber burned through its entire 2026 AI coding budget in four months.

When its president and COO Andrew Macdonald was asked whether all that Claude Code spend was producing better features for riders and drivers, he couldn’t point to one. His words: “That link is not there yet.” More tokens, more spend, and no stat that proves Uber shipped anything more useful for the money.

https://www.forbes.com/sites/janakirammsv/2026/05/17/uber-burns-its-2026-ai-budget-in-four-months-on-claude-code/

Uber is not the only company acting on that doubt. Microsoft is pulling most internal Claude Code licenses across its Experiences + Devices group and moving engineers to GitHub Copilot CLI by the end of June. So “how do we measure ROI on AI coding tools” is now the loudest question in engineering.

Here’s the part nobody wants to say out loud. We can’t measure ROI on AI tools because we never measured ROI on engineering. As a former CTO, I got tired of vibe-based reporting on the gains we’re all told to expect.

We know one thing for sure: engineering should cost less than the revenue it makes possible. That’s the whole model. If we knew how to measure engineering itself, we’d have a KPI for it already. We don’t.

So when the entire pitch for AI tools is “make engineering faster,” the obvious question is: faster at what? Coding? Reviewing? Committing? Lines generated? None of that reaches a customer.

What you build matters more than how fast you build it. I agree with that. It stops being the whole answer the moment you already know what to build and you know it will make money. Then delivery speed matters a lot. And if you build the wrong thing, building it faster still helps, because you learn faster and reach the right thing sooner.

The prize is roadmap delivery: the rate at which your team ships the roadmap into production. That’s where AI tools should earn their cost.

Which drops us right back into the measurement problem. Story points? Velocity was a contested metric years ago, and it’s worse now. When I’m one prompt away from a working feature in Claude Code, what exactly am I estimating?

You estimate the thing that’s actually real: how difficult and complex it was to land each feature or fix in your codebase. You do that for every commit, across the team. Now you can compare year over year and say something true, like “we deliver complex code 50% faster than we did last year.”

If you care about token spend, you can put a price on that gain. At Navigara we measure this as Engineering Throughput Value, or ETV. Each PR is scored the way a senior engineer would judge it, with LLMs, ML, and algorithms reading the story of the change.

A worked example. Last year a 3-person team delivered 20 ETV in a month. This year the same team delivers 30. That month they spend $3,000 on tokens for coding, testing, and review. Each unit of delivered work costs $3,000 / 30, or $100 per ETV shipped to production. Now you know how efficiently the team turns token spend into output.
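The arithmetic in that example can be sketched in a few lines of Python. The function names are illustrative assumptions, not part of any Navigara API; the inputs are the figures from the example above.

```python
def cost_per_etv(token_spend_usd: float, etv_delivered: float) -> float:
    """Token spend divided by the ETV delivered in the same period."""
    if etv_delivered <= 0:
        raise ValueError("etv_delivered must be positive")
    return token_spend_usd / etv_delivered


def throughput_change(etv_before: float, etv_after: float) -> float:
    """Relative change in ETV between two comparable periods."""
    if etv_before <= 0:
        raise ValueError("etv_before must be positive")
    return (etv_after - etv_before) / etv_before


# Figures from the worked example: 20 ETV last year, 30 ETV this year,
# and $3,000 of token spend in the month being measured.
unit_cost = cost_per_etv(3000, 30)
gain = throughput_change(20, 30)
print(f"${unit_cost:.0f} per ETV, throughput up {gain:.0%}")
# prints: $100 per ETV, throughput up 50%
```

Tracking both numbers together matters: cost per ETV alone can look fine while throughput stays flat, and a throughput gain alone says nothing about what it cost.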

Then the real worry: faster at producing what? Bugs? AI slop?

Quality still matters, and my view on it changed. The bar is code that works and stays maintainable. So ETV gets split three ways: growth (new value), maintenance, and fixes. The test is simple. Adopting AI tools shouldn’t change the split. If your team is excellent and your testing is tight, fixes should even tick down.
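One way to check whether the split held is to compare each category's share of total ETV before and after adoption. This is a minimal sketch: the ETV figures are made up for illustration, and the 5 percentage point tolerance is an assumption, not a recommended threshold.

```python
CATEGORIES = ("growth", "maintenance", "fixes")


def shares(etv_by_category: dict[str, float]) -> dict[str, float]:
    """Convert ETV per category into each category's fraction of the total."""
    total = sum(etv_by_category[c] for c in CATEGORIES)
    if total <= 0:
        raise ValueError("total ETV must be positive")
    return {c: etv_by_category[c] / total for c in CATEGORIES}


def split_drift(before: dict[str, float], after: dict[str, float],
                tolerance: float = 0.05) -> dict[str, float]:
    """Return the categories whose share moved by more than `tolerance`."""
    b, a = shares(before), shares(after)
    return {c: a[c] - b[c] for c in CATEGORIES if abs(a[c] - b[c]) > tolerance}


# Hypothetical monthly ETV, before and after adopting AI tools.
before = {"growth": 12, "maintenance": 5, "fixes": 3}    # 20 ETV total
steady = {"growth": 18, "maintenance": 8, "fixes": 4}    # 30 ETV, similar mix
sloppy = {"growth": 12, "maintenance": 8, "fixes": 10}   # 30 ETV, fixes ballooned

print(split_drift(before, steady))  # empty: every share stayed within tolerance
print(split_drift(before, sloppy))  # growth share fell, fixes share rose
```

In the second case total ETV still rose from 20 to 30, which is why throughput alone is not enough: the extra output went to fixes, not growth.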

Now you know two things. The team ships faster, and it isn’t drowning in slop.

One question left, and it catches everybody. Did anyone actually want what got shipped? Or did the team finally build the pet projects they never had time for?

So you tie ETV back to the roadmap. Now you can say something a CEO will sit up for: 75% of the team’s output last quarter was roadmap work, and shipping it faster pulls revenue forward.
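The roadmap share is a weighted ratio: ETV from changes linked to a roadmap item, divided by all ETV shipped. A rough sketch follows, assuming each shipped change carries an ETV score and a flag saying whether it maps to the roadmap. The `ShippedChange` type and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShippedChange:
    etv: float        # ETV score for one merged change
    on_roadmap: bool  # True if the change is linked to a roadmap item


def roadmap_share(changes: list[ShippedChange]) -> float:
    """Fraction of total ETV that came from roadmap-linked changes."""
    total = sum(c.etv for c in changes)
    if total <= 0:
        raise ValueError("total ETV must be positive")
    return sum(c.etv for c in changes if c.on_roadmap) / total


# Hypothetical quarter: 7.5 ETV of roadmap work out of 10 ETV shipped.
quarter = [
    ShippedChange(etv=4.0, on_roadmap=True),
    ShippedChange(etv=3.5, on_roadmap=True),
    ShippedChange(etv=1.5, on_roadmap=False),
    ShippedChange(etv=1.0, on_roadmap=False),
]

print(f"{roadmap_share(quarter):.0%} of ETV was roadmap work")
# prints: 75% of ETV was roadmap work
```

Weighting by ETV instead of counting changes keeps a handful of trivial off-roadmap commits from distorting the picture.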

That’s how you measure faster roadmap delivery with AI tooling. Faster roadmap, more revenue, and a real answer when someone asks what the AI spend bought: the link Uber couldn’t find.

Frequently asked questions

Can you measure ROI on AI coding tools?

Not directly, until you measure engineering output itself. The fix is to score the value shipped through code with a unit like ETV, split it into growth, maintenance, and fixes, tie it to the roadmap, and put a price on it using token spend.

What does cost per ETV mean?

If a team delivers 30 ETV in a month and spends $3,000 on AI tokens for coding, testing, and review, that is $100 per ETV shipped to production: a measure of how efficiently spend turns into delivered output.

Does shipping faster with AI hurt code quality?

It should not. ETV is split into growth, maintenance, and fixes, and healthy AI adoption keeps that split steady. If fixes and maintenance balloon, you are paying the tools to clean up after the tools.
