No. 15 · Policy & People

Measuring Failure and Success

Why the right measure of an AI program is capability built, not dollars saved.

Abstract. Most organizations adopting AI measure success in dollars saved. Alberta measures it differently. The goal is not to do the same work with fewer people, which would be folly, but to build capability and a workforce that can transform how government operates, at a moment when Canada's productivity is under strain and the normalization of AI in the workplace looks inevitable. Over eighteen months the province has been recognized among the most innovative public-sector adopters in North America. This paper sets out the three measures Alberta uses to judge whether that work is succeeding: readiness, system health, and cost, counted across the whole of government rather than one ledger.

Over the last eighteen months, Alberta has been recognized as among the most innovative public-sector adopters of AI in North America, through the Academy, the agentic workforce, and the AI factory described across this collection. The work has a purpose beyond efficiency. Canada's productivity has been slipping, the normalization of AI in the workplace now looks inevitable, and the province intends to keep its workers relevant and its institutions capable. Moving at that speed invites a fair question, one we are asked often: how do you know whether you have succeeded? This paper answers with three measures.


## §01 The wrong yardstick

It is tempting to measure an AI program the way most organizations do, by the money it saves. That measure is too small, and on its own it misleads. Continuing to do exactly the work we do today with fewer people would be folly: it banks a one-time saving and forfeits the larger prize, a public service that can do more, and do things it could not do before. Alberta's aim is to build capability and a workforce equal to the transformation of government, and the measures that follow judge progress against that aim. They are readiness, system health, and cost, and each was chosen because it is hard to argue with.


## §02 Readiness

Readiness asks whether the organization can carry this change, and it has four parts. The first is education. A workforce that understands the fundamentals, the concepts, and the limits of these technologies can adapt to whatever comes next; one that does not will be dependent and brittle. The AI Academy is the vehicle, and the measure is how readily the organization adapts to change.

The second part is control of both sides of the agentic equation: the contracts, controls, and security to build new applications safely, and the defenses to withstand the bad actors who now wield the same tools, a balance the Cyber Imperative takes up in full. The third is self-sufficiency. As adoption grows, are we becoming stronger and more able to lead this work ourselves, or weaker and more dependent on third-party vendors, our control and agency shrinking in proportion to our use of the technology? The fourth is governance: whether the rules in place give concerned stakeholders genuine confidence in how AI is being used. A program can move fast and still fail any one of these four, which is why readiness is measured on its own.


## §03 System health

System health turns the program toward the problem named in the first paper of this collection, the Ship of Theseus: an estate aging out faster than it can be repaired. The measures here are blunt and hard to dispute. Are cyber vulnerability exposures going down? Is the backlog of roughly six hundred applications shrinking? Is the total number of applications falling as systems are consolidated? Are we, in short, reversing the tide rather than holding against it?

These metrics matter because they leave little room for argument. They frame the elimination of technical debt in numbers that rise or fall, and a program that is working will move them in one direction over time. A program that is not will show it plainly. There is no narrative to hide behind when the backlog is a single number reported year over year.

The backlog: ~600 apps. Roughly six hundred applications wait in the modernization backlog. Whether that number falls, year over year, is one of the clearest signs of whether the program is working.
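The year-over-year test described in this section can be sketched in a few lines. This is a minimal illustration, not the province's reporting method: the function name `trend` is hypothetical, and every figure after the starting backlog of roughly six hundred is invented purely to show the mechanics.

```python
from typing import Sequence


def trend(values: Sequence[float]) -> str:
    """Classify a series of yearly readings by the direction it moves.

    Returns "falling", "rising", "flat", "mixed", or "insufficient data".
    """
    if len(values) < 2:
        # A single reading cannot show a direction.
        return "insufficient data"
    # Change from each year to the next.
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(step < 0 for step in steps):
        return "falling"
    if all(step > 0 for step in steps):
        return "rising"
    if all(step == 0 for step in steps):
        return "flat"
    return "mixed"


# Hypothetical readings: only the starting point (~600) comes from the paper.
backlog_by_year = [600, 560, 510]
print(trend(backlog_by_year))
```

The same check would apply to any of the other system-health counts, such as vulnerability exposures or the total number of applications. A result of "mixed" or "flat" is the plain signal the section describes: the tide is being held, not reversed.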


## §04 Cost, properly counted

Cost is real, and it has more than one lens. The first is our direct cost as the government's IT provider. The factory delivers faster, but if delivering the same product costs more than before, we have traded one problem for another. The market has already moved customers from licensed software to SaaS to AI, and each migration has brought a fresh cost category. Speed that arrives with unacceptable new cost is not a win.

The second lens is wider. Much of a ministry's performance is shaped, for better or worse, by the IT beneath it, so the question becomes whether better systems let ministries deliver better programs. AI reaches beyond technology, into program delivery, client engagement, policy development, and benchmarking. The honest accounting is therefore whole-of-government, not one ministry's ledger.

The widest lens is the public service itself. Over three to five years, will its size and cost, measured against the population it serves, stay flat or fall? Growth would be hard to imagine here. The minister's direction is to do more for less, and the expectation is that Alberta's public service stays flat or declines relative to its population while delivering more, with AI making up the difference.

The mandate: do more for less. The test for cost, set by the minister: the public service should stay flat or shrink against the population it serves, while delivering more.
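The widest cost lens is a ratio, and the arithmetic can be made explicit. The sketch below is an assumption-laden illustration: the names `Snapshot`, `staff_per_thousand`, and `flat_or_falling` are hypothetical, headcount stands in for "size" (cost could be substituted in the same way), and all figures are invented round numbers, not Alberta's actual headcount or population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One year's reading. All figures used below are hypothetical."""
    year: int
    headcount: int    # size of the public service
    population: int   # population it serves

    @property
    def staff_per_thousand(self) -> float:
        # Public servants per 1,000 residents.
        return 1000 * self.headcount / self.population


def flat_or_falling(start: Snapshot, end: Snapshot) -> bool:
    """True if the per-capita size did not grow between the two snapshots."""
    return end.staff_per_thousand <= start.staff_per_thousand


# Invented example: headcount grows in absolute terms, but more slowly
# than the population, so the ratio still falls.
start = Snapshot(year=1, headcount=10_000, population=1_000_000)
end = Snapshot(year=4, headcount=10_200, population=1_100_000)
print(start.staff_per_thousand, round(end.staff_per_thousand, 2))
print(flat_or_falling(start, end))
```

The ratio covers only half of the test. The mandate also requires that the public service deliver more over the same period, which this sketch does not attempt to measure.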

"Continuing the same work we are doing, just with fewer staff, is folly." · Janak Alford, Deputy Minister, Ministry of Technology and Innovation


## §05 The standard, and sharing it

Taken together, the three measures describe what success looks like: a capable and adaptable workforce, an estate getting healthier rather than sicker, and a public service that does more for less across the whole of government. Failure is equally legible: a workforce growing more dependent, a backlog that will not fall, costs merely shifted from one category to another. Naming both matters, because a measure that cannot show failure cannot show success either.

Alberta will keep publishing these results as they come, in the same transparent and accessible form as the rest of this collection, so that any government can adopt what works and learn from what does not. The aim is a government capable of more, built deliberately and judged honestly against measures that can tell the difference.

Tags: measurement, success-criteria, readiness, system-health, cost, workforce, change-management
