The Five Failure Modes of Legacy Modernization

A systems-engineering failure, not a software one

A Skara Brae Systems field paper · by David Green · Draft v1 for review


Executive summary

Modernization projects overrun budget and schedule at a rate that should embarrass the industry. The usual postmortem blames the technology: the cloud platform, the framework, the vendor. The blame is misplaced.

After two decades modernizing legacy systems in insurance, utilities, healthcare, and cloud infrastructure, I’ve watched these projects fail the same five ways. Almost none of them are about whether the team can code.

That’s the tell. This isn’t a software-engineering problem; the code usually works. It’s a systems-engineering problem. Software engineering builds the parts well. Systems engineering governs the whole: the real requirements, the data, the interfaces, the validation, and the hand-off into live operation. Every failure in this paper lives in that second discipline, and it’s the one teams treat as someone else’s job.

Underneath it is a boundary error. Teams draw the line around “the system” too small. They think the system is the software. But the real system is the software plus the data, plus the business rules (most of them written down nowhere), plus the people who carry the institutional knowledge, plus the organization that operates it, plus the feedback between all of them. Modernize the software and leave the rest unaccounted for, and you haven’t modernized the system. You’ve rebuilt one corner of it and hoped.

Five failure modes. One root. And they chain.

Technology is rarely the cause

Here’s the uncomfortable part for technologists: the hard problems in legacy modernization are not technical. They’re problems of knowledge, agreement, and operation. The code is usually the easy part.

This is a systems problem, in both senses of the word. A county runs on systems, and modernization fails for lack of systems thinking. Same word, double duty, and the pun is doing real work.

Systems thinking is the mindset; systems engineering is the discipline that puts it to work across the whole life of the system: what it truly must do, how it’s validated, how it goes live, and who keeps it running. The five failures below are five places that discipline gets skipped.

A system isn’t a pile of parts. It’s a set of interconnected pieces whose behavior comes from how they relate, not just from the pieces themselves. The legacy system you’ve been hired to replace is software, yes. But it’s also twenty years of business rules, the people who remember why those rules exist, the data that encodes them, the org that runs the thing, and the loops of feedback that keep it honest. Replace the software and leave the rest unaccounted for, and you haven’t modernized the system. You’ve rebuilt one corner of it and hoped.

The five failure modes below are five places teams consistently mis-draw that boundary. The pattern is reliable enough to plan around.


Mode 1: The knowledge isn’t documented. It’s gone.

On a mainframe modernization at a large insurer, the engineers who’d written the original system were long gone. The institutional knowledge had retired with them. There was no one left to ask why the code did what it did.

So the team did the only thing it could: it reverse-engineered the requirements from the behavior. The old system does X, so the new one must do X, identically. We called it feature matching. It has a less flattering name: bug-for-bug compatibility.

It reproduces the outputs. But it preserves the what and permanently loses the why. And when you can’t tell an intended rule from a twenty-year-old accident, you faithfully rebuild both. You carry the bugs and the dead rules forward as if they were features. Worse, the system you ship is now just as much a black box as the one you replaced. You’ve modernized the technology and recreated the original problem.

The systems error: the team counted the code as the system. But the knowledge was part of the system too, and by the time anyone went looking, the code was its only surviving copy.
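
One way to keep feature matching from silently erasing the why is to make the missing rationale visible in the parity check itself. What follows is a minimal sketch, not the method used on that project: the names (Case, parity_report, legacy_premium, modern_premium) and the premium rules are invented for illustration. The idea is simply that a behavior the new system reproduces, but nobody can explain, gets reported separately from one that someone has confirmed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Case:
    """One observed legacy behavior, plus the reason for it if anyone knows."""
    name: str
    inputs: tuple
    rationale: Optional[str] = None  # None means the "why" is lost


def parity_report(legacy: Callable, modern: Callable, cases: list) -> dict:
    """Compare outputs, and split the matches by whether the rule is explained."""
    report = {"mismatch": [], "matched_with_why": [], "matched_unexplained": []}
    for case in cases:
        if legacy(*case.inputs) != modern(*case.inputs):
            report["mismatch"].append(case.name)
        elif case.rationale:
            report["matched_with_why"].append(case.name)
        else:
            # Outputs agree, but this may be a rule or a 20-year-old accident.
            report["matched_unexplained"].append(case.name)
    return report


# Hypothetical stand-ins for the old and new systems.
def legacy_premium(age: int, base: float) -> float:
    surcharge = 1.25 if age >= 65 else 1.0
    if age == 40:  # invented oddity: nobody remembers why this exists
        surcharge = 0.9
    return round(base * surcharge, 2)


def modern_premium(age: int, base: float) -> float:
    # A bug-for-bug port: identical behavior, oddities included.
    return legacy_premium(age, base)


cases = [
    Case("senior surcharge", (70, 100.0),
         rationale="Underwriting rule, confirmed by a (hypothetical) product owner"),
    Case("age-40 discount", (40, 100.0)),  # behavior observed, reason unknown
]

report = parity_report(legacy_premium, modern_premium, cases)
print(report)
```

A plain parity test would call both cases green. Here the second one lands in matched_unexplained, which turns it into a question a human has to answer rather than a feature that ships by default.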

Mode 2: Acceptance arrives too late to matter.

On a $30 million build for one of the largest utilities in the country, requirements were muddy at the outset and acceptance was saved for the end. Big-bang UAT. The plan was: build for months, then test.

Here’s what that plan guarantees. Feedback gets wired to the worst possible moment, when the cost of change is highest and the schedule has no slack left to absorb it. Problems surfaced in UAT. Fixing them pushed the date. New problems surfaced. The date moved again. The schedule slipped so far the project hit late-delivery penalties. The team paid for it with a four-month death march.

And because the requirements had been muddy up front, UAT wasn’t really testing. It was the first time anyone compared what was built to what was wanted. That’s not acceptance. That’s discovering the requirements at the most expensive moment available.

The systems error: there was no control loop between the builders and the people whose mental model defined “correct.” A system with its only sensor at the finish line cannot steer. By the time it reads the error, it’s already off the cliff.

Mode 3: The data is the project.

A healthcare-insurance data platform needed to move to Snowflake from an older database. On paper, a data conversion. Scoped like one too: a few months of work.

It was nothing of the kind. There was no clear, documented place that held the rules. The code and the data were “self-documenting,” which is a polite way of saying the only specification was the data itself and the transformations that shaped it. Understanding it required the patience of a forensic investigator. And the rules weren’t in one place. They were spread across a multi-stage ETL pipeline, and each stage was owned by a different team, each fluent in its own stage and blind to the rest. No one held the pipeline end to end.

The work wasn’t a conversion. It was a multi-year reconstruction of business logic the organization had quietly scattered across its own boundaries. After two-plus years, it still wasn’t finished.

The systems error: the data was system state, twenty years of accreted rules and exceptions, not a side task. And the knowledge of how it fit together had decayed into the seams between teams, where nobody was looking. (This is Mode 1 again, one layer down: the institutional memory had dissolved into the artifacts.)
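
To make "the seams between teams" concrete, here is a small sketch of the kind of inventory that would have exposed the problem early. It is an assumption-laden illustration, not a description of that pipeline: the stage names, team names, fields, and rules are all hypothetical. It lists, for each output field, every stage that applies a rule to it, and flags the fields whose logic crosses a team boundary.

```python
from collections import defaultdict

# Hypothetical pipeline: each stage has one owning team and the rules it applies.
stages = [
    {"stage": "ingest", "owner": "team_a",
     "rules": {"member_id": "strip leading zeros"}},
    {"stage": "cleanse", "owner": "team_b",
     "rules": {"member_id": "remap retired IDs",
               "claim_amount": "drop negative amounts"}},
    {"stage": "publish", "owner": "team_c",
     "rules": {"claim_amount": "round to cents",
               "plan_code": "map to current plan list"}},
]


def field_lineage(stages: list) -> dict:
    """For each field, the ordered (stage, owner, rule) steps that shape it."""
    lineage = defaultdict(list)
    for s in stages:
        for field, rule in s["rules"].items():
            lineage[field].append((s["stage"], s["owner"], rule))
    return dict(lineage)


def cross_team_fields(lineage: dict) -> list:
    """Fields whose rules are split across more than one owning team."""
    return sorted(
        field for field, steps in lineage.items()
        if len({owner for _, owner, _ in steps}) > 1
    )


lineage = field_lineage(stages)
seams = cross_team_fields(lineage)
print(seams)
```

Every field in that list is one where no single team can state the full rule. Counting those fields before scoping is a cheap test of whether the job is a conversion or a reconstruction.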

Mode 4: Nobody agreed what “done” means.

Back to that utility build. The contract was signed before anyone understood the actual work. Scope moved constantly, and there was no process to absorb the movement, so every change landed as an argument instead of a decision. “Done” was never something everyone had agreed on, which meant it was never something the project could reach.

There’s a particular version of this worth naming, because it’s so common: sometimes the misaligned stakeholders are the technologists themselves. On that project, the architecture was driven by a nascent, largely unproven, fashionable stack, chosen because it pleased the people choosing it, not because it fit the risk the project could bear. Good engineers sink projects this way constantly, optimizing for the system they want to build over the system the client needs. (That pattern deserves its own treatment, and it’ll get one in a companion note.)

The systems error: the organization is part of the system, and a system with no shared model of success and no governed way to change has no way to converge. It just drifts.

Mode 5: It works. It isn’t ready to run.

I built and delivered a modernization of the legacy tooling for a cloud infrastructure vendor’s performance-management suite. I’ll own this one plainly: I signed off on tools that worked great on my machine. Real user testing was thin to nonexistent.

They broke in the field. Mostly they broke in environments other than mine. The tools ran across a portfolio of different clouds and configurations, and the assumptions I hadn’t known I’d baked in didn’t survive the move. To a lesser extent, the tools didn’t fit the way people actually worked.

“It runs here” was never the same claim as “it runs there.” And “it works” was never the same claim as “it’s ready to run.” Only the first claim, in each pair, ever got checked.

The systems error: the real system was the software in operation, in users’ hands, in their environments. That version was never the thing that got tested. I accepted my own work, and there was no one whose job was to ask whether it survived contact with reality.

Notice that Mode 5 is Mode 2 resurfacing at the operational layer. The acceptance gap that bites in UAT is the same gap that bites at go-live. In both cases, nobody validated against reality. The same gap just shows up twice.
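
The difference between "it works" and "it's ready to run" can be written down as a gate. This is a sketch under stated assumptions, not the process that project used: the Evidence record, the ready_to_run function, and the environment and user names are all hypothetical. The rule it encodes is the one I skipped: every target environment needs a passing check from someone other than the author.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One recorded check of the tool in one environment."""
    environment: str
    passed: bool
    verified_by: str


def ready_to_run(author: str, target_environments: list, evidence: list) -> list:
    """Return the environments that still block release (empty list means ready).

    An environment is cleared only by a passing check from someone
    other than the author.
    """
    blockers = []
    for env in target_environments:
        independent_pass = any(
            e.environment == env and e.passed and e.verified_by != author
            for e in evidence
        )
        if not independent_pass:
            blockers.append(env)
    return blockers


# Hypothetical release: three target environments, uneven evidence.
targets = ["cloud_a", "cloud_b", "on_prem"]
evidence = [
    Evidence("cloud_a", passed=True, verified_by="author"),     # works on my machine
    Evidence("cloud_b", passed=True, verified_by="ops_user"),   # independent pass
    Evidence("on_prem", passed=False, verified_by="ops_user"),  # independent fail
]

blockers = ready_to_run("author", targets, evidence)
print(blockers)
```

The author's own pass on cloud_a does not count, and the failed check on on_prem does not count either, so both remain blockers. The gate is trivial to write. What was missing was anyone whose job was to enforce it.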


Why the usual fixes miss

Look at where the industry spends its energy: Agile, cloud, microservices, AI. Every one of those is an implementation technique. None of them is a modernization strategy.

They make building faster. They do nothing for the five failures above, because those failures aren’t about building. Agile won’t recover knowledge that retired three years ago. The cloud won’t reconcile twenty years of undocumented data. A microservices diagram won’t make stakeholders agree on what “done” means. AI will help you read the legacy code faster, but it will tell you what the code does, never whether the business still needs it, and it will replicate a bug as cheerfully as a feature unless a human decides which is which.

These tools optimize the one part of the work that was already the easy part. The hard parts (knowledge, agreement, data, operation) go untouched, which is exactly why the project still fails with a modern stack.

What to do instead

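The fix isn’t a better technology. It’s drawing the system boundary correctly and governing the whole thing. In practice that means a handful of disciplines, held from the first day to the last, one for each failure mode: capture the why behind the rules while someone can still explain it; put acceptance in a loop from the start instead of saving it for the end; scope the data as the system state it is, not as a side task; agree on what “done” means and govern every change to it; and validate the software in operation, in users’ hands and in their environments, before calling it ready.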

You’ll notice none of this is exotic. It’s just unglamorous, and it lives in the parts of the project technologists find least interesting. That’s precisely why it gets skipped, and precisely why the skipping is so predictable.

Conclusion

Legacy modernization projects don’t fail because the team couldn’t write the code. They fail because the system was always bigger than the software, and almost no one drew the boundary that wide. Put plainly: they fail as systems engineering, not as software engineering.

The five failure modes are five versions of the same mistake, and they compound. Lost knowledge makes acceptance harder. Muddy agreement makes late acceptance lethal. Undocumented data hides in the seams. None of them stays in its lane, because in a system, failures propagate.

The teams that succeed aren’t the ones with the trendiest stack. They’re the ones who treated modernization as systems engineering: they saw the whole system, governed it across its whole life, and never mistook “the software works” for “the system is done.” Some things are built to last five thousand years. They were built by people who understood what they were building, all of it.


Skara Brae Systems modernizes legacy government systems while preserving the knowledge buried inside them. This is the first in a series of field papers on why modernization goes wrong, and what to do about it.