Why IT goes wrong
This is the third in a short series of commentaries that
look at when, where, and why IT fails in its support for
business operations and objectives.
In the first commentary we looked at thirteen incidents
that severely disrupted organisations, their customers, and
the public. In the second commentary we identified (from
afar) the processes that were most likely to have broken
down to cause the incidents.
In this commentary we ask why those processes failed. We
were not close to most of the incidents, so we need to
speculate a little on the root causes. However, we have
been around this IT industry for a while and have seen the
same mistakes being made and the same processes breaking
down year after year, decade after decade. In most cases
the technologies are irrelevant – we have seen the same
problems occur in the mainframe world, the minicomputer
world, the client-server world, and the dot-com world.
Before we discuss some of the root causes, let’s recap the
examples:
• A pharmaceutical company wrote off nearly $17.2 million
in missing funds due to IT “discrepancies”. A short time
later the CEO and CFO were replaced. The pharmaceutical
company’s accounting and IT auditors should have been able
to pick up IT discrepancies – whether they were caused by a
project or “evolved” during day-to-day operations. The
process failure? Risk management.
• A drug company was forced into bankruptcy by a series of
operational and project blunders. The drug company’s
business strategists and CIO were responsible for taking
high risks in a fragile business environment. The process
failure? Business/IT strategy, enterprise risk management.
• A late, over-budget system was introduced to “streamline
and simplify” importers’ dealings with a government agency.
Within days there was a severe backlog of containers in
seaports during a critical period for importers in the
lead-up to a holiday season. The imports system evolved
into an IT implementation for the government agency – the
agency’s clients (the importers) appear to have been
consulted insincerely and their very real concerns ignored.
The process failure? Business/IT planning, project
management, stakeholder engagement.
• A telco spent $500m+ on billing software. It is still not
right, and the same telco has announced a replacement
programme. The software made the front pages when it sent
erroneous final notices to the relatives of long-dead
customers. Perhaps poor requirements management but
certainly poor vendor management. The process failure?
Project management.
• A large bank was pushed by a software vendor into early
adoption of an untested new version; the software took out
the automated teller machines, then allowed cardholders to
withdraw cash without debiting their accounts. Other banks
chose to test the software more thoroughly and detected the
bug. The process failure? IT architecture, project
management, capability management, and operations.
• A government agency applied its annual regulatory changes
to old and unstable core systems. The systems first
overcharged members of the public, then made too many
refunds, then overcharged those who received incorrect
refunds, and finally got it right on the fourth attempt.
The government agency’s regulatory changes were applied at
short notice to applications that were known to be old,
poorly maintained, and fragile. Senior business executives
had blocked funding requests for major application upgrades
over the previous ten years, but still insisted on very
short lead-time changes. The process failure? Business
strategy, business/IT planning, capability management.
• A major new stock exchange system was 11 years late and
13,200% over budget. The process failure? Probably in all
areas, but the responsibility must lie with the Board of
Directors for allowing this debacle to drag on for so long.
• A new emergency services system was introduced on time
and on budget, but the system and its backup locked. The
emergency service (in one of the world’s largest cities)
reverted to a manual system that restricted the ability of
the service to respond quickly and as a result placed lives
at risk. (This debacle was repeated by an emergency service
in another large city on the other side of the world only
months later.) The process failure? IT architecture,
project management, capability management, IT operations,
risk management.
• An IT infrastructure upgrade increased in cost by 150%
(from about $US 2Bn to about $US 5Bn) only 18 months into
a ten-year project. The process failure? This was an
extended comedy of errors, with apparently little business
leadership, little risk management, little process control,
technology-led IT planning, unfettered, demand-driven
requirements, unskilled negotiators, uncontrolled vendors,
and no escape clauses.
• The opening of a new airport was delayed 16 months by
late delivery of revolutionary software. As a result the
airport planners’ bond rating was downgraded to junk and the
organisation lost $1.1 million a day in interest and
operating costs. The airport software was revolutionary,
but posed a high business risk in the circumstances. The
process failure? The initial business direction does not
appear to have been constrained by intelligent risk
management.
• Most of the desktop computers in a government welfare
agency were paralysed for four days when a failed operating
system upgrade took them offline. The outage, covering 75
percent to 80 percent of the agency’s 80,000 PCs, was one
of the largest in the country’s history. The outage
disconnected staff e-mail, benefits processing, and
connectivity to critical information and systems. The
welfare agency’s desktop computer failure was caused by an
unintended release of unready infrastructure software. The
process failure? This occurred in the “escrow” capability
management zone that exists between the project and IT
operations. It was a defect in the quality process (in an
organisation that had purportedly achieved some level of
quality certification).
• In the same country a “computer crash” in another
organisation prevented pensioners from collecting benefits
payments. The computer crash that prevented benefit
payments was caused by distribution of software onto a
platform that had not been updated to the minimum platform
requirements. The process failure? Project management,
capability (configuration) management, and IT operations.
• In yet another organisation in the same country a call
centre and its systems for processing applications for
welfare payments ran so slowly that “up to two thirds” of
callers (in at least one region) were unable to get
through, and there is evidence that once through, payments
took up to six weeks after applications were lodged.
(Presumably, if people are needy they actually do need the
payments as soon as possible!) The slow call centres were
caused by the same problem – updated software that had not
been fully tested on all the configurations that were in
use across an agency with hundreds of branches and call
centres. The process failure? As above, project management,
capability (configuration) management, and IT operations.
If you look carefully at these incidents you will notice
that the technology components themselves (the
applications, networks and infrastructure) were relatively
reliable. The problems in most of these very high profile
cases occurred in the governance, risk management, and IT
management processes. Even where the problem surfaced in
technology failure, the management processes can be seen
clearly as the root causes.
Why is it so?
Surely after 40 or so years of theories around general
management and IT management, countless management fads,
several generations of automated management tools
(especially those classic hyperboles – so common in our
industry – management information and business
intelligence) we shouldn’t be making the same mistakes that
we did 30 and 40 years ago!
For the most part, the technologies – and technologists –
themselves are of good quality and can produce quality results
when applied to a clearly defined problem.
What’s the problem, then?
We believe the answer lies in three common behaviours that
in themselves have nothing to do with management, or
technology, or management of technology:
• Laziness,
• Arrogance, and
• Greed
Before you say “yah, just another nutcase having a rant and
wasting my time”, have a look at what happened in those 13
disasters:
• A pharmaceutical company wrote off nearly $17.2 million
in missing funds due to IT “discrepancies”. A short time
later the CEO and CFO were replaced. The pharmaceutical
company’s accounting and IT auditors should have been able
to pick up IT discrepancies – whether they were caused by a
project or “evolved” during day-to-day operations. The
process failure? Risk management. Were the accountants and
auditors too lazy (or too interested in fees) to inspect
the systems properly?
• A drug company was forced into bankruptcy by a series of
operational and project blunders. The drug company’s
business strategists and CIO were responsible for taking
high risks in a fragile business environment. The process
failure? Business/IT strategy, enterprise risk management.
What greedy, arrogant strategist embarked on this high risk
adventure? Was the CIO too lazy or too greedy for stock options
to stop the projects?
• A late, over-budget system was introduced to “streamline
and simplify” importers’ dealings with a government agency.
Within days there was a severe backlog of containers in
seaports during a critical period for importers in the
lead-up to a holiday season. The imports system evolved
into an IT implementation for the government agency – the
agency’s clients (the importers) appear to have been
consulted insincerely and their very real concerns ignored.
The process failure? Business/IT planning, project
management, stakeholder engagement. Wow – what were these
people on? Too lazy or too arrogant to consult their
stakeholders? Too arrogant to consider the impacts of a
half-baked system? Too interested in protecting their
public sector pension schemes?
• A telco spent $500m+ on billing software. It is still not
right, and the same telco has announced a replacement
programme. The software made the front pages when it sent
erroneous final notices to the relatives of long-dead
customers. Perhaps poor requirements management but
certainly poor vendor management. The process failure?
Project management. In this case there were vendors
involved. Were they more interested in taking orders for
more billable time than looking critically at what was
being asked? Where were the senior telco managers when the
project went many times over budget? Too lazy to get
involved? More interested in protecting their pensions?
• A large bank was pushed by a software vendor into early
adoption of an untested new version; the software took out
the automated teller machines, then allowed cardholders to
withdraw cash without debiting their accounts. Other banks
chose to test the software more thoroughly and detected the
bug. The process failure? IT architecture, project
management, capability management, and operations. We know
this was laziness on the part of the systems programmers,
and greed on the part of the vendor’s systems engineers
(who were paid bonuses for selling the upgrade – but only
after it was installed, regardless of whether it was
successful).
• A government agency applied its annual regulatory changes
to old and unstable core systems. The systems first
overcharged members of the public, then made too many
refunds, then overcharged those who received incorrect
refunds, and finally got it right on the fourth attempt.
The government agency’s regulatory changes were applied at
short notice to applications that were known to be old,
poorly maintained, and fragile. Senior business executives
had blocked funding requests for major application upgrades
over the previous ten years, but still insisted on very
short lead-time changes. The process failure? Business
strategy, business/IT planning, capability management. This
was arrogance on the part of the IT practitioners who had
done it all before, and who ignored warning signs that were
apparent the previous year. They were also too lazy (and
perhaps too interested in their government pensions) to
inform the head of the agency – in plain language – what
would happen if the core systems were not refreshed.
Instead it was left to external advisors to be the bearers
of bad news, and a couple of years later the culprits
retired with huge pensions after gaining promotions.
• A major new stock exchange system was 11 years late and
13,200% over budget. The process failure? Probably in all
areas, but the responsibility must lie with the Board of
Directors for allowing this debacle to drag on for so long.
If we focus on the Board (the rot probably extended
throughout the organisation) they were probably too lazy to
get involved in a project, and too interested in their
emoluments to ask for information (in most western
jurisdictions, once a director – whether executive director
or non-executive director – is informed, he/she is legally
required to act on the information).
• A new emergency services system was introduced on time
and on budget, but the system and its backup locked. The
emergency service (in one of the world’s largest cities)
reverted to a manual system that restricted the ability of
the service to respond quickly and as a result placed lives
at risk. (This debacle was repeated by an emergency service
in another large city on the other side of the world only
months later.) The process failure? IT architecture,
project management, capability management, IT operations,
risk management. We know about only one of these. The
emergency services staff were not prepared to make
decisions – it was (and could still be) part of the culture
to avoid decisions and thereby avoid accountability if
something bad happened. Laziness? Greed?
• An IT infrastructure upgrade increased in cost by 150%
(from about $US 2Bn to about $US 5Bn) only 18 months into
a ten-year project. The process failure? This was an
extended comedy of errors, with apparently little business
leadership, little risk management, little process control,
technology-led IT planning, unfettered, demand-driven
requirements, unskilled negotiators, uncontrolled vendors,
and no escape clauses. We’re not sure about this one, but
certainly there were signs that decisions were avoided
(thereby avoiding any threat to those pensions), and there
were indications of vendor greed – jacking up prices on
commodity components.
• The opening of a new airport was delayed 16 months by
late delivery of revolutionary software. As a result the
airport planners’ bond rating was downgraded to junk and the
organisation lost $1.1 million a day in interest and
operating costs. The airport software was revolutionary,
but posed a high business risk in the circumstances. The
process failure? The initial business direction does not
appear to have been constrained by intelligent risk
management. Again, we’re not sure about this one. Perhaps
arrogance (we’re world class civil engineers and we could
write that bit of software blindfolded so what’s the
problem?) and greed (look, we’re really jolly good at
writing experimental software so give us that contract and
we’re sure we’ll find a way to get it written by the time
you’re ready with the rest of the airport. When was that?
October? This year?)
• Most of the desktop computers in a government welfare
agency were paralysed for four days when a failed operating
system upgrade took them offline. The outage, covering 75
percent to 80 percent of the agency’s 80,000 PCs, was one
of the largest in the country’s history. The outage
disconnected staff e-mail, benefits processing, and
connectivity to critical information and systems. The
welfare agency’s desktop computer failure was caused by an
unintended release of unready infrastructure software. The
process failure? This occurred in the “escrow” capability
management zone that exists between the project and IT
operations. It was a defect in the quality process (in an
organisation that had purportedly achieved some level of
quality certification). Laziness.
• In the same country a “computer crash” in another
organisation prevented pensioners from collecting benefits
payments. The computer crash that prevented benefit
payments was caused by distribution of software onto a
platform that had not been updated to the minimum platform
requirements. The process failure? Project management,
capability (configuration) management, and IT operations.
Laziness. Where was the configuration management? Where
were the stress tests for the new software on the old
configuration? Arrogance. “Those provincial hicks will wear
what we give them. What’s the worst that can happen?”
• In yet another organisation in the same country a call
centre and its systems for processing applications for
welfare payments ran so slowly that “up to two thirds” of
callers (in at least one region) were unable to get
through, and there is evidence that once through, payments
took up to six weeks after applications were lodged.
(Presumably, if people are needy they actually do need the
payments as soon as possible!) The slow call centres were
caused by the same problem – updated software that had not
been fully tested on all the configurations that were in
use across an agency with hundreds of branches and call
centres. The process failure? As above, project management,
capability (configuration) management, and IT operations.
Laziness and greed – as above.
If you think your organisation is an exception, think again
– this time looking through a “behavioural” lens. We would
be very surprised if there is any organisation anywhere in
the world that is not led by at least some people who are
lazy, arrogant or greedy enough to cause serious problems.