Global tech outage: Slow recovery begins as experts warn of future risks

Airports, healthcare services and businesses are disrupted after a botched CrowdStrike software update hits Microsoft’s Windows operating system

A display using the Windows operating system shows a 'blue screen' at an American Airlines check-in desk at Chicago O'Hare International Airport. Photograph: EPA

Services began to come back online on Friday evening after a global IT failure saw airports, healthcare services and businesses hit by the “largest outage in history”, but full recovery could take weeks, experts said.

Flights and hospital appointments were cancelled, payroll systems seized up and TV channels went off air after a botched software upgrade hit Microsoft’s Windows operating system.

The update came from the US cybersecurity company CrowdStrike, and left workers facing a “blue screen of death” as their computers failed to start. Experts said every affected PC might have to be fixed manually, but by Friday night some services had started to recover.

As recovery continues, experts say the outage underscores concerns that many organisations are poorly prepared to implement contingency plans when a single point of failure, such as an IT system or a piece of software within it, goes down. Such outages will happen again, they say, until more contingencies are built into networks and organisations introduce better backups.

A Microsoft spokesperson said on Friday: “We’re aware of an issue affecting Windows devices due to an update from a third-party software platform. We anticipate a resolution is forthcoming.”

Texas-based CrowdStrike confirmed the outage was due to a software update from one of its products and was not caused by a cyberattack.

Its founder and chief executive, George Kurtz, said he was “deeply sorry for the impact that we’ve caused to customers”, adding there had been a “negative interaction” between the update and Microsoft’s operating system.

Elon Musk, owner of Tesla, said the outage caused “a seizure to the automotive supply chain” while banks in Kenya and Ukraine reported issues with their digital services, and supermarkets in Australia had problems with payments.

The Sky News and CBBC channels were also temporarily off-air in the UK before resuming broadcasting, while Australia’s ABC was also affected.

From Amsterdam to Zurich, Singapore to Hong Kong, airport operators flagged technical issues that were disrupting their services. While some airports halted all flights, in others airline staff had to check in passengers manually.

Among the companies affected on Friday was Ryanair, Europe’s largest airline, which said on its website: “Potential disruptions across the network due to a global third-party system outage ... We advise passengers to arrive at the airport three hours in advance of their flight to avoid any disruptions.”

Heathrow, Europe’s biggest airport, said it was “working hard” to get passengers “on their way”.

A spokesperson for Heathrow said: “We continue to work with our airport colleagues to minimise the impact of the global IT outage on passenger journeys. Flights continue to be operational and passengers are advised to check with their airlines for the latest flight information.”

In the US, flights were grounded owing to communications problems that appeared to be linked to the outage. American Airlines, Delta and United Airlines were among the carriers affected.

Berlin airport temporarily halted all flights on Friday. The aviation analytics company Cirium said 5,078 flights – 4.6 per cent of those scheduled – were cancelled globally on Friday, including 167 UK departures and 171 arrivals.

Reports from the Netherlands suggested there may be problems within the health service. The Israeli health ministry said “the global malfunction” had affected 16 hospitals, while in Germany the Schleswig-Holstein university hospital in the north of the country said it had cancelled all planned operations in Kiel and Lübeck.

Alan Woodward, a professor of cybersecurity at the University of Surrey, said the fix required a manual reboot of affected machines and “most standard users would not know how to follow the instructions”. Organisations with thousands of PCs distributed in different locations face a tougher task, he added.

“It’s just sheer numbers. For some organisations it could certainly take weeks,” he said.

He said the outage was caused by an IT product called CrowdStrike Falcon, which monitors the security of large networks of PCs and downloads a piece of monitoring software to every machine.

However, Ciaran Martin, the former chief executive of the UK’s National Cyber Security Centre, said that unlike adversarial cyberattacks, this problem had already been identified and a solution had been flagged.

“The recovery is not about getting on top of the situation but getting back up. I think it’s unlikely to be very newsworthy in terms of ongoing disruption this time next week,” he said. – Guardian