Here’s what happened when 8.5 million Microsoft devices crashed

Cybersecurity giant CrowdStrike says its recent software update has caused a massive global tech outage, impacting some 8.5 million Microsoft devices worldwide.

While the figure represents less than one per cent of all Windows computers in use, the incident has significantly disrupted several vital sectors, demonstrating how interconnected modern digital infrastructure has become.

In a blog post, Microsoft revealed just how widespread the issue has been: “We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one per cent of all Windows machines.” The impact has been felt far and wide even though the figure is a small fraction of all Windows devices, underscoring how deeply CrowdStrike’s software is embedded in critical systems.

Impact across multiple industries

The impact of this outage has been felt across multiple industries:

1. Aviation: Thousands of flights were cancelled, leaving passengers stranded or facing extensive delays. Delta Air Lines, one of the most affected carriers, reported over 600 flight cancellations by Saturday morning, with more expected.

2. Broadcasting: Several broadcasters were forced off the air, disrupting media services.

3. Healthcare and banking: Customers found themselves unable to access critical services, including healthcare and banking systems.

4. Government and corporate sectors: With over half of Fortune 500 companies and key government agencies like the U.S. Cybersecurity and Infrastructure Security Agency relying on CrowdStrike’s software, the outage’s effects rippled through both public and private sectors.

Technical details of the incident

The outage was traced to an update that CrowdStrike pushed to its widely used Falcon sensor software. The update was intended to improve protection against new threats. However, faulty code in the update files caused many customers’ Microsoft Windows machines to crash.

Security experts, including Steve Cobb, the CSO at SecurityScorecard, said the file must have slipped through whatever vetting or sandboxing process is used for testing.

The issue lies in “a file that contains either configuration information or signatures,” said Patrick Wardle, a security researcher specialising in operating system threats. Security software relies on such files to recognise particular types of malicious code or malware.
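To illustrate the kind of vetting the experts describe, here is a minimal sketch in Python of a loader that checks a content file before using it and falls back to the last known good copy if anything looks wrong. It is not CrowdStrike’s format or code: the JSON layout, the `REQUIRED_FIELDS` schema and the `load_signatures` function are all assumptions made up for this example.

```python
import json
from typing import Dict, List

# Hypothetical schema: every signature entry needs these non-empty string fields.
REQUIRED_FIELDS = {"id", "pattern"}


def load_signatures(raw: bytes, last_known_good: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the parsed entries, or last_known_good if the file fails any check."""
    try:
        entries = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return last_known_good  # unreadable file: keep running on the old content

    if not isinstance(entries, list) or not entries:
        return last_known_good  # wrong top-level shape or empty file

    for entry in entries:
        if not isinstance(entry, dict) or not REQUIRED_FIELDS <= entry.keys():
            return last_known_good  # missing a required field
        if not all(isinstance(entry[k], str) and entry[k] for k in REQUIRED_FIELDS):
            return last_known_good  # field present but empty or wrong type

    return entries
```

The design choice in this sketch is that a bad file is rejected as a whole, so the software keeps running on older content rather than failing on malformed data.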

Images of the infamous “blue screen of death”, the error message displayed on affected computers, spread widely across social media platforms.

CrowdStrike has published guidance for repairing the systems affected by the incident. However, restoring them is a substantial and taxing job, as the faulty code must be manually removed from each affected machine.

Microsoft is taking part in the recovery process. The software giant is working with CrowdStrike on an accelerated fix for Microsoft’s Azure infrastructure. Microsoft has also contacted other large cloud providers, including Amazon Web Services and Google Cloud Platform, to share its observations of the impact across the industry.

Industry implications and lessons learned

This incident serves as a stark reminder of the potential risks associated with widely-used cybersecurity software and the critical need for rigorous testing protocols. John Hammond, principal security researcher at Huntress Labs, emphasised the importance of a more cautious approach to software updates: “Ideally, this would have been rolled out to a limited pool first. That is a safer approach to avoid a big mess like this.”
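The “limited pool first” approach Hammond describes is often called a staged or ring-based rollout. The Python sketch below shows one way it could work, under stated assumptions: devices are placed in stable buckets by hashing their ID, the update goes to a growing percentage of the fleet, and the rollout halts as soon as too many devices in a stage fail. The function names, the stage percentages and the failure threshold are all hypothetical and do not describe any vendor’s actual process.

```python
import hashlib
from typing import Callable, List, Optional, Tuple


def bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def staged_rollout(
    devices: List[str],
    stages: List[int],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> Tuple[List[str], Optional[int]]:
    """Push an update stage by stage.

    Returns the devices updated successfully and the stage percentage at
    which the rollout was halted, or None if every stage completed.
    """
    updated: List[str] = []
    attempted = set()
    for percent in stages:
        # Devices newly included at this stage (not tried in an earlier one).
        ring = [d for d in devices if d not in attempted and bucket(d) < percent]
        failures = 0
        for device in ring:
            attempted.add(device)
            if apply_update(device):
                updated.append(device)
            else:
                failures += 1
        # Stop before widening the rollout if this stage looks unhealthy.
        if ring and failures / len(ring) > max_failure_rate:
            return updated, percent
    return updated, None
```

Calling `staged_rollout(fleet, [1, 10, 100], push)` with a hypothetical `push` function would try roughly one per cent of the fleet first, so a faulty update would be caught after reaching a small share of devices rather than all of them.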

The outage also highlights the delicate balance between the need for frequent security updates and thorough testing. As Patrick Wardle noted, “It’s very common that security products update their signatures, like once a day… because they’re continually monitoring for new malware and because they want to make sure that their customers are protected from the latest threats.” However, this frequency may have contributed to insufficient testing in this case.

Historical context and industry trends

This is not the first such incident involving a high-profile cybersecurity firm. In 2010, a buggy McAfee antivirus update took down hundreds of thousands of machines. But the worldwide ramifications of the CrowdStrike outage show just how large a footprint a single company has established across every segment of industry, as more and more businesses come to depend on cybersecurity software.

For the affected organisations still working doggedly to rebuild their systems, the event shows how tightly wound our digital ecosystem has become. It should also prompt stricter testing policies, a shift towards rolling out key updates gradually, and fail-safe plans that can be put in place if it happens again.

The CrowdStrike outage also raises the question of whether too much risk is concentrated in a handful of cybersecurity vendors, and whether organisations need to diversify the security solutions within their systems.

As the digital world continues to change, the incident will surely serve as a strong point of reference and renew conversations around best practices in software development, testing, and deployment, especially for critical infrastructure and security systems.

(Photo by Joshua Hoehne)
