CrowdStrike updates caused Linux outages in April

But they went unnoticed

A recent CrowdStrike update wreaked havoc on Windows machines worldwide, causing a wave of Blue Screens of Death (BSODs) that crippled operations in critical sectors such as healthcare, finance and aviation.

While CrowdStrike initially assured Linux users they were safe, it has now emerged that similar issues have silently plagued Linux systems for months.

In April, a civic tech lab suffered a complete shutdown after a CrowdStrike update rendered all of its Debian Linux servers unbootable.

"Crowdstrike did this to our production linux fleet back on April 19th, and I've been dying to rant about it," a disgruntled team member who goes by the username JackC said on Hacker News.

According to JackC, the lab's IT team found that the issue stemmed from the update's incompatibility with their Debian version, even though CrowdStrike listed that version as a supported configuration.

"Crowdstrike took a day to respond, and then asked for a bunch more proof (beyond the above) that it was their fault," JackC said.

The cybersecurity firm acknowledged the bug a day later, but after weeks of waiting for a proper explanation, it revealed only that the lab's Debian setup had not been included in CrowdStrike's testing matrix.

JackC says CrowdStrike appears to push updates to user machines whenever it wants, "whether or not it's urgent, without testing it."

Similar reports surfaced from users who upgraded to Rocky Linux 9.4 and experienced server crashes, which have been attributed to a Linux kernel bug. CrowdStrike has denied that these crashes were related to its software.

“The server crashes referenced were not triggered by a CrowdStrike update. RHEL/Rocky was a Linux kernel bug, as evidenced by the patch issued,” a spokesperson said.

The issue highlights a potential pattern of inadequate testing and a lack of focus on compatibility across diverse operating systems.

Such incidents raise serious questions about CrowdStrike's software update and testing procedures.

Experts warn that prioritising rigorous testing across all supported systems is crucial to prevent future disruptions. Organisations relying on CrowdStrike are advised to approach future updates with caution and have robust contingency plans in place to mitigate potential outages, regardless of the operating system they use.

CrowdStrike's Falcon Sensor runs as a kernel driver, meaning it operates at the lowest level of the operating system, where a failure causes a system crash and a device reboot. In the Windows incident, affected devices were stuck in a boot loop and had to be restarted in Safe Mode.

Devices with encrypted hard drives are more difficult to fix, because administrators need the BitLocker recovery key before they can reach Safe Mode. Nevertheless, in a post on X on Sunday, CrowdStrike said: "Of the approximately 8.5 million Windows devices that were impacted, a significant number are back online and operational."

In a blog post, CrowdStrike said the crash was triggered by a faulty configuration update designed to "target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks".

The company denied reports that it had changed the way it interacts with the Windows operating system.

8.5 million devices impacted by CrowdStrike outage

On Saturday, David Weston, Microsoft's vice president of enterprise and OS security, revealed in a blog post that roughly 8.5 million devices, less than 1% of all Windows machines worldwide, were affected by last week's IT crash.

This marks the first official disclosure regarding the incident's scale, following widespread disruption.

The culprit was a faulty update from CrowdStrike, which triggered BSOD errors on affected Windows machines. Mac and Linux devices were not affected by this particular update.

While the number of affected devices seems relatively low statistically, the consequences were far-reaching. Banks, retailers, brokerage firms, rail networks, and even airlines suffered disruptions across the globe.

Flights were grounded in several locations, showing how even a seemingly small percentage of affected machines can disrupt vital infrastructure.

"While the percentage [of affected devices] was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services," Weston wrote.

Weston's post did not specify what proportion of Windows machines equipped with CrowdStrike software actually succumbed to the crash.

Weston said Microsoft was collaborating with CrowdStrike to resolve the issue.

Weston added that Microsoft was also working with Amazon Web Services and Google Cloud Platform on a broader solution.

This article was updated 9th December to include the statement from CrowdStrike denying any responsibility for issues affecting Rocky Linux/RHEL.