CrowdStrike shows the systemic risk of being dependent on a single major provider

What might be the longer-term impact of Friday's outages?

Image: Shutterstock

Alina Timofeeva, currently a Board Member for the British Computer Society, The Chartered Institute for IT, and a strategic advisor in data and technology to the C-suite of major financial services organisations, shares her thoughts on the impact of the outages in areas such as crisis communications and regulation.

The global IT outages that occurred last week will have a lasting and far-reaching impact, well beyond the initial chaos that the CrowdStrike update caused.

The outage showed that with technologies come risks, and that customers may not be fully aware of what they are exposed to. Its global ripple effect illustrates the interconnectivity across the supply chain and the concentration of risk in this market.

Image: Alina Timofeeva

Software vendors like CrowdStrike have become so large and so interconnected that their failures can damage the global economic system and tens of millions of customers globally.

It is key that companies, but also governments and the regulatory ecosystem, are more mindful and perhaps concerned about the systemic risk of being dependent on a single major provider.

Last week it was CrowdStrike and Microsoft. In the future it could be cloud giants like Amazon, Microsoft or Google who fail, and this would impact tens of millions of customers.

From the governments' perspective we need to start monitoring in detail the impact of this interconnected state, and identify the future events that could start small but become much bigger. This would help us build the nation's resilience and ability to respond to similar events.

Digital trust & communication

Building and restoring digital trust is key. I would define digital trust as a confident relationship with the unknown. Further enhancement around technology, third-party management and operational resilience, through a combination of existing and incoming regulations (for example DORA, which is coming into play in 2025), can help ensure that the future of financial services and products is cost-efficient but also safe and secure.

I do feel that the real disruption happening isn't technological. At its core is empowerment: empowering us as customers to navigate through change and uncertainty in an agile and safe way. Communication is key here to maintain transparency and trust.

The proportionate response to Friday's outage would include clear, transparent, and timely communication both externally with customers and internally with employees around the impact the outage had on the organisation, material services impacted, what key steps are being taken to restore the material services, and clear timelines on when these will be restored.

However, I believe that the communication should not stop there. More strategically, what are the steps being taken by the company to ensure that this situation does not impact or harm the employees and customers in future, so that trust is restored longer term?

I would anticipate that, after the crisis communication, strategic communication over the next few days is necessary around what will happen, not only at company level, but also at the government and regulatory level, to mitigate these risks in future.

Regulation

Friday's events should be a wake-up call for employers to invest in tried and tested disaster recovery plans. In many cases these were exposed as a paper-based exercise rather than a plan that had been rehearsed at scale across the key simulation scenarios, including extremely unlikely crisis scenarios just like this one.

I believe that there will be a much bigger focus from regulators on operational resilience, holistically across data, technology, people and processes.

To ensure a proportionate response, material services (e.g. payroll or making payments) would need to be prioritised. Whilst there is already a focus on DORA implementation for some industries by the 2025 deadline, it doesn't apply to all sectors and there will be questions about whether it is enough.

I would anticipate a bigger push from regulators to mitigate concentration risk, not only within companies but also at the level of the providers of material services. I anticipate both tighter regulations, and increased scrutiny from the regulators of companies choosing to prioritise cost and efficiency over the safety and security of their operations and the potential harm to customers.

Whilst I am not anticipating new regulations being developed on the back of this situation in addition to existing ones, I am anticipating greater adherence to existing ones, including DORA and the FCA and EBA guidelines, and the regulators being more stringent. Currently compliance varies, and I would anticipate that in conversations with regulators, companies would want to demonstrate at least full Level 3 compliance with the key risks and existing or proposed control frameworks, rather than treating this as a checkbox exercise.

Friday's outage wasn't just about a technology update that went terribly wrong. It was about the impact it made both internally on operations and externally on customers. From the regulators' perspective, at the company level the key questions to answer would be: