The Outrage or Lack Thereof: Lessons Learned from the CrowdStrike Outage

August 13, 2024

It has been interesting to see how readily customers accepted that a disruption on the scale of the CrowdStrike outage could occur; ire and blame seem to have been aimed only at individual firms that failed to revert to manual or alternative processes and recover within similar time frames to their peers.

For example, in an effort to keep flights running, many airports switched to manual processes; Delhi wrote out departure gate details on a whiteboard, and Gatwick switched to manual security checks on boarding passes. Things might not have run as smoothly as usual or been as secure as one might expect, but they were seen to be making an effort.

The three largest airlines in the world temporarily had to ground their fleets, but only one stood out a week later as still struggling to get back up and running, and as taking a massive financial loss whilst its CEO attended the Olympic Games [6]. There might not be viable alternatives for certain routes, but that airline does seem to have lost a competitive advantage.

The Need for a Speedy Response

The speed with which this event unfolded worldwide, with regions waking up in sequence to the blue screen of death, necessitated a rapid response, reflective of today’s customer expectations. CrowdStrike gave a masterclass in how quickly public opinion about a firm can change, and highlighted the need for agility when running a response:

  • On Thursday 18th July, CrowdStrike was considered the market leader in endpoint cyber security, threat detection, and response [7].
  • At 10:50 BST on Friday 19th July, CrowdStrike issued what should have been a reassuring statement [2], but it came across as an attempt to downplay the events unfolding as ‘not a security incident or cyberattack’ and to imply that the incident had already been dealt with.

  • They rapidly built a support portal page on the first day of the outage [4], but the homepage of their website remained unaltered, leaving some confused.
  • Their CEO then released a video on The Today Show [16], which appeased many because it included an apology and showed the evident agitation he felt on behalf of his customers.
  • They followed it up with well-scripted letters to customers and partners [8], but it was the LinkedIn post from Chief Security Officer Shawn Henry [10], coming a few days after the outage, that truly resonated with the security, risk, and resilience community (the buyers of tools such as CrowdStrike). Whilst it might have been scripted by a PR team, it came across as deeply personal, and he owned the failures whilst promising to do better in the future; many a CISO could have imagined themselves in his shoes.
  • On 24th July, they came under fire for offering their staff and partner firms a $10 UberEats voucher by way of an apology [1]; whilst the gesture was no doubt kindly meant, it rather trivialised the thousands of man hours lost, the weekends ruined, and the loss of customer trust.
  • In the aftermath, most firms we’ve spoken to have reported that they will continue to use CrowdStrike [14] and were even impressed at the speed with which the firm responded, found the problem, issued guidance for workarounds, and then worked with them to reverse the faulty update.
  • That doesn’t reverse the 32% drop in share price and the $25bn loss in value recorded 12 days after the outage, though [3].

The Third-Party Threat

There seems to have been a lot of confusion around how to classify this incident; some see it as a digital or cyber issue and are attempting to claim on their cyber insurance. However, even though CrowdStrike is a cybersecurity provider and the incident caused an IT outage, most cyber policies won’t cover “downtime due to non-malicious cyber events at a third-party network service provider.” [13] It speaks to the fact that third parties and Operations, Security, and IT teams are now all intrinsically linked.

The CrowdStrike incident demonstrates that security is no longer a sufficient lens through which firms can manage technology risk. Resilience is the overarching principle that firms need to attain, and it should be seen as a higher-level objective than security. Whilst security is a pillar of resilience, resilience is a broader and more practical framework for managing a business and its risks.

Acting Comptroller of the Currency Michael J. Hsu commented repeatedly on operational resilience earlier this year, and it seems likely that U.S. regulators will be making moves soon. Hsu stated, “I would like to focus my remarks today on operational resilience. This topic is often overlooked or overshadowed by debates about capital and liquidity. It shouldn’t be… It warrants our full attention.” [11] Perhaps it is time to welcome the Head of Resilience into your C-suite after all?

The Question of Accountability

The CrowdStrike incident saw many firms use the defence of “it is outside of our control” [9] when their services were impacted and they could provide no workaround to customers. However, the frequency of third-party incidents has made this excuse invalid. Firms own the end-to-end value chain of their services: they own all the upside in revenue, market share, and value creation. They also own all of the downside, though, and all of the risk, including third-party risk.

The CEO of Delta Air Lines, Ed Bastian, lambasted CrowdStrike and Microsoft for not offering the airline any kind of compensation for the outage and has hired a high-profile attorney to look at pursuing the case. He argued that “If you’re going to have priority access to the Delta ecosystem in terms of technology, you’ve got to test this stuff… You can’t come into a mission critical 24/7 operation and tell us we have a bug. It doesn’t work.” [5] There will be future third-party outages, though, and firms such as Delta will have to recognise that they own the risks, even for the parts of processes that they outsource.

It is poor business practice for firms to assume that, by outsourcing particular services to a third-party provider, they are outsourcing any of the risk or responsibility. Firms affected by an outage need to take responsibility for the impact regardless of its root cause; blaming a third-party provider will only alienate customers who have paid to receive a service. Third-party risk is still owned by the company providing the service or product to the market: firms do not outsource any risk by outsourcing any portion of a service.

Interestingly, the Civil Aviation Authority (CAA) wrote to airlines that “disruption directly caused by the global IT outage is likely to be viewed as ‘extraordinary circumstances,’ for which the industry should not be financially liable.” [15] This would put the incident in the same category as terrorism, sabotage, or extreme weather, which naturally fall outside the control of the airlines. One can’t help but think, though, that if this had only impacted a handful of firms, they would have been considered liable.

Retail firms, on the other hand, may have been impacted on a far smaller scale (many were unable to take card payments that day), but they appeared to accept the responsibility, were apologetic, and looked for ways to retain customers. For example, Gail’s Bakery offered coffee on the house or put orders through click and collect on their website [2] (Figure 2: BBC – Clara Wikforss).

Long-Term Impacts of the Outage

The long-term impact worldwide is manageable in terms of financial losses; Hatzor estimated that losses worldwide could total around $15 billion, but insured losses may only total c. $1.5-3 billion [12]. For CrowdStrike, however, the future is less certain; whilst they appear to have retained the trust of most CISOs, they are being sued by their own shareholders [3] for allegedly making false and misleading statements about their testing programme, a claim they strongly contest.

Over the long term, firms that can’t recover when their competitors can will lose customers and market share and, ultimately, diminish the value chain on which they have built their business. On the other hand, firms that take ownership of how they have structured their business (and their risk profile) will improve their business operations, build a more reliable value chain that attracts more customers and captures greater market share, and thrive in an increasingly complex and delicate operating environment.

The best way for firms to protect this value chain (and grow revenue, market share, and enterprise value) is to build their resilience. This requires firms to take ownership of what they deliver to the market and to have a data-informed view of everything that enables them to do so, so that when the unexpected happens, they can take decisive action to protect their own reputation.
