When computer screens around the world turned blue on Friday, flights were grounded, hotel check-ins became impossible and freight shipments came to a standstill. Businesses turned to paper and pen. Initial suspicions pointed to some kind of cyberterrorist attack. The reality, however, was much more mundane: a botched software update from cybersecurity company CrowdStrike.
“In this case, it was a content update,” said Nick Hyatt, director of threat intelligence at security firm Blackpoint Cyber.
Because CrowdStrike has such a broad customer base, the content update was felt around the world.
“One mistake can have catastrophic consequences. This is a great example of how interconnected our modern society is with IT – from coffee shops to hospitals to airports, mistakes like this can have huge consequences,” Hyatt said.
In this case, the content update was tied to the CrowdStrike Falcon monitoring software. Hyatt says Falcon has deep connections into the endpoints it monitors (in this case, laptops, desktops and servers) for malware and other malicious behavior. Falcon updates itself automatically to respond to new threats.
“Faulty code was pushed through the auto-update feature and, well, here we are,” Hyatt said. Automatic updating is a standard feature of many software applications and is not unique to CrowdStrike. “The consequences here are catastrophic simply because of CrowdStrike’s actions,” Hyatt added.
A blue screen of death error is seen on a computer screen in Ankara, Turkey, on July 19, 2024, after CrowdStrike, which provides cybersecurity services to the American technology company Microsoft, caused a global communications outage.
Harun Ozalp | Anadolu | Getty Images
Although CrowdStrike quickly identified the problem and many systems were back up and running within hours, the global cascade of damage is not easily reversed for organizations with complex systems.
“We thought it would take three to five days for things to be resolved,” said Eric O’Neill, a former FBI counterterrorism and counterintelligence agent and cybersecurity expert. “This is a period of downtime for the organization.”
O’Neill said it didn’t help that the outage occurred on a Friday in the summer, when many offices were empty and IT staff available to deal with the problem were in short supply.
Software updates should be rolled out gradually
O’Neill said one lesson from the global IT outage is that the CrowdStrike update should have been rolled out gradually.
“What CrowdStrike does is roll out the update to everyone at once. That’s not the best idea. Send it to a group and test it. It should go through some level of quality control,” O’Neill said.
“It should be tested in a sandbox in multiple environments before being released,” said Peter Avery, vice president of IT security and compliance at Visual Edge.
He expects more safeguards will be needed to prevent such failures from happening again in the future.
“Companies need the right checks and balances. It could be one person deciding to push this update, or someone choosing the wrong file to execute,” Avery said.
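The gradual rollout that O’Neill and Avery describe can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how CrowdStrike or any other vendor actually ships updates. The function names, the stage percentages and the failure threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                 # True only if every host received the update
    hosts_updated: int              # how many hosts were touched before stopping
    halted_at_stage: Optional[int]  # index of the stage that tripped the check


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health check
    stage_percents: Sequence[int] = (1, 10, 50, 100),  # assumed stage sizes
    max_failure_rate: float = 0.02,                    # assumed tolerance
) -> RolloutResult:
    """Push an update to growing groups of hosts, halting if a group looks unhealthy."""
    updated = 0
    for stage, percent in enumerate(stage_percents):
        # Cumulative number of hosts that should be updated by the end of this stage
        # (integer ceiling of len(hosts) * percent / 100).
        target = min(len(hosts), (len(hosts) * percent + 99) // 100)
        batch = hosts[updated:target]
        if not batch:
            continue
        for host in batch:
            apply_update(host)
        updated = target
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            # Stop here so the remaining hosts never receive the bad update.
            return RolloutResult(False, updated, stage)
    return RolloutResult(updated == len(hosts), updated, None)


# Usage: a faulty update is caught after the first small group.
fleet = [f"host-{i}" for i in range(200)]
bad = staged_rollout(fleet, apply_update=lambda h: None, is_healthy=lambda h: False)
good = staged_rollout(fleet, apply_update=lambda h: None, is_healthy=lambda h: True)
```

With a check like this, a faulty update reaches only the first small group of machines instead of the whole fleet, which is the point of sending it “to a group” and testing it first.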
The IT industry calls this a single point of failure: an error in one part of a system that leads to a technical disaster across industries, functions and connected communication networks, a huge domino effect.
Call for building redundancy into IT systems
Friday’s events may prompt companies and individuals to increase their cyber readiness.
“The bigger picture is how fragile the world is; it’s not just a network or technology issue. There are a lot of different phenomena that can cause power outages, such as solar flares that can disrupt our communications and electronic equipment,” Avery explained.
Ultimately, Friday’s debacle is not an indictment of CrowdStrike or Microsoft, but of how companies view cybersecurity, said Javed Abed, an assistant professor of information systems at Johns Hopkins University’s Carey Business School. “Business owners need to stop viewing cybersecurity services as just a cost and instead view them as an important investment in their company’s future,” Abed said.
Businesses should do this by building redundancy into their systems.
“A single point of failure should not be able to stop the business, and that’s exactly what happened,” Abed said. “You can’t rely on just one cybersecurity tool, cybersecurity 101,” he added.
While building redundancy into enterprise systems is costly, what happened on Friday was even more costly.
“I hope this is a wake-up call and I hope it causes some change in the mindset of business owners and organizations to revise their cybersecurity strategies,” Abed said.
How to deal with “kernel-level” code
On a macro level, it’s fair to assign some systemic blame to the enterprise IT world, which often views cybersecurity, data security and the technology supply chain as “nice-to-haves” rather than necessities, and to a general lack of cybersecurity leadership within organizations, said Nicholas Reese, a former Department of Homeland Security official and lecturer at New York University’s SPS Center for Global Affairs.
On a micro level, Reese said, the code that caused this disruption was kernel-level code, which affects every aspect of communication between a computer’s hardware and software. “Kernel-level code should be subject to the highest level of scrutiny,” Reese said, with approval and implementation handled as completely separate processes, each with its own accountability.
The problem will persist across the entire ecosystem, which is crowded with products from third-party vendors, all of them with vulnerabilities.
“How do we look across the ecosystem of third-party vendors and see where the next vulnerability will be? It’s almost impossible, but we have to try,” Reese said. “Until we address the number of potential vulnerabilities, this is not a possibility, but a necessity. We need to focus on and invest in backup and redundancy, but businesses say they can’t afford to pay for something that may never happen.”