
Software Update Bug Causes Worldwide Network Outages


Snargfargle


This shows you how dependent the entire world is on only a handful of companies. A bug in a software update has caused airports, banks, and emergency communications centers to go down all over the world.

 


Heard that on the news this morning. Apparently linked to a separate company, CrowdStrike, that does antivirus protection for Microsoft. Huge hit on travel systems and others.


20 hours ago, I_cant_Swim_ said:

Heard that on the news this morning. Apparently linked to a separate company, CrowdStrike, that does antivirus protection for Microsoft. Huge hit on travel systems and others.

 

I remember Crowdstrike as the security firm that caught the Russian hackers breaking into the DNC back in 2016.

There is no direct link to Microsoft that I'm aware of; MS is likely just another customer whose servers (Azure) were taken offline.

The crashes are directly related to Falcon, part of CrowdStrike's cybersecurity software, which is used by many companies to protect cloud-based data. It runs on whatever OS the customer uses (i.e., Linux/Unix/macOS or Windows). On July 19 at 04:09 UTC, an update for Falcon was released which took down Windows clients that received it. This was remediated by 05:27 UTC, 1 hour and 18 minutes later.

The chaos caused in those 78 minutes will almost certainly turn out to be the largest IT outage in history.

So far.


There are a number of vulnerabilities highlighted in this incident.

One is the vulnerability of kernel-level code within Microsoft's operating system. Ever installed a crappy driver from HP to use your new printer, only to have it crash and take the PC down with it? (BSoD) That usually happened because the driver hooked directly into the kernel, and when it died it took the OS with it. While MS has certainly improved a lot over the years at moving things like device drivers from kernel space to user space, it's very likely something similar has happened here. Windows AV software often gets kernel-level access to protect against threats, but that level of access means errors within that same software can easily take the whole system down. The difference is that with your own PC you could immediately reboot it because you're right next to it. Many (if not all) of those dead servers will need someone to physically fix them. That's going to take a while.
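
A rough user-space analogy (not CrowdStrike's or Microsoft's actual code, just a sketch of why containment matters): when a normal process crashes, the OS contains the damage and everything else keeps running. A comparable fault inside a kernel-mode driver has no such safety net.

```python
# Hypothetical illustration: a fault in a *process* is contained by the OS.
# The parent just sees an abnormal exit code and carries on. A similar fault
# inside a kernel-mode driver has no such containment, which is why a bad
# driver can blue-screen the entire machine.
import subprocess
import sys

# The child aborts itself, standing in for an unrecoverable fault in its own code.
result = subprocess.run([sys.executable, "-c", "import os; os.abort()"])

# The parent (and the rest of the system) is unaffected.
print(f"Child died with return code {result.returncode}; parent still running.")
```

Kernel code gets no equivalent of that parent process to catch the fall, hence the BSoD.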

Second is the vulnerability inherent in putting everything in the cloud. There are several facets to this. One, if all of these companies had their data on offline (air-gapped) servers, there would be no need for the security software that took them all down, and thus no interruption of services. Two, offline servers would not be vulnerable to the data breaches that constantly make the news, exposing millions of people's personal info.

Which leads directly to the vulnerability that hangs over all of us, a bit like the sword of Damocles. No matter how diligent we are about our own online security, companies like the ones involved in this mess have deliberately decided to risk their customers' personal data without consent, purely for their own convenience. It's just a matter of time until that data is exposed, and there's little (if anything) one can do about it.

https://www.crowdstrike.com/blog/statement-on-falcon-content-update-for-windows-hosts/

 

https://www.crowdstrike.com/blog/technical-details-on-todays-outage/

 

https://www.crowdstrike.com/platform/

 


I've been reading about this topic on other sites and social media.
One of the articles I read mentioned that the failure to screen/vet/test the piece of ... software ... is what allowed that code to get into a position where it could cause problems.

Once the problems began to manifest, it was as if the fears of a "Year Two Thousand"/"Y2K" scenario had come true, but in a different decade and for a different reason.
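
For what it's worth, the kind of pre-release screening being talked about above wouldn't even have to be sophisticated. A minimal sketch of such a sanity check (the file name and size threshold are made up for illustration; this is not CrowdStrike's actual pipeline):

```python
# Hypothetical pre-release sanity check of the kind described above.
# The file name and minimum size are invented for illustration only.
from pathlib import Path

def sanity_check_content_update(path: Path, min_size: int = 1024) -> None:
    data = path.read_bytes()
    if len(data) < min_size:
        raise ValueError(f"{path.name}: suspiciously small ({len(data)} bytes)")
    if data.count(0) == len(data):
        raise ValueError(f"{path.name}: file contains nothing but null bytes")
    # A real pipeline would go much further: parse the file, load it on a
    # test fleet, and stage the rollout before pushing it to every customer.

if __name__ == "__main__":
    candidate = Path("example-channel-file.sys")  # hypothetical update artifact
    if candidate.exists():
        sanity_check_content_update(candidate)
        print(f"{candidate.name} passed the basic checks")
```

Even a check that trivial would at least have flagged a file full of zeros before it ever left the building.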


19 minutes ago, Wolfswetpaws said:

One of the articles I read mentioned that the failure to screen/vet/test the piece of ... software ... is what allowed that code to get into a position where it could cause problems.

Vindicating all those QC employees that most large companies need but no one wants to pay or hear from...


53 minutes ago, I_cant_Swim_ said:

Vindicating all those QC employees that most large companies need but no one wants to pay or hear from...

Also vindicates those who warned against the whole "cloud" (AKA outsource your critical business functions) architecture... So many layers of outsourcing that nobody knows where the critical data really is... except the data farmers who pay the cloud providers for your stuff!

That's not a conspiracy theory, just a good business plan.


2 hours ago, Wolfswetpaws said:

One of the articles I read mentioned that the failure to screen/vet/test the piece of ... software ... is what allowed that code to get into a position where it could cause problems.

 

Looks like that's exactly what happened.

Turns out Falcon does install a kernel-level driver on Windows during startup. It appears the update that caused the issue pushed a .sys file which contained only null bytes (all zeros). Some have speculated the driver tries to use this file during startup, causing it to fail and take the client down with it. CrowdStrike, in turn, has claimed the null bytes in the offending file are not responsible. Regardless, any attempt to restart a system with that file in place will blue screen. Affected systems need to be booted into Safe Mode to have the offending file removed. This likely means manual, in-person intervention will be required to get these servers sorted. This is not going to get done over the weekend; we're talking potentially weeks of work. My sincere condolences to all the techs stuck doing this to the millions of servers CrowdStrike just hosed, especially those whose machines use BitLocker encryption... The world owes every one of you a beer (or appropriate beverage of your choice).
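
For anyone wondering what that manual fix actually involves: the published workaround is to boot the affected machine into Safe Mode (or the recovery environment), delete the offending channel file(s) under the CrowdStrike driver directory, and reboot normally. A rough sketch of just the cleanup step, assuming you're already in Safe Mode with admin rights (dry-run by default):

```python
# Sketch of the published workaround, to be run from Safe Mode / recovery
# with administrator rights. The path and file pattern follow CrowdStrike's
# guidance; dry_run=True only reports what it would delete.
import os
from pathlib import Path

DRIVER_DIR = Path(os.environ.get("WINDIR", r"C:\Windows")) / "System32" / "drivers" / "CrowdStrike"

def remove_bad_channel_files(dry_run: bool = True) -> None:
    for f in DRIVER_DIR.glob("C-00000291*.sys"):
        data = f.read_bytes()
        all_zero = len(data) > 0 and data.count(0) == len(data)
        print(f"{f} ({len(data)} bytes, all null bytes: {all_zero})")
        if not dry_run:
            f.unlink()
            print(f"  removed {f.name}")

if __name__ == "__main__":
    remove_bad_channel_files(dry_run=True)  # flip to False to actually delete
```

Trivial on one box; soul-destroying across a fleet of thousands, especially when every BitLocker machine also needs its recovery key retrieved before you can even get to Safe Mode.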

It goes without saying that pushing an update like this was an unbelievably bad mistake. It will be interesting to see what the fallout will be for CrowdStrike, if any.

25 minutes ago, Amaruk said:

"... It will be interesting to see what the fallout will be  ..."

Yeah.

