Just exited a meeting with Crowdstrike. You can remediate all of your endpoints from the cloud.
Posted by kuahara@reddit | sysadmin | 590 comments
If you're thinking, "That's impossible. How?", this was also the first question I asked and they gave a reasonable answer.
To be effective, CrowdStrike services are loaded very early in the boot process and they communicate directly with CrowdStrike. This communication is used to tell CrowdStrike to quarantine windows\system32\drivers\crowdstrike\c-00000291*
To do this, you must opt in (silly, I know since you didn't have to opt into getting wrecked) by submitting a request via the support portal, providing your CID(s), and requesting to be included in cloud remediation.
They stated that sometimes the boot process does complete too quickly for the client to get the update and a 2nd or 3rd try is needed, but it is working for nearly all the users. At the time of the meeting, they'd remediated more than 500,000 endpoints.
It was advised to use a wired connection instead of wifi, as wifi-connected users have the most frequent trouble.
This also works with all your home/remote users as all they need is an internet connection. It won't matter that they are not VPN'd into your networks first.
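For anyone curious what the quarantine boils down to on disk, it's the same channel files admins were deleting by hand. A rough, illustrative sketch of that file operation only (the real fix is issued by the sensor from CrowdStrike's cloud; the path follows their public guidance, the script itself is hypothetical):

```python
# Hedged sketch: the local file operation the cloud fix amounts to, per the
# description above. CrowdStrike's actual remediation is issued by the sensor
# from their cloud; this only illustrates quarantining the bad channel files,
# run as local admin from a working boot (e.g. Safe Mode).
import glob
import os
import shutil

CS_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
QUARANTINE_DIR = r"C:\Quarantine\CrowdStrike"  # hypothetical destination

def quarantine_channel_291():
    os.makedirs(QUARANTINE_DIR, exist_ok=True)
    moved = []
    for path in glob.glob(os.path.join(CS_DIR, "C-00000291*")):
        dest = os.path.join(QUARANTINE_DIR, os.path.basename(path))
        shutil.move(path, dest)  # move rather than delete, so it can be restored
        moved.append(dest)
    return moved

if __name__ == "__main__":
    for f in quarantine_channel_291():
        print("quarantined:", f)
```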
thortgot@reddit
Any reasoning on why this is opt in?
kuahara@reddit (OP)
They said for legal reasons...I tried not to laugh.
If someone shoots me and then provides unauthorized aid, the unauthorized aid is not what I'll be suing for.
edgeofenlightenment@reddit
My theory is that if their customers' systems came back up without notice, 98% of the customers would be thrilled, and 2% would find that their systems came up in the wrong order, or came up in an unsupported configuration or without staff in the right places for audit-compliant monitoring, and those customers would try to pin any resulting issues on Crowdstrike as a breach of the contracts that detail very precisely how Crowdstrike software is to be updated in their environments.
-_G__-@reddit
Heavily government regulated (multiple jurisdictions) customer environment here. Without going into details, you're on the right track with the 2% notion.
RogerThornhill79@reddit
One would also assume their response to customers was also heavily government regulated. It's not a bug, it's a feature.
b_digital@reddit
can you cite a law or are you just doing the libertarian neckbeard thing?
RogerThornhill79@reddit
looking at the illegitimate government pushing for nuclear war as we speak and Biden's not even in charge of his own colostomy bag... Libertarian... says the jackboot fascist.
jjwhitaker@reddit
There's very little upside and a lot of potential lawsuit-like cons to another automatic rollout. If anything it would clearly violate any remaining trust, even if it's announced before being deployed by default.
CRWD is playing it close to the chest here and allowing people a 1 hour or so fix, if you have your ducks in a row.
At a minimum a company can request their workstations and critical infra be included and get core services running faster vs manual fixes against possibly thousands of servers.
Plus we're admins. If you told me I could script a solution against every endpoint by filing a ticket with a spreadsheet or list attached then do reboots I'd be doing that first too.
fireuzer@reddit
Perhaps, but that being enabled for the account doesn't necessitate an automatic restart. It would simply dictate the behavior of the subsequent reboot.
Automatic_Ad1336@reddit
It doesn't need any extra legalese. It's written consent vs. no opportunity to opt out. Very different levels of acceptance by the customer.
catwiesel@reddit
I bet to opt in you have to wave any and all rights to sue them, ask them for money, end the contract sooner, heck, you won't even talk bad about them or ask them to apologise; in fact, you admit that it's your own damn fault, and that you will give them your first born and second born should they ask it of you.
yeah, right, it's for legal reasons. all of them good for them, and none of them good for the impacted customer.
IANAL, and I did not check. but that's what my cynic heart is feeling until I get solid proof otherwise.
ALSO... repairing a bsod-ing machine via remote update. that's... I guess maybe not entirely impossible, but that's a very big claim to make. I hope it works out, but I am sceptical unless it's shown working en masse.
ShepRat@reddit
They can put whatever they want in this disclaimer, but I doubt they'd bother cause the lawyers know it'd be invalidated in the first 2 seconds in front of a judge.
DarthPneumono@reddit
Everyone still puts 'void if removed' stickers on too. It stops some people, and that's worth it for a few lines of text.
ShepRat@reddit
Depends on the Jurisdiction I guess. In many places they can leave themselves open to fines and/or legal action by misleading customers about their rights.
skankboy@reddit
-Wave
SnipesySpecial@reddit
So ransomware?
Nuggetdicks@reddit
He said they did 500K in 1 hour.
magistratemagic@reddit
nice reading comprehension, /u/Nuggetdicks
catwiesel@reddit
very surprising
kuahara@reddit (OP)
No, I did not say that. I said that as of the time of the meeting, they had already used it to remediate 500k computers (spread across multiple agencies who had opted in).
I also said that wait time to opt in is about 1 hour or less.
I've never indicated that anyone can remediate 500k machines in 1 hour.
UncleGrimm@reddit
I can confirm this is working at our org. Not on 100% of systems but it just got us down to a few hundred left, had over 50k initially
Fresh_Dog4602@reddit
So how does this system of theirs work, then? Because this is a sort of remote kill-switch or whatever it is they do. So it was always there to begin with?
UncleGrimm@reddit
My understanding is- it’s basically issuing a threat command from their cloud to quarantine the file. They couldn’t roll this out immediately because the BSOD almost always won the race condition, so over the weekend, they reconfigured and relocated a bunch of their servers to make it more likely that the BSOD loses the race condition.
PlannedObsolescence_@reddit
Can you explain further how you came to that understanding? Did you get info from someone internally at Crowdstrike?
crankyinfosec@reddit
Ya, this doesn't make sense; it's purely up to agent logic to pull the threat command to quarantine the channel file, and then it's off to the race conditions! They should just be able to issue the command via APIs to all affected endpoints. There shouldn't be any "reconfiguring and relocating of servers." This sounds like more FUD about why this wasn't done Friday. My guess is they finally figured out this was possible by looking at what actions happened at what time and realized this may beat the crash.
UncleGrimm@reddit
If that were true (that the race condition only ever happens after the agent pull), then this solution shouldn’t require multiple reboots. But it can
crankyinfosec@reddit
Given my experience in the AV industry, there are likely two threads or processes spawned and concurrently working.
The remediation function is likely waiting for the network, which can take a variable amount of time to fully initialize. And depending on how network availability is detected, there may be a variable amount of time before it reaches out to the CS servers to fetch the list of threats to remediate. And then there is the remediation itself, which takes time and is IO dependent (given most machines are on SSDs / NVMe devices this should be the least of the issues).
While all that is happening, the kernel driver is likely being loaded, and depending on the loading order of others that preempt it, this may take longer or shorter; then it has to read all the def files off disk before it gets to the bugged one. This would all lead to the inherent race condition and how system dependent it may be, and why there may be situations where one option hits near 100% of the time.
But them 'reconfiguring and relocating servers' makes no sense since this would be driven by agent logic.
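A toy model of that race, with completely made-up timings, just to show why wired connections and repeat reboots change the odds:

```python
# Toy model of the race described above, with entirely hypothetical timings.
# Two activities start at boot: the remediation path (wait for network, fetch
# the quarantine command, remove the bad channel file) and the driver path
# (load, read channel files, hit the bugged one and crash). Whichever finishes
# first "wins" that boot.
import random

def race_once():
    network_up = random.uniform(2.0, 15.0)         # Wi-Fi tends to be slower here
    fetch_and_quarantine = network_up + random.uniform(0.5, 3.0)
    driver_reads_bad_file = random.uniform(3.0, 6.0)
    return fetch_and_quarantine < driver_reads_bad_file  # True = fix beat the BSOD

trials = 10_000
wins = sum(race_once() for _ in range(trials))
print(f"fix beat the BSOD in {wins / trials:.1%} of simulated boots")
```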
advanceyourself@reddit
My thoughts on the server side is that they are not waiting for agent logic outside of it showing that it's "Online". I'm guessing the agent function for loading upgrades/updates is later in the loading sequence. They are probably forcibly pushing changes once the client is opted in and connection speed/latency would certainly make a difference in that case. They may also be repurposing resources given the impact. The infrastructure for traditional update/upgrade infrastructure probably wasn't sufficient.
The black hat side of my brain thought about how devastating this function would be in the wrong hands. Let's hope their interns are better than Solarwinds.
UncleGrimm@reddit
I don’t think we’re in disagreement then? It’s winning the race because they reduced the time it takes for that to complete.
It certainly does when, while trying to deliver the automated solution, they are likely experiencing huge log query volumes from in-progress manual remediations.
PlannedObsolescence_@reddit
There definitely could be an element of truth to the 'reconfigure servers' thing, I haven't been impacted by the CS issue so haven't actually been hands on with a computer. But if the race condition between the BSOD and the agent calling home for commands could be 'won' more often by just a few milliseconds quicker of a response - or if the agent was already talking to the servers, just they were not prioritising sending the (update agent or quarantine file) command instantly, then I can definitely understand how changing the way that communication works could help things. But really I have no idea if any changes happened related to that.
From what OP and others in this thread said, the 'fix' you get opted into is for them to send the command to quarantine specific parts of the agent to force it to repair itself.
UncleGrimm@reddit
Correct, that’s my understanding of the process.
That's also why MS published "reboot up to 15 times" as a potential fix pretty early on. That's not a magic number used by Windows or anything; the agent had a potential (but very slim) chance to win the race and pull down the update.
By sending the quarantine command from the cloud server, the network wait is probably reduced significantly versus trying to pull the fixed content file.
Latency makes a lot of sense to me here.
Fresh_Dog4602@reddit
oh man... just realized... all those companies with radius authentication probably going ffffffuuuuuuuu as this would delay the networking process (if it even can complete at that point unless you use stuff like MAB or something)
UncleGrimm@reddit
I work on the cloud team at a Big Tech that helped coordinate some of the response for this since CS is one of our partners (you can probably guess who).
We were directly informed of an automated solution undergoing experiments over the weekend, and the race-condition was something they were honing in on. Whether that’s definitively how this final version works, I don’t know that for sure, but it definitely tracks.
PlannedObsolescence_@reddit
The parent comment that /u/Fresh_Dog4602 replied to is now deleted, it originally said:
nartak@reddit
Probably a billing killswitch for customers that don't pay.
Either way sounds like a MitM attack waiting to happen.
catwiesel@reddit
amazing
KaitRaven@reddit
This is presumably outside their normal procedures.
If they're going to make any atypical changes on your system, then yes it makes sense to get your approval first
SimonGn@reddit
As opposed to putting their customers in a boot loop being part of their Normal Operating Procedures?
DOUBLEBARRELASSFUCK@reddit
I haven't read all of the write ups on this yet, but I believe that may have been unintentional.
SimonGn@reddit
Intent does not matter. They messed up without approval but need approval to undo their mistake? Makes no sense
DOUBLEBARRELASSFUCK@reddit
They messed up without approval because you can't possibly ask for approval before fucking up. If they knew they were going to fuck up, they wouldn't have asked for permission, they would have not fucked up.
Imagine you paid someone to tile your bathroom. They come in, use the wrong color tile, then leave. They realize after the fact that they've used the wrong tile. Do you expect them to crawl in through a window in the middle of the night to fix it, or ask you when and if they should come and fix it?
SimonGn@reddit
This is like a tiler doing their job of tiling; partway through they fuck up with the wrong tile, and instead of picking up the wrong tile and laying the correct one they say "oops, I put down the wrong tile, you have to fix it." Then when you fix it, or are partway through, they say "actually I can fix it but I need your permission," even though they were standing there with full access to jump in at any time.
KaitRaven@reddit
The effect was abnormal, but the channel update process was SOP.
DrMartinVonNostrand@reddit
Situation Normal: All Fucked Up
TammyK@reddit
As of this morning it no longer appears to be opt-in. This fix was automatically deployed to all of our devices as of this morning with no notification. I do understand why they pushed it, but forcing another change without proper notification after what just happened is kind of insane.
BeilFarmstrong@reddit
I wonder if it temporarily puts the computer in a more vulnerable state (even if only for a few minutes). So they're covering their butts for that.
ThatDistantStar@reddit
Highly likely. Hell, the Windows firewall might not even be up that early in the boot process
fireuzer@reddit
Even if that's the case, the computer isn't more vulnerable because the CIDs were shared. It's been equally vulnerable ever since the software was installed because of how they wrote the software.
tacotacotacorock@reddit
If that's happening you have a boot sector virus. Which I could see crowd strike mimicking but in a helpful way not maliciously.
ambient_whooshing@reddit
Finally a meaningful reason for the macOS kext to system extension changes.
DOUBLEBARRELASSFUCK@reddit
No, it's not highly likely. If the network comes up for a period of time before the firewall, that's a Microsoft issue, and it's a massive oversight. That would be an attack vector even without CrowdStrike.
tacotacotacorock@reddit
Seems like they're taking advantage of a classic boot sector virus infiltration and basically making their software act similarly, but in your favor. I have not dived very deep into this, but that's exactly what it sounds like to me. The computer is no more vulnerable than it would be to a boot sector virus in the first place, other than that CrowdStrike should prevent those things.
KaitRaven@reddit
This is taking advantage of existing functionality. It's not like they could push out a patch to the sensor agent in this situation.
loopi3@reddit
People have gotten sued for providing life saving first aid by the recipient of said aid. So…
pauliewobbles@reddit
The cynic in me wonders: if you opt in, then later attempt to pursue costs and damages, will your opting in to this remediation be used as a defence to absolve them of any wrongdoing?
"Yes, your system failure was due to a technical error, but as clearly shown it was rectified in a timely manner following your written indication to opt-in.
And No, any delay in providing a fix after the incident originally happened is entirely down to whatever date/time you chose to opt-in, since no-one can force anyone to opt-in to a readily available remediation as a matter of priority."
peoplepersonmanguy@reddit
Even if the opt-in waives rights, there's no way it would stand up, as the damage was done prior to the agreement.
DOUBLEBARRELASSFUCK@reddit
That's not really relevant. You can waive rights after the fact. The issue would be duress. "You signed away your rights to sue while your entire infrastructure was down and your business was in danger." That probably wouldn't hold up.
BondedTVirus@reddit
Almost like... Ransomware. 🫠
reegz@reddit
It’s more because it’s an attack vector into machines. Wait a few months, the papers will come out. This has been available to some customers prior to last Friday.
Vangoon79@reddit
I didn't 'opt in' for damages. Why do I have to 'opt in' for repairs?
I wonder how long before this company burns to the ground.
100GbE@reddit
Very constructive, thanks for sharing.
newaccountzuerich@reddit
The opinion that Crowdstrike should die as a company is entirely valid, and one that I entirely subscribe to.
When a company refuses to heed the warning signals that a previous outage clearly exposed (June 27th iirc), doesn't change their processes, and then commits three cardinal sins of administration (Untested code to Prod; push to all endpoints simultaneously; push on a Friday), then the company needs to not be in business, and those running it need to lose their jobs for incompetence and malfeasance.
The one you replied to has a truth of it.
UncleGrimm@reddit
For having such a strong opinion, have you even been following this story?
It wasn’t a code-change, it wasn’t untested, and a week ago you would’ve been laughed out of a room for suggesting that 0day signatures of active threats be slow-rolled while the threats are currently active. The consensus was pretty strong that that’s bad practice to give customers control over.
Seems like their CICD process corrupted the file after it had passed the QA steps. No grand conspiracy here. Every Cloud company in existence has either torpedo’d worldwide DNS for several hours, or customer data gets corrupted in an outage and you figure out your backups don’t work… Just seems like the same growing pains we’ve all been through to me.
At-M@reddit
well, other people think different
no clue how good "the mirror" is as a source, but I can't find the other article i was looking for
PC_3@reddit
I just ran into legal in the kitchen. He believes it's because if you want the fast fix, you will waive your rights to sue them for the down time.
UncleGrimm@reddit
It’s silly but imo it’s a good idea on their part. I work for a Big Tech and the word is that they’ve been working closely with Microsoft to triple-validate all the proposed fixes.
BruschiOnTap@reddit
Probably for the same reason that got them into this mess in the first place.
caffeinatedhamster@reddit
I had a call with them this morning about this exact same process and the reason for the opt-in is because they are in a code freeze right now (engineer didn't say how long that would last) due to the shitshow on Friday. Because of that code freeze, customers have to opt-in to allow their team to deploy the change to your CID.
broknbottle@reddit
Lol what a crock of shit. It’s not like some external entity forced them to do a code freeze. Must be nice to push out a shit update, immediately declare a code freeze and then use the excuse, sorry we’d love to auto opt-in but we’re in a code freeze at the moment…
flatvaaskaas@reddit
Yeah, but on the other hand, if they keep pushing updates while their update caused this chaos... that would also be frowned upon. People don't trust CrowdStrike right now with update rollouts, so pausing them would make sense.
Fresh_Dog4602@reddit
well because .... at that point you are giving ring 0 of your operating system access to their servers via the network stack... lol is that even possible... wtf....
mindracer@reddit
For this to work it means their software already communicates with their servers at boot time, opt in or not
Fresh_Dog4602@reddit
yup
TrueStoriesIpromise@reddit
That's what you're buying with Crowdstrike or SentinelOne or any other cloud-based antivirus solution.
Fresh_Dog4602@reddit
sentinelone doesn't go that deep into your system like crowdstrike does
YummyBearHemorrhoids@reddit
Every EDR software worth their weight in dog shit does kernel level operations. Otherwise any type of malware that gets kernel access could hide indefinitely from the EDR software.
broknbottle@reddit
CrowdCrap is definitely not running as a kext on macOS with Apple silicon. Apple told all the worthless snake oil vendors to get the fuck out and forced their junk ware back to user space where it belongs
thortgot@reddit
SentinelOne (and every other EDR) has kernel drivers. Article Detail (n-able.com)
cowbutt6@reddit
The difference is that SentinelOne's equivalent of CrowdStrike's channel updates, Live Updates, is a) opt-in, and b) implemented in user space only.
thortgot@reddit
I'm not familiar with their architecture, so I'll assume you are right. But there are still edge conditions that could occur.
Their ELAM driver (same as CS's) does pull definitions from dynamic files that are not WHQL driver certified.
cowbutt6@reddit
If that's the case with SentinelOne, then my understanding is that those dynamic files are part of the sensor distribution, and don't change unless you upgrade/downgrade the sensor to a different version. Which you should have tested first, of course.
thortgot@reddit
I don't believe that is correct. Their architecture is quite similar to CS with a split between sensor (agent) and definition (channel) with real time intelligence
My point is that you still have "uncertified" driver activity occurring in the kernel at a bootstart level.
If they allow for definition update rings then it would mitigate much of the risk but I haven't used the platform in quite a while.
UncleGrimm@reddit
S1 sits at the kernel level as well. Otherwise these solutions would be pretty useless against kernel-level malware.
They do have a version that doesn’t, but if you’ve ever red team’d it, I don’t know why you’d pay all that money and run it that way, becomes pretty easy to bypass and S1 is already iffy on preventing lateral movement to begin with
Fresh_Dog4602@reddit
Ah indeed. I've only seen their version on some OT systems. That might explain it.
UncleGrimm@reddit
Most likely yeah. Most businesses aren’t getting 0days burned on them (EDR solutions cripple them insanely quickly compared to 10 years ago), but if you’re an F100 or public sector, you kinda need to run this stuff at the kernel because malware could also be running there. They get targeted with all the fancy stuff + foreign actor malware.
thortgot@reddit
That's literally how their product works.
Fresh_Dog4602@reddit
Yes but no. Having kernel access didn't necessarily mean they already had it up along with the network stack, or even made use of it at that point. Because that means they could've fixed this already since Friday if it was that easy.
thortgot@reddit
That's the way the "15 reboot" method was functioning which users were reporting was working. A bit of luck of the draw/incremental progress.
I don't imagine it was easy to optimize the stack to increase the odds.
Fresh_Dog4602@reddit
i"ve seen the "15 reboot" method pass by. I've seen many ppl saying it doesn't work. But mileage may vary i guess
thortgot@reddit
Depends on how quickly the driver is crashing versus how long your network stack takes to connect.
I had one company that it worked pretty well for but not for several others I was helping.
KaitRaven@reddit
Crowdstrike already had that
jmbpiano@reddit
If they didn't already have that, this remediation wouldn't work, opted-in or not.
Seastep@reddit
Bit of an IT luddite but from my interpretation of that, connecting a system to a cloud provided service ON BOOT sounds like some space age shit I never thought I'd see.
qejfjfiemd@reddit
Super useful now we’ve finished manually fixing them all
nyhtml@reddit
My boss treated me to a steak dinner for fixing 2000 computers by Monday morning. I got woken up the Friday morning (we were closed) and at first, I thought it was Monday and that I had overslept only to realize what happened listening to the news.
I'm still waiting for that $10 UberEats voucher to arrive.
msalerno1965@reddit
IKR, what the F day is today? LOL
I gotta wonder how many box-seat tickets were handed out by sales reps... some of the "it's OK, it's fine..." comments all over social media are indicative of ... something odd. The apologists are out in force. Maybe they're just stock holders trying to keep their last $.02.
Also, remark to OP: the use of the word "silly" is ... silly. That is, if you're slap-happy silly from working non-stop since Friday.
hercelf@reddit
Yeah, I'm surprised I don't see any more comments - it was such a high impact thing because it couldn't be automatically remediated, and now it turns out there was a way after all? Even a worse look for Crowdstrike in my book...
darcon12@reddit
I mean, they are the top cybersecurity company in the world, and it takes them 4 days to figure out they can trigger a quarantine of the file and fix it remotely? Give me a break.
sol217@reddit
For real. They were already at the top of my shit list and they managed to move up the list even higher.
ItsWhomToYou@reddit
That’s the problem, getting remote users to hardwire to their personal network is virtually unheard of in today’s landscape.
Most people unless they have some endeavors into networking for personal use have no clues what an “Ethernet” cable is. It’s the equivalent using a fax machine at this point lol.
gjack905@reddit
Well, try WiFi first anyway then. All they said is it's more likely to struggle than a wired connection.
ItsWhomToYou@reddit
That’s fine I get that, and it was like 10% success rate that way in my experience, which is subjective ofc. But regardless, my point was just that NIC is disabled in safemode so the only way to remote in was to have a user hard wire lol sucked eggs.
gjack905@reddit
Oh, when you said "that's the problem" it implied that you thought Ethernet was the only prescribed way to deploy the fix
ItsWhomToYou@reddit
The boot process doesn’t give the device time to begin networking wirelessly
Dramatic_Proposal683@reddit
If accurate, that’s a huge improvement over manual intervention
HamiltonFAI@reddit
Also kind of scary they can access the systems pre OS boot?
Travelbuds710@reddit
I was worried about the same thing. Glad for a resolution, but it's a bit worrisome they have that much access and control over our OS. But a little late for me, since I personally fixed over 200 PC's, and already had to give our local admin password to remote users.
damiankw@reddit
You share a local admin password between computers?
AwesomeGuyNamedMatt@reddit
Time to look into LAPS my guy.
thruandthruproblems@reddit
LAPS is dead long live SLAPS. Also, funner to say.
Aggravating_Refuse89@reddit
LAPS is slapped if AD is bootlooped
thruandthruproblems@reddit
Hey, that's why you shouldn't have ANY AV/EDR on your DCs. Just ride life on the wild side!
Aggravating_Refuse89@reddit
You get to decide that? In my world those are not my decisions. AV on EVERYTHING, no exceptions
thruandthruproblems@reddit
Read that with an /s
Aggravating_Refuse89@reddit
But I do agree
Unable-Entrance3110@reddit
I thought the new LAPS was called "Windows LAPS"
The only reference to SLAPS that I could find was some random Github project by that name
thruandthruproblems@reddit
The S stands for serverless. Entra ID (S)LAPS is the replacement for on prem attached LAPS.
Unable-Entrance3110@reddit
First I have heard it called that. Microsoft appears to call it Windows LAPS. There is no mention of Serverless LAPS on their documentation page.
https://learn.microsoft.com/en-us/windows-server/identity/laps/laps-overview
thruandthruproblems@reddit
What server are you installing your entra ID driven solution on?
BattleEfficient2471@reddit
None, MS already installed Azure ID on their servers.
It's not serverless, you just aren't in control of the server running it.
thruandthruproblems@reddit
Which means for you it's serverless.
BattleEfficient2471@reddit
No, for me it means I am now stuck depending on servers I don't control and have no ability to secure.
For us oldsters we remember this all before, it's just renting time on mainframes all over again.
thruandthruproblems@reddit
I was there when the deep magic of 3.1 was written. I remember the magic of server 2000.
BattleEfficient2471@reddit
"Magic" for an OS that still can't delete an open file, sure.
Either way it's the wheel of computing. We will see it turn once again.
charleswj@reddit
How can he slaps?
thruandthruproblems@reddit
Lmao!
RogerThornhill79@reddit
Hoping he means desktop admin rights and not the system admin account. Fingers crossed. Please dont make it so.
getoutofthecity@reddit
He said local admin password, pretty clear to me he meant that he gave out the local Administrator credential for all the computers.
charleswj@reddit
What are those terms? Do you mean local admin vs domain admin?
RogerThornhill79@reddit
you don't give out local - unless it's to other administrators. and no, it's not domain admin level. it's a desktop admin level used to administer end user devices that require higher privs
MuchFox2383@reddit
This is certainly a post of all time
charleswj@reddit
You're describing local admin. Local admins can fix one of these broken machines. Without local admin, they can't.
IHaveTeaForDinner@reddit
It's literally a kernel level driver. You can't get much more access.
Odd-Information-3638@reddit
It's a kernel-level driver, but the reason we can fix this is that when you boot into safe mode it's not loaded. If this is able to apply a fix prior to it blue screening then it has much earlier access, which is good because it's an automated fix for affected devices, but worrying because if they fuck it up again, what damage will it do, and will we even be able to fix it?
DreamLanky1120@reddit
They have access as soon as their driver loads, so as long as their driver connects to them before loading the corrupt configuration file, all is well. I'm still surprised that not everyone in IT has grasped this; nowadays every gamer knows about this because they use kernel drivers for anticheat, which is fucking bananas.
IHaveTeaForDinner@reddit
Yeah there are many fuck ups here. Microsoft are not without blame. If a kernel level driver prevents boot, why isn't it disabled so Windows can boot into safe mode with a big warning saying such-and-such prevented a proper boot?
McFestus@reddit
How would windows know what driver is causing the issue if windows can't boot? Windows doesn't fully exist at the time the issue occurs.
Rand_alThor_@reddit
The Linux kernel handles it just fine. It crashes the same way pre-boot, but the Linux kernel handled it
ultradip@reddit
Ahem... Crowdstrike DID affect linux users, a few months ago. It just wasn't as newsworthy.
National_Summer927@reddit
Not the point being made here
National_Summer927@reddit
The Kernel panic'd, the kernel knows everything that failed
narcissisadmin@reddit
Okay, then why the fuck does Microsoft have to make it such a PITA to get into recovery mode?
IHaveTeaForDinner@reddit
Alright the kernel then, you can't tell me it would be impossible for the kernel to keep track of what crashes the system.
shleam@reddit
Crowdstrike intentionally configures its kernel hooks as a “boot-start” driver. The OS boot loader will load these essential drivers on boot-up and the kernel does not have control until after this happens.
This is for the obvious reason that you want to protect the system before any malware loads; malware loading before Falcon could make changes or install rootkits that would be able to hide from detection.
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/specifying-driver-load-order
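If you want to see this on your own box, a driver's start type is visible in the registry. A hedged sketch (it assumes the sensor service is named CSAgent, matching the csagent.sys mentioned elsewhere in this thread; treat the name as an assumption):

```python
# Hedged sketch (Windows only): read a service's Start value from the registry
# to see whether it's a boot-start driver (0 = BOOT_START, loaded by the OS
# boot loader before anything else gets control).
import winreg

START_TYPES = {0: "BOOT_START", 1: "SYSTEM_START", 2: "AUTO_START",
               3: "DEMAND_START", 4: "DISABLED"}

def start_type(service: str) -> str:
    key_path = rf"SYSTEM\CurrentControlSet\Services\{service}"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _ = winreg.QueryValueEx(key, "Start")
    return START_TYPES.get(value, f"unknown ({value})")

if __name__ == "__main__":
    # "CSAgent" is assumed to be the CrowdStrike sensor service name.
    print("CSAgent start type:", start_type("CSAgent"))
```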
Unusual_Onion_983@reddit
Correct answer here.
McFestus@reddit
I mean, the kernel is kinda the core of Windows; it's what the boot sequence is loading. But the AV is going to be basically the first thing to initialize, because if other stuff could initialize first, it could be 'infected' with a virus and stop the AV from loading. So while obviously I don't know the lowest-level details of the Windows kernel's boot sequence, I would bet that the AV is one of the very first things to load in.
SomewhatHungover@reddit
It's marked as a 'boot start driver', there's a good explanation in this video, and it kind of makes sense as a well crafted malware could prevent crowdstrike from running if it could just make it crash, then the malware would be free to encrypt/steal your data.
TheDisapprovingBrit@reddit
Because kernel level literally means it can do anything. Windows can gracefully kill any userspace-level app if it starts doing weird shit, but with kernel level, you've literally told Windows it's allowed to do whatever it wants. At that point, Windows' only defence if that app starts misbehaving is to blue screen.
Also, "letting Windows boot into safe mode with a big warning saying so" is EXACTLY what it did.
ExaminationFast5012@reddit
This one was a bit different to the others: yes, it's a kernel-level driver and it needs to be WHQL certified. The issue is that CrowdStrike found a loophole where they could provide updates to the driver without having to go through WHQL every time.
Pitisukhaisbest@reddit
The bug must have been there in what was certified right? It must be some kind of input in those C-00*.sys files, which they say aren't drivers, which crashed the main csagent.sys?
WHQL clearly needs some improving.
cjpack@reddit
It was a .dat file that got mislabeled as a system file and should never even have been at the kernel level to begin with, since it's a configuration file. The problem wasn't fucking up the food but mixing up the orders, and one of those orders had shrimp and the person is allergic.
Mr_ToDo@reddit
Shockingly it looks like that's actually wrong. I was going through some of the boot start driver documentation and found that signature stuff like they have seems to be fine
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/elam-driver-requirements
Sure, the whole execution-as-signature thing seems to be more than a bit of a stretch for what it's intended to do (although I'm also trusting random internet comments on what it's actually doing here), but it's still an intended mechanic of the early launch anti-malware driver stuff that Microsoft made (put in a consistent location, preferably signed, that sort of thing). Sure, when the system was put in place it was back when AV really was pretty much all signature-based, but a lot of modern ones just don't work that way (or not just that way, anyway), and that kind of leaves this in a weird place where you're putting something in place that really shouldn't be there, but Microsoft hasn't put a validation process in place to handle it any other way (the full driver validation is much too slow).
The part that I've been racking my head over is the crash recovery. Drivers, including ELAM ones like theirs, allow for last-known-good drivers to be launched, and reading through the documentation I'm not sure if that covers the signatures (I'm thinking it doesn't, and if it did it might only be for corrupt files anyway, I'm not sure).
But the point is, I think that people may be getting angry over the wrong things. In my opinion it should probably just be a driver that wasn't written well enough, maybe poor testing, and definitely the lack of deployment/staging options for definitions in addition to those two.
I was also surprised at the 128KB size limit, and assumed that would be a big problem and might be a reason the code would be lean to the point of being buggy, but checking my computer with SentinelOne the backup ELAM file is 17KB, so I guess it isn't that big a deal (makes you wonder why some of our device drivers are so freaking bloated though, eh?)
SomewhatHungover@reddit
It's marked as a 'boot start driver', there's a good explanation in this video, and it kind of makes sense as a well crafted malware could prevent crowdstrike from running if it could just make it crash, then the malware would be free to encrypt/steal your data.
IHaveTeaForDinner@reddit
Interesting! Thanks.
OptimalCynic@reddit
Exactly this!
cjpack@reddit
We need to move away from end-to-end cybersecurity needing to exist in the kernel to work, and have it be user level with kernel-level access; maybe add a quick debugging step outside the kernel to go "heyyy, this is a .dat file, not a .sys, let me correct that" before dropping it into the system files folder and bricking everyone's machines. Idk though if this is how it'd work; I just read there are some startups specifically claiming to solve this issue and VCs are funding them. If it can be as secure and effective as something like CrowdStrike but way less risk without existing at kernel level, then they will probably be worth investing in.
Coffee_Ops@reddit
Kernel level is only ring 0. Can't get into VTL1 with only that.
aheartworthbreaking@reddit
No LAPS?
Ok-Boysenberry6782@reddit
You have a single local admin password?!?!
sssRealm@reddit
To protect against all types of malware it needs to be embedded into kernel mode of the operating system. It basically gives them the keys to the kingdom. Anti-virus vendors need to be as trustworthy as operating system vendors.
HalKitzmiller@reddit
Imagine if this had been McAfee.
Dzov@reddit
Crowdstrike CEO was McAfee’s CTO.
TheEndDaysAreNow@reddit
And his programming crew at McAfee followed him over, warts and all. Remember how McAfee used to brick things?
Dzov@reddit
I’m shocked anyone would use their software.
TheEndDaysAreNow@reddit
Well, it had a new name. That should have fixed it /s
JBD_IT@reddit
Sounds like the board might be looking for a new CEO lol
Texkonc@reddit
CosmicMiru@reddit
The government uses McAfee (now Trellix) so they are trustworthy enough supposedly
Throwaway4philly1@reddit
Doesn't the govt have to use the lowest bid?
Moontoya@reddit
or kaspersky
(ZoneAlarm managed something similar in 2005 - a freebie software firewall that... after a brain file update, stopped _all_ traffic to and from the pc.
that was a fun coupla days @ 2wire)
RogerThornhill79@reddit
Tech companies are about as trustworthy as Ambulance chasing Lawyers who are now elected. Concerning.
kirashi3@reddit
I mean, if you didn't verify the code was secure before compiling from source, is there technically any way to actually trust the code? 🤔 To be clear, I'm not wearing a tinfoil hat here - just being realistic about how trust actually works in many industries, including technology.
circuit_breaker@reddit
Ken Thompson's "Reflections on Trusting Trust" paper, mmm yes
kirashi3@reddit
Hmmm idk if I trust that one... 😄
justjanne@reddit
You can't bolt protection on after the fact.
If you wanted a truly secure system, require all applications to be signed, maintain a whitelist of signed applications and enforce strict sandboxing for all of them.
Anti virus software is just checklist-driven digital homeopathy.
BattleEfficient2471@reddit
And it appears in this case both are not.
Crowdstrike just proved they weren't.
RogerThornhill79@reddit
DGC_David@reddit
The funny thing is, it did a little...
TheEndDaysAreNow@reddit
Not at all. You can fully trust them. /s
sagewah@reddit
When it comes to malware, whatever runs first, wins - you want your AV loading before the bad stuff or it doesn't stand a chance.
dualboot@reddit
It's called a rootkit =)
agape8875@reddit
Exactly this.. Windows already has built-in solutions to detect rogue code at boot. Examples: Secure Boot, Secure Launch, Kernel DMA protection, Defender ELAM and more..
DreamLanky1120@reddit
No, no, no, don't set your stuff up right. Far too risky, you pay CrowdStrike, do the one-click installer and then blame them if anything happens to your critical infrastructure.
Only to be informed that there are terms and conditions (AGBs) that clearly state that you should not use their software on any critical infrastructure :)
It's the way. You could also ask ChatGPT and do whatever it says.
incidel@reddit
THIS!
McBun2023@reddit
In order to kill the malware, you must become the malware
MaximumGrip@reddit
At this point Crowdstrike IS the malware
omfgbrb@reddit
You either die a hero or live long enough to become the villain.
-- said somebody somewhere who isn't me.
McBun2023@reddit
"Know Your Enemy"
- ~~Sun Tzu~~ John McAfee
National_Summer927@reddit
It's a kernel module, that is the "OS"
cjpack@reddit
Yah how can they access the boot drive remotely? I thought this was not possible
KaitRaven@reddit
That's the strength (and weakness) of Crowdstrike. It can look for malicious activity from the moment the system turns on.
whythehellnote@reddit
s/look for/cause
RogerThornhill79@reddit
The people that covered the tracks of Hillary Clinton and her email servers.... Nice. I'm sure we can trust them as much we can trust Raytheon.
uptimefordays@reddit
Do you just, like, not know what cyber liability coverage is? Every policy requires EDR because tin foil crown wearers who "don't believe in updates" or "don't need anti varus spyware" got and kept getting ransomware.
baked_couch_potato@reddit
jesus fuck you people never fail to show just how goddamn stupid your beliefs are
charleswj@reddit
How dumb do you have to be to not even get the conspiracy theory right?
thejimbo56@reddit
RogerThornhill79@reddit
Skullclownlol@reddit
Why would you think this is scarier than a kernel-level driver that has access to everything anyway?
Coffee_Ops@reddit
Kernel level doesn't have access to everything on Windows 11.
HamiltonFAI@reddit
The app having kernel-level access, sure, but that kernel-level access being reachable remotely without the OS is another level.
xfilesvault@reddit
No, it can’t be contacted remotely without the OS.
It tries to update the definitions BEFORE applying them. But it doesn’t wait long.
So if your network is quick to initialize, like wired internet, it will download the updated definitions.
Otherwise, it applies the existing channel update and then crashes.
It’s a race condition. Sometimes it will fix, sometimes it won’t. Bit is not because they have something else crazy loaded on your machine.
It’s just the same kernel level driver that is running the first lines of code. The first lines of code MIGHT SOMETIMES succeed at fixing the issue that causes the crash later on in the execution of the driver.
Coffee_Ops@reddit
Sounds like you don't understand the level of access you give the vendor of your EDR.
Consider Defender if it bothers you.
maggmaster@reddit
As a sys admin this is the smartest comment. All the bad actors are watching this.
Moontoya@reddit
Doesn't that also suggest it's pre-encryption?
VintageSin@reddit
Linux admins out here just looking like
Skwalou@reddit
This makes no sense, why would you be scared to give control when you specifically hired them to protect your data? It's like being scared of your bodyguard because he is following you...
CosmicSeafarer@reddit
I mean, if they can do it then adversaries can do it, so wouldn’t you want that?
ChihweiLHBird@reddit
Many antivirus products run as a kernel module, which is why this could cause a BSOD in the first place.
Pixel91@reddit
Yeah but most don't run as rootkits.
crusoe@reddit
Every Intel server has a management engine that runs Minix with full network and file system access. The dedicated port should be on its own segmented network.
AMD servers have a similar feature.
AgreeablePudding9925@reddit
It’s not PRE BOOT but during boot. They load in with the kernel hence they’re there at the beginning of things. That’s how they can do what they do - including breaking things.
MoonedToday@reddit
My thoughts too. This sounds like a vulnerability.
progenyofeniac@reddit
The systems generally get to the login screen very briefly. It’s not a huge stretch that CS would be running by that point.
lilhotdog@reddit
I mean, that’s literally what you paid them for.
AGsec@reddit
But wouldn't that be necessary in terms of total security prevention/detection?
zlatan77@reddit
This ☝️
TheIndyCity@reddit
For real. We had <400 affected and it took us 24 hours to remediate manually; I can't imagine how you do this when your impacted endpoints run into the several thousands. Huge news if so!
Wolvansd@reddit
Not in IT, but we have about 9000 end users affected being manually remediated by IT. They call us, give us an admin login and directions to delete then reboot. 13 minutes.
My neighbor, who does database stuff, has maybe 2k end users; they just sent out directions and users mostly self-remediated.
Solidus-Prime@reddit
I had our entire company of 2k users up and running within an hour of being affected, by myself. Managed IT services are getting lazy and sloppy.
xfyre101@reddit
i dont believe you did 2k units in an hour lol.. just the fact that a lot of them required multiple start ups.. callin bs on this
tell_her_a_story@reddit
I too call BS. Our IT-staffed remediation center, organized to address remote users, was resolving 300 PCs an hour at peak on Saturday, with 50+ experienced techs using OSD boot drives. That's one every 10 minutes per tech. Insert drive, F12 for the one-time boot menu, select the USB, enter BIOS password, boot into WinPE, enter admin password, wait. Select the advertised task to resolve, let it run, reboot, log in to confirm it's resolved. Takes a bit of time.
LeadershipSweet8883@reddit
If they had it automated via PXE boot or did it like an assembly line, I could see it. You don't have to do it one at a time and sit there watching for 10 minutes. Have a team log into WinPE, set the computer to the side, do the next one. Have another team pulling from the pile to kick off a reboot, goes to the next pile. Have that team check the resolution and shut it down or stick it back in the queue if it didn't work.
xfyre101@reddit
he said he single handedly did 2k computers in one hour lol
xocomaox@reddit
In a perfect setting where all computers are connected to the PXE network and you have easy access to all of them, one person could do 2,000 computers in an hour. But most people don't have this kind of setup (especially in 2024) and it's not because of laziness or sloppy work.
This is why it's hard to believe the 1 hour claim of this person. Had they made the claim without the comment about lazy and sloppy, it would actually be more believable.
tell_her_a_story@reddit
PXE boot requires infrastructure in advance, not something we use. The remote users hardware is assigned to the individual and funded by their department. Stacking them up and running an assembly line to resolve would end up with hardware not returned to the rightful owner. With the shared/generic auto login computers, the techs most definitely kicked them off one after another and went down the line minimizing idle time.
LeadershipSweet8883@reddit
I was pointing out that the other user that did 2k end stations in an hour may have been able to PXE boot them.
The ownership issue is easily solved with a P-Touch label maker or a stack of sticky notes. Not completely necessary but if you are processing thousands of laptops then the throughput boost is probably worthwhile, especially since you can allocate techs based on the current size of the queue for each station.
I saw some places had Bitlocker keys printed on barcodes and inputted using a USB scanner - you can print the commands in barcodes as well.
tell_her_a_story@reddit
Fair enough.
Solidus-Prime@reddit
Like I said - lazy and sloppy.
nantuko__shade@reddit
You must not have BitLocker-encrypted drives.
Solidus-Prime@reddit
We do actually.
I'm 99% sure MS created the KB5042421 article based on my feedback to them:
https://www.reddit.com/r/msp/comments/1e7xt6s/bootable_usb_to_fix_crowdstrike_issue_fully/
nantuko__shade@reddit
That’s a clever solution but you did not create that bootable USB, distribute it to 2k end users, and have them all fixed “within an hour of being affected”. Which btw was approximately 2AM on Friday morning
Wolvansd@reddit
It's all of our own internal IT folks doing it; no contractors.
Work in the utility industry (w/ nuclear) so yah, it's been awesome.
No-Menu6048@reddit
how did u do it so quickly?
AromaOfCoffee@reddit
I've had it take 15 minutes when the end user was a techie. The very same process is taking about an hour per person when talking through little old lady healthcare admins.
narcissisadmin@reddit
Or the hunt and peck person who doesn't get the 48 digit recovery key entered before it times out. Good times.
AromaOfCoffee@reddit
yeah like good for this guy and his ability to follow directions, but that's not most people.
jack1729@reddit
Typing a 15+ character, complex password can be challenging
AdmMonkey@reddit
That probably mean they got a 8 character local admin password that never change...
Ok_Sprinkles702@reddit
We had approximately 25,000 endpoints affected. Remediation efforts began soon after the update that borked everything went out. As of yesterday afternoon, we're down to fewer than 2,500 endpoints still affected. Huge effort by our IT group to manually remediate.
Far_Cash_2861@reddit
Manually remediate? According to George it is a 15 min fix and a reboot.....
FGeorge
tell_her_a_story@reddit
We began remediation at 2am on Friday. At that time, we were booting into safe mode, unlocking the drive via Bitlocker, logging into the PC using a local administrative account with passwords pulled from LAPS ui, deleting the file, then rebooting and logging in using domain credentials to ensure everything came back up.
Depending on how many tries it took to actually get into SafeMode, it varied from 10 to 20 minutes per machine.
By Saturday morning, we had a much more streamlined process to resolve it.
TheIndyCity@reddit
Insane effort, well done
BattleEfficient2471@reddit
Assuming VMs you write a script to mount the disks to another machine and delete the file.
We did this.
TheIndyCity@reddit
Yep that’s how we ended up finishing it off, just took a bit for the script to get the kinks worked out and unfortunately had to deploy it individually to each machine
b_digital@reddit
For VDIs, it’s pretty straightforward to do it quickly, remotely, and en masse with software such as Pure Rapid Restore or Cohesity Instant Mass Restore
HiddenShorts@reddit
400? We had over 2k servers, 17k devices, estimate 80-90% were impacted and manually fixed. We had probably over 200 people engaged at peak on Friday, with likely 150 or so on the ground fixing devices.
lolSaam@reddit
Didn't realise this was a dick measuring competition.
joshtaco@reddit
...this was literally known the morning of the outage. Why is this all of a sudden news to people? I swear, during emergencies, the research portion of IT issues just goes out the door. The only caveat is like they said, a wired connection is recommended as it's basically a race condition against the bug check.
Arkayenro@reddit
that seems like a massive security nightmare knowing that their stuff (and god knows what else) can communicate and update pre/mid boot cycle.
bobsmith1010@reddit
an automated intervention for an issue they caused.
TechManPro@reddit
They reported this to my company as well, but after several machines rebooted 50+ times, we found the manual remediation was actually faster, and more reliable, unfortunately.
NightShaman313@reddit
Does not work if using Global Protect VPN.
Jose083@reddit
They’ve placed the notice here
https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
Fresh_Dog4602@reddit
myea but not really explaining what it is they do.
Jose083@reddit
Why wouldn’t you trust crowdstrike and the hidden stuff they do inside a critical directory of your system?
Let’s hope they passed QA on this one.
bmyst70@reddit
Let's hope they actually DID QA on this one. Their initial update smells like "Developer pushed crap that wasn't even sanity checked before being sent out to the world."
fishfacecakes@reddit
It was supposedly package corruption, which means they do no signing, or, the version they tested isn’t the version they signed. Either way terrible for a security company
BattleEfficient2471@reddit
So they don't QA the finished product?
fishfacecakes@reddit
Yeah it seems like no. Or they do, but then don’t sign that, which seems worse
BattleEfficient2471@reddit
If they sign it, they would need to QA it again.
You should always QA the exact same process with the same files as prod.
fishfacecakes@reddit
You QA the files you’re sending to prod. Then, you sign them to know the same files you’ve QA’d are the ones in prod, unmodified
BattleEfficient2471@reddit
If you signed them, you modified them. Assuming signature is in file and not a separate sig file.
So test again. Unless it's exactly the same bytes, test again.
fishfacecakes@reddit
I’m talking detached signature files for this very reason
BattleEfficient2471@reddit
At that point you might as well just supply hashes, I mean honestly they should always be doing that with any file.
fishfacecakes@reddit
If you’re just supplying hashes though, then any threat actor in the chain can sub in their own files and their own hashes. If the client is verifying against a known signing key, signing the files is a much more secure way of doing it.
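Something like this, for illustration (Ed25519 via the cryptography package; the key handling, file names and algorithm choice are placeholders, not anyone's actual pipeline):

```python
# Hedged sketch of a detached-signature check as described above. The client
# verifies the definition file against a signature produced by a key it already
# trusts; a swapped file or swapped hash fails the check.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_channel_file(data_path: str, sig_path: str, pubkey_raw: bytes) -> bool:
    with open(data_path, "rb") as f:
        data = f.read()
    with open(sig_path, "rb") as f:
        signature = f.read()
    try:
        Ed25519PublicKey.from_public_bytes(pubkey_raw).verify(signature, data)
        return True    # exact QA'd bytes, signed by the holder of the private key
    except InvalidSignature:
        return False   # corrupted in the pipeline or tampered with: refuse to load
```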
BattleEfficient2471@reddit
Well if the bad actor can upload files and hashes, he probably has access to the private key as well.
The stories I could tell about developers. You ever end up in Buffalo NY, you let me know.
fishfacecakes@reddit
Sounds like a plan - cheers :)
honu1985@reddit
You will be surprised how many software companies in the world operate without QA. Heck, even MS: they don't have QA and rely on devs' unit tests and just push out. They ask devs to write testable code in the first place but still...
bmyst70@reddit
Apparently not. Nor do they even do a simple MD5 checksum comparison to confirm the update definitions are valid.
You know, even ClamAV does that for its virus definitions.
Xalenn@reddit
I'm still surprised that they were able to get WHQL cert for a program that runs external, untested code at that level.
Jose083@reddit
Think it’s because it’s the definition files that are getting out and breaking stuff, but the driver itself is WHQL certified.
I guess given the nature of the product you can’t wait for WHQL turnaround on every definition file, for obvious reasons.
Still, a 10-minute QA stage would have caught the problem.
HerbOverstanding@reddit
They are simply quarantining the bad file. Sigh, if I'd had the forethought I would've just created an IoC for that hash. I imagine, though, that there's probably more to their method than simply an IoC hash blacklist.
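For the hash part, that's just a SHA-256 of the bad channel file(s), which you could then feed into whatever IoC/blocklist workflow your console has (the console side isn't shown; path per the public guidance):

```python
# Hedged sketch: compute SHA-256 of the offending channel file(s) so the hashes
# can be added to an IoC/blocklist. Only the hashing step is shown here.
import glob
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for path in glob.glob(r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*"):
    print(sha256_of(path), path)
```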
Dickbluemanjew@reddit
So you mean to tell me that now you idiots are giving this company free rein to have remote access? Lol. What could go wrong.
stulifer@reddit
Desperate times
Dull-Sugar8579@reddit
My thoughts on this whole thing.
EmicationLikely@reddit
Yeah... too late, Crowdstrike. Everyone affected has already started (and likely finished) manual remediation. My comment in that meeting would have been "OK, show me." There is no way this should have reached the number of endpoints it did. Either they don't have good procedures for staged rollout of updates, or someone was allowed to go around that process. Either way, it shouldn't have gotten past internal testing or a small, early-adopter group of endpoints. Prove me wrong.
Least-Music-7398@reddit
I found a CS article validating this post is not BS. Sounds like good news for impacted customers.
Taboc741@reddit
Can you post that article?
kuahara@reddit (OP)
I asked during the meeting for a publicly accessible info page on this and they led me to their 'blog'. This was the best that was provided. The green box at the top alludes to it. I believe there's more specific information locked behind individual customer logins.
https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
Nightcinder@reddit
One thing I can't stand about CRWD is the fact that all documentation is locked behind paywall
Bernie4Life420@reddit
Redhat too
BloodyIron@reddit
Redhat is locked behind a loginwall, not a paywall. You can create free accounts to get to almost all the documentation (if not all?) while spending literally no money nor any blood of the innocents.
TechGoat@reddit
Yeah, Commvault (our backup provider software) switched from public free for all to 'accounts needed' for most of their docs a few years back. When I told them it made it kind of annoying to share my findings with the members of my team that aren't directly involved with commvault and therefore don't have accounts, they apologized and said it was to cut down on scrapers
BloodyIron@reddit
lol and what problems exactly do scrapers cause? And have they not heard of robots.txt? That's silly of them to do, but I hear you. Yuck.
Rare-Page4407@reddit
a lot of spiders ignore robots.txt
nappycappy@reddit
that's bs. there is information I've looked for for their stupid IdM that is unavailable even with a basic login.
BloodyIron@reddit
Mind providing some examples pls?
nappycappy@reddit
well shit... I guess I'll have to take that bs comment back. I just signed up for the developer account from a link here and now it lets me see the ones I had been looking at in the past.
BloodyIron@reddit
Well I can't speak to the ones that gave you problems in the past. For all we know, that could have been a bug :) But here's to you for trying again! nice! :D
broknbottle@reddit
No it’s not. You just need to sign up and enable the no cost developer stuff.
Advanced_Vehicle_636@reddit
Red Hat does not require a paid subscription for any of the documentation I've read - and I've read a stupid amount of RHEL documentation over the last few years. RHEL only requires you to login. You can do that with a free dev subscription.
I got my RHEL account the same time I got my development subscription which was completely free and came with no requirements to buy RHEL. Though to be fair, we have a paid RHEL subscription now, so it'd be hard for me to tell at this point.
FWIW: I think it's marginally less stupid that they login-lock their documentation [than paywalling it], especially considering CentOS and Fedora documentation is nearly as applicable (... and free ...) as RHEL documentation is. But it's still stupid.
Also: RHEL documentation in my experience is usually extremely handy. If you don't have an account and work with RHEL or derivatives (incl. Fedora, CentOS, Rocky, Alma, and Amazon), I'd highly recommend getting a free account.
pizzalover101@reddit
I signed up for the red hat developer program (16 licenses for free) and have not found any documentation locked away behind a paywall.
https://developers.redhat.com/about
Hotshot55@reddit
You don't need an active subscription to read RedHat's articles, just have to sign in.
BondedTVirus@reddit
Depends on what you're looking for. I encountered "subscription required" just last week. 😩
thejohncarlson@reddit
SentinelOne has entered the chat.
Nightcinder@reddit
s1 locking sentinelsweeper behind support pisses me off
lordmycal@reddit
But also understandable since it could be used to remove S1, which is something adversaries have a vested interest in.
Nightcinder@reddit
You need to be in safe mode anyway; makes no difference.
Sweeper doesn't even work in my experience, I had to do it without the app
wilhelm_david@reddit
security through obscurity is no security at all
technobrendo@reddit
90% of "enterprise" software did too
DarthPneumono@reddit
RedHat's documentation is free, but requires a sign-in.
hornethacker97@reddit
Red Hat has never locked their documentation behind a paywall, and in fact they cannot be open source and also lock their documentation behind a paywall.
Rare-Page4407@reddit
/r/confidentlywrong
MrHaxx1@reddit
Why not?
ByTheBeardOfZues@reddit
Yeah I've always been able to access documentation. I have had to log in for solution articles though.
R8nbowhorse@reddit
That could not be further from the truth.
EWDnutz@reddit
Yup. I've noticed the same for a lot of platforms and it's terrible.
At least make health/status pages publicly viewable....
TechIncarnate4@reddit
Why is this an issue? The product is behind a paywall. If you pay for the product, you have access to the documentation.
cassiopei@reddit
Unless your password servers are in a boot loop due to a bluescreen.
Sure, eventually they will get the credentials and pass them around, but why make it extra hard to access the support documentation for a group of people that may be affected.
QTFsniper@reddit
The techie / knowledge seeker in me hates this, but the counterpoint I could see is "if you want to see and read how our stuff works, be a customer, pay, and support us," and I could kind of get it, even if I don't like it. I could see bad actors using it for knowledge, or just them saying buzz off, you're not our customer.
Definitely not supporting the practice, just curious what others think about the validity of that mindset.
Ok_Fortune6415@reddit
Why would I pay before seeing and reading how your stuff works? That makes no sense. Yes, let me become a paying customer based on sales buzzword vomit.
QTFsniper@reddit
Probably how they get you to set up a time-limited trial account, sit through sales calls and demos to find out more.
chkltcow@reddit
Making me sit through sales calls and demos to get even the basic information about your software is the #1 way to make me NOT be a customer. This is a terrible idea.
independent_observe@reddit
That's the IBM way
QTFsniper@reddit
Of course it's a terrible idea, never argued that point. Btw, I'm not part of any sales org or company that does tech services.
spacelama@reddit
It's the kind of thing that makes me take shit off my CV though. I prefer working with open technologies where I can actually research and fix any problems that I encounter without vendor encumbrance.
i_am_fear_itself@reddit
You asked for this because you recognized the importance of this meeting and knew before the words came out of your mouth that you were headed right back to this sub to share with those who are still burning the midnight oil what you learned.
I'm not sure there's a finer example of the spirit of this sub. Well done, lad / ladesse.
flatvaaskaas@reddit
Hmm, I only read that they have a new method with an opt-in, but no explanation of what this is or how it works?
Do you have any other information about this?
daweinah@reddit
I can confirm. My CSM jumped on a Zoom a few minutes after I asked and gave me specific language to put in a ticket with Falcon Complete. A few hours later, Cloud Remediation was enabled on my hosts.
Least-Music-7398@reddit
Does it work? 100% effective on all? Require multiple reboots or did all the BSODs need a few minutes to take the fix then a restart?
traydee09@reddit
So this must then happen after the network stack is loaded and activated, but is it WIRED specific? Will it work for users who are at home and on wifi?
Big-Slide7304@reddit
By any chance is that CS article searchable? I searched for cloud remediation and automated remediation but can't find it. Either way, I've opened a tech support ticket to get information on opting in for automated remediation / cloud remediation. I'm a little worried though, because they are so swamped and won't get to my generic ticket, since I don't know the exact steps I should be following and just opened a general ticket.
Least-Music-7398@reddit
https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
MrStealYo14@reddit
have a link for that?
rose_gold_glitter@reddit
This sounds a lot like the reason Microsoft was suggesting 15 reboots - each one edges you closer to a download of the update needed to fix it.
CopperKing71@reddit
Giving AV vendors access to the kernel is what started this whole mess….
LamarLatrelle@reddit
The disease and the cure.
ArmedwWings@reddit
This is basically what we did using ConnectWise Control. Queued the command in the portal and then restarted the device. Sometimes took a couple tries but worked really well.
Also, using invoke-command locally works too.
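For anyone curious, a minimal sketch of the Invoke-Command variant (hedged: assumes WinRM is reachable and the host stays up long enough for the session; the computer name is a placeholder, and the path is the widely reported channel-file location):

    # Remove the bad channel file over a WinRM session before the next crash
    Invoke-Command -ComputerName 'PC-0123' -ScriptBlock {
        Remove-Item 'C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys' -Force
    }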
dpdpowered83@reddit
Why didn't CrowdStrike do this to begin with?
Six_O_Sick@reddit
So how is this supposed to work? Network Connectivity loads before the faulty driver, checks for updates and fixes itself?
sockdoligizer@reddit
It's still a race condition. The CrowdStrike agent loads at boot and does many things. Two of those things are checking the cloud for updates and validating all of the content modules it already has. If the agent checks the cloud, gets the update, and applies it before attempting to load the faulty module, it will get fixed. If the module wins, you keep blue screening.
To everyone saying why didn't they release this Friday: they didn't have this available Friday.
To everyone else, CrowdStrike did have this available Sunday evening. I know because my rep told me about it and I sent it to the infra teams in my organization. I don't know why people are having to meet with their reps to get answers.
Is this the same poster that got fussy over the weekend that he had to hear about crowdstrike news from some engineer on twitch? What a guy
hebuddy69@reddit
they didn't have it available on Friday, of course, only the following week when everyone had already gone out to hundreds if not thousands of machines and racks to fix them manually.
kfelovi@reddit
We worked on weekend, they didn't.
Unable-Entrance3110@reddit
Yeah, something isn't right here. This seems to me that CS is purposely making it seem as though the remediation was onerous and risky hence all the pseudo-legal opt-in BS and delays.
The functionality was already there in the client to delete any file they want (obviously) and I would be willing to bet that several employees pointed this out immediately. But CS doesn't want to advertise this fact or wants to make it seem very difficult to do so as not to open themselves up to even more scrutiny and possible liability.
BalmyGarlic@reddit
You would think they would have an IR plan in case something like this happened. I'm guessing they don't or their IR plan is to coordinate all communications through their insurance, who advised them that meeting with reps and requiring opt-in was the way to go rather than blasting out the message on all platforms and/or requiring customers to opt-out.
Infamous_Sample_6562@reddit
My client’s legal department is compiling all of the overtime they had to pay us to remediate. It’s not going to be cheap. About 14k out of 80k endpoints were affected.
dragon788@reddit
According to a lawyer who read their ToS the most your legal department might get out of them is a refund of fees paid, unless they are willing to bet a lot of money and time on a lawsuit and convince a judge in East Texas they'll get a cut of the proceeds.
Infamous_Sample_6562@reddit
They have a massive legal team.
SpetsRu@reddit
Yep, sounds about right... That fine print gets you every time..
rdhdpsy@reddit
lol fuck we manually fixed 2700 servers mostly by the azure vm repair command so not that bad, but we had a bunch of osprofile issues which the script couldn't deal with.
maxcoder88@reddit
What did you use as a script?
rdhdpsy@reddit
if you are in azure there is a specific az cli vm repair command just for this issue.
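Roughly, that flow looks like the below (hedged outline: the vm-repair extension builds a helper VM, attaches the broken OS disk, runs a fix script, then swaps the disk back; the run-id is the one Microsoft circulated for this incident and may have been revised since, and resource group, VM name, and repair credentials are placeholders):

    # Verify the current run-id against Microsoft's own guidance before using this
    az extension add --name vm-repair
    az vm repair create  -g MyRG -n BrokenVM --repair-username repairadmin --repair-password 'P@ssw0rd123!' --verbose
    az vm repair run     -g MyRG -n BrokenVM --run-id win-crowdstrike-fix-bootloop-v2 --run-on-repair --verbose
    az vm repair restore -g MyRG -n BrokenVM --verbose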
Nik_Tesla@reddit
That's really good news... but it makes me sad about all of the IT folks who absolutely killed themselves this past weekend to do it all manually. Especially those on salary that are just going to get a pat on the back and a starbucks giftcard at most.
The best thing about this is, all those devices that are with remote users or at some far away location, that they weren't able to get to yet, can be fixed. I was thinking this was going to drag on for weeks with the last 10% of devices at each company taking a long time to physically get to.
Pork_Bastard@reddit
Not everyone works for a slave-driving shithawk. My firm would've paid full overtime and fully given us major props! Luckily we weren't using CS as our EDR, but they've proven themselves many times, including a major breach 5 years ago. Don't put up with assholes.
Nik_Tesla@reddit
Not everyone, but clearly enough work for slave driving shithawks, considering this is the top of the subreddit right now
https://www.reddit.com/r/sysadmin/comments/1ea9lpr/so_who_else_is_looking_for_a_new_job_after_how/
bageloid@reddit
They just pushed this to all clients in the US-1, US-2 and EU tenants.
poorleno111@reddit
Yeah, just saw that. I think this will probably be what gets us to move on from them even more. We didn't want their fix as we don't trust them, and then they just pushed it anyway. I'm hoping our legal team comes down pretty hard on them.
VedantaSay@reddit
How do you qualify best?
markdacoda@reddit
https://www.youtube.com/watch?v=XrrryadgchI
kirashi3@reddit
I live for Shitty Internet Mashup Music Videos™ (aka soundclowns) so thank you for sharing - this made my evening.
CTeeO@reddit
From my CS Rep:
As you noted a fix was shared on our support pages where we requested Customers to opt-in. Following extensive testing, this fix was subsequently deployed across all Customers without the requirement to opt-in. In most cases this means the affected hosts only require rebooting for the remediation to be complete. An update announcing this change was pushed out at 2237 UTC on the 22nd July and can be found at the following page: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
BattleEfficient2471@reddit
The same extensive testing done on the file released friday?
xpkranger@reddit
This is for end users? Yeah, they all lost their Ethernet dongle a year or two ago.
LNGU1203@reddit
Every IT environment is different. Yours must be all cloud vms and such. For hybrid or on-prem, how? LOL
BattleEfficient2471@reddit
If they are on prem VMs, you write a simple script.
Power off BSODing VM. Mount its disk to another machine. Have that machine delete the file.
Tada. No human interaction needed. You can do it all in powershell/powercli for vmware.
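A hedged PowerCLI sketch of that approach (VM names, guest credentials, and the drive letter the attached volume comes up as inside the helper are all assumptions; the helper needs VMware Tools running for Invoke-VMScript):

    # Power off the broken guest, attach its OS disk to a healthy helper VM, delete the file, detach, boot
    Connect-VIServer -Server 'vcenter.example.local'
    Stop-VM -VM 'BrokenVM' -Confirm:$false
    $osDisk = Get-HardDisk -VM (Get-VM 'BrokenVM') | Select-Object -First 1
    New-HardDisk -VM (Get-VM 'HelperVM') -DiskPath $osDisk.Filename | Out-Null
    # E: is an assumption for where the attached volume mounts inside the helper
    Invoke-VMScript -VM 'HelperVM' -ScriptType Powershell -GuestUser 'Administrator' -GuestPassword 'P@ss' -ScriptText 'Remove-Item "E:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys" -Force'
    # Detach (do not delete) the disk from the helper, then power the original VM back on
    Get-HardDisk -VM (Get-VM 'HelperVM') | Where-Object Filename -eq $osDisk.Filename | Remove-HardDisk -Confirm:$false
    Start-VM -VM 'BrokenVM'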
BattleEfficient2471@reddit
Only many days too late.
I still want to hear how this was possible; since it clearly was, why didn't any QA process catch it?
Far_Cash_2861@reddit
CS...QA Process.....
LOL
BattleEfficient2471@reddit
This is what happens when execs are allowed to pick software.
Mind you most of the security software is a total joke. Looking at all you companies wanting me to turn off selinux to run your rootkit.
bebearaware@reddit
So just to go over this process
Voila.
And Crowdstrike aren't explicitly talking about disabling fast boot?
RoadRunner_1024@reddit
wrong, this new "fix" uses crowdstrike to quarantine the affected sys file. so if the pc can boot far enough to get the list of IOC's before the system crashes, then crowdstrike quarantines the file and the issue is fixed. no scripts involved. no connections to the internet (other than connecting back to crowdstrike which has always happened.)
sounds great in theory... in practice I'm not seeing much success
bebearaware@reddit
Yes, connecting to the crowdstrike servers counts as connecting to the internet.
Michagogo@reddit
My understanding is that it’s not a separate service, it’s the regular agent going through its startup sequence. Part of that is establishing the connection with the backend, and going through the various communications/checkins that entails. One of those is checking for new content updates, which is why even before this new development it was possible that it would win the race and fix itself before the crash. This new remediation method uses a different type of command that gets pushed down at an earlier phase of establishing communications, so it has a higher chance of winning the race.
bebearaware@reddit
I wonder what those dependencies look like.
KaitRaven@reddit
The network interface would be protected as soon as it comes up. Otherwise Microsoft has a gaping security hole on their hands.
rastascott@reddit
Someone should tell Delta Airlines about this option.
frankztn@reddit
Honestly, I think it was created for them (among other big infras). My wife works at Delta and there are literally not enough IT guys to even service her laptop at a moment's notice, let alone manually remediate all of their workstations. Delta has probably lost billions because of this.
kungfu1@reddit
Man, no kidding.
BigToeGhost@reddit
Did they communicate this method? My company had 1,117 servers not pinging and every one of them had to be touched.
LucyEmerald@reddit
Yep, they are letting csagent eat itself and then auto-repair; just raise a support ticket. Although how it works has nothing to do with CrowdStrike's position in the early startup process, you're fighting a race condition so that the TCP/IP stack launches first.
Cauli_Power@reddit
Still can't see this working with anything other than a clear tcpip connection. Network auth isn't loaded that early in the process.
LucyEmerald@reddit
That's why it's a race condition, the whole world has been rebooting their devices crazily and people are beating it
Cauli_Power@reddit
I've done lots of work in Windows PE, Linux, and PXE, and there's a LOT of stuff that has to happen before Windows can communicate over the TCP/IP stack. Regular Windows 11 with all the options has a bunch of kernel drivers, a bunch of non-kernel drivers, and then a firewall. Anything using wifi or enterprise auth has to load (frequently from the current user space), log in, get DHCP, and then apply any tertiary traffic rules like proxy, etc. All that loads AFTER the kernel drivers.
I have a hard time believing that Clownstrike is somehow able to bypass all that via some kernel shim that they cooked up. Even if they did, it would hose the OS later on when it looks at the adapter and finds the CS process using it.
Maybe I'm a little behind the times, but a magical driver that recognizes every NIC or wifi adapter in existence, loads before the network stack, and then creates a socket to their update servers seems a little unlikely, unless they created a mini preboot environment compiled at install time.
That IS possible but wouldn't have gone unnoticed....
LucyEmerald@reddit
There's nothing to talk about; go read the work done by Microsoft and CrowdStrike. What I explained is what's presently happening.
Dracozirion@reddit
This
photinus@reddit
Based on the feedback I've seen it's about as hit or miss as the reboot repeatedly and hope for the best route.
medievalprogrammer@reddit
Ya, we enabled it yesterday as we have like maybe 40 systems left to fix and I don't think it worked for any of them.
sabstandard@reddit
It has been hit or miss for us as well; we've had better luck with PCs that don't have to VPN in. We have always-on VPN.
pr0t1um@reddit
Bwahahahahahhahahahahahahab...etc.
Jtrickz@reddit
Legal teams are gonna be all over crowdstrike if they lost this much time and they had a cloud fix they could have deployed… this seems a bit backwards…
Secret_Account07@reddit
We are mostly fixed now, but this is incredibly helpful info. Sharing internally.
Good on you for posting this. Also, fuck Crowdstrike.
Far_Cash_2861@reddit
upvote just for the "fuck crowdstrike."
more specifically, fuck george. He was CTO at McAfee and did the same thing 10 years or so ago.
Secret_Account07@reddit
Yep. So this issue has been described at length, so I won't go into that. But today I realized the following:
Prior to Friday, CrowdStrike had no process to remediate bad content files that crashed the OS (kernel) at boot. If they had thought of this prior to Friday, it wouldn't have taken them 3 days to stand up a cloud-based solution. So even looking past the lack of QA/testing... what was the plan BEFORE Friday if you released a botched file that crashed the kernel? They know full well their driver is loaded by the kernel and references all updates/content files. You DON'T give customers the choice to stage content updates (test env, prod, dev, etc.), so wouldn't anyone with half a brain figure out you need a process in case you brick the kernel/boot process?
So I will give them credit that they are owning this specific fuck-up (kinda), but what was your game plan prior to Friday for this kind of issue?
Alternative-Wafer123@reddit
Has their solution been tested as well? :)
Far_Cash_2861@reddit
You are asking if CS did QA testing.....
LOL
BitOfDifference@reddit
So why wasn't this posted on the day of the outage? It could have saved people a ton of time and weekend work. Posting a fix 4 days later is only going to help those still down; everyone else has already spent their resources and had downtime.
thepottsy@reddit
We were advised of this earlier this afternoon, but by that time, it was kind of a moot point as we had already remediated well over 90% of systems.
They SHOULD have simply just implemented this during the day on Friday, without the silly opt in bullshit.
kuahara@reddit (OP)
While I completely agree, I'm guessing after a screw up this big, they were real nervous about mass releasing anything else to the world.
thepottsy@reddit
Fair, but they already did when they replaced the sys file shortly after the fuckery.
what-the-puck@reddit
Sort of - that was documented functionality. Effectively a definition update.
This, not so much. This is a new process, intervening early in the CrowdStrike startup process and deleting files.
thepottsy@reddit
I truly do understand that. I’m simply saying that they apparently have this capability. Why are we only hearing about it today? Over 72 hours after the shit storm.
codewario@reddit
Apparently this is something they cooked up in response to the outage, and is new functionality. That's what I'm being told about why it wasn't made available sooner; it took a few days to get the remediation written and tested.
crankyinfosec@reddit
Careful, asking questions like this will get you downvoted by CrowdStrike employees. My CISO made the call after this news that we're not renewing and will be transitioning. This will get you downvoted also. I used to work for 2 AV vendors; I have friends across this space and several at CrowdStrike. Apparently people have been linking to 'problematic' comments on Reddit so people can 'manage' comments.
Cmonlightmyire@reddit
I mean crowdstrike is literally bundling "URLs that magnify negative sentiment" with actual malicious URLs so... yeah its been frustrating to deal with them
thepottsy@reddit
Nice lol
SimonGn@reddit
To me this is the worst part. Not even a note "We have a potential method of fixing through a cloud update which runs before the crash, if you can wait a few days or weeks for us to develop and test this method, you might want to hold off on fixing those hosts manually if you can wait for the automatic fix"
Unable-Entrance3110@reddit
Except that the point of a definition update is to attempt to identify malware with the point of obliterating it. If that malware was in the kernel and obliterating it would have caused a BSOD, that would be considered CS working as intended. Why does the source of the malware make any difference?
KaitRaven@reddit
Crowdstrike is designed to track unusual activity during startup and interrupt it if needed, so that aspect is not too surprising. It is interesting to know just how early the agent is communicating with the servers though.
drnycallstar19@reddit
Correct, exactly my point. This could have saved us a shitload more manual work.
Especially how simple their fix is. It’s not complex at all. Simply doing automatically what we’ve had to do manually over the weekend.
sm00thArsenal@reddit
Yup, the fact that this is possible but it took them nearly 4 days to release and even then as an opt-in is almost worse than it not being possible.
BalmyGarlic@reddit
Or if you're going to require an opt-in, then blast it out to every client in your system via email and robocall to get those opt-ins, or direct people to where to do it. Also instruct your call center to do the same thing, to get the clients without working phones and email back up. Also post the instructions on your website and blast it out via social media.
There are much more efficient communication methods than scheduled meetings...
drnycallstar19@reddit
Yeah I was thinking the same thing. Not sure why It took them 3 days to release this “fix”. Doesn’t seem like such a big thing to implement.
codewario@reddit
I'm confused after reading this: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
I'm not understanding this. Rebooting 15+ times was already said to help by MS. I guess this "cloud remediation" opt-in thing makes it more likely that a reboot gives enough time for the fixed definition to be applied, according to this thread, but I don't see anything about "cloud remediation" except for how to recover nodes on AWS, Azure, and GCP. I don't see anything about what is stated in this thread on the remediation page published by CrowdStrike.
Fallingdamage@reddit
Is this a new feature, or did CS just wait 5 days before making this more apparent to its customers while they ran around losing their minds trying to do this manually?
When the remediation instructions were released, why wasn't this mentioned?
Far_Cash_2861@reddit
crowdstrike is in for a ton of lawsuits. Contractual double speak will not protect them from negligence.
Desperate-Tip6702@reddit
We've been doing this manually on all of our machines, smh, but have been running into issues because some machines are secured with BitLocker, so you need to run the key unlock command before you run the del command.
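For reference, that sequence from the recovery prompt looks roughly like this (hedged sketch: the drive letter and the 48-digit recovery password are placeholders, and in WinRE the OS volume may mount under a different letter than C:):

    manage-bde -unlock C: -RecoveryPassword 111111-222222-333333-444444-555555-666666-777777-888888
    del C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys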
Ok-Garden1663@reddit
Reboot my cruise I couldn't get to.
PweatySenis@reddit
Please tell me you got reimbursed or at least a free reschedule
slyboon@reddit
Management here decided not to opt in. We supposedly had another 750 machines or so left to remediate at EOB yesterday and should finish up today. Guess they decided not to trust Crowdstrike, but it would have been nice if some of these had been done.
Oh well
IamPun@reddit
By the time you've rebooted it 3 times, I'd already be done fixing it with the Microsoft CrowdStrike remediation bootable utility.
kuahara@reddit (OP)
2000 users all rebooting their own computers is going to happen a hell of a lot faster than you running around with a bootable remediation tool
joshbudde@reddit
We were already having success with just having users connect their computer to a wired connection and reboot continuously until it straightened itself out. It started working on Friday after they pulled the update, sometime in the afternoon (Eastern time).
zmaile@reddit
Am I the only one who sees this as an intentionally hidden backdoor that is being used 'for good' in this particular instance? I'd be curious to know if this feature is heavily obfuscated in the binaries.
Aeeaan@reddit
Crowdstrike doesn't need a back door as they've already been handed the keys to the front door.
Obvious_Mode_5382@reddit
Exactly why supply chain hacks are a thing, eh?
cvsysadmin@reddit
Yes, you are the only one. All it does is quarantine one of their own files. They (as illustrated over the past few days) already have system/driver level access. There is no secret there.
DenverITGuy@reddit
How is this different than the original "reboot up to 15x" fix provided on day 1?
What about the opt-in program makes that more reliable?
KaitRaven@reddit
I think that depended on the normal Crowdstrike Update process replacing the file, whereas this is an explicit command to remove it. Probably works a little faster as a result.
watchthebison@reddit
We got offered this earlier today and you’re right. Was told by an engineer the quarantine of the file has a higher priority than fetching new channel-files, resulting in a higher success rate.
Decided to sit on it because we are nearly fully operational again through the manual fixes and the # of clients they quoted having remediated automatically was much lower (at the time they offered it). Felt a bit risky.
knifebork@reddit
THIS might be a good reason for requiring an opt-in. If someone has already been adequately repaired, it could be dangerous to make other intrusive changes.
throwaway9gk0k4k569@reddit
It's not. OP is just dumb.
Avas_Accumulator@reddit
Thanks. Seems 1% of our machines show up as down under Portal -> Next-Gen SIEM -> 'Hosts possibly affected...'
I'll ask CS to add our CID as described. The hard part in 2024 is finding computers that can even take a cabled network connection.
Alternative-Wafer123@reddit
Have they tested this solution too?
solracarevir@reddit
This was their fuck up; why do users need to submit a ticket and ask for a fix? Apply it to every customer and you would have fixed this fuck up in a few hours.
Incredible!
Haru24@reddit
Microsoft recommended, on the day, doing up to 15 reboot cycles. It works on the same premise as this, but relies on a fast update of the CrowdStrike process. I am happy to hear that CrowdStrike is expediting that process, because it could take 4 reboots or 15 reboots, and with blue screens in the middle it was a slow process. Manual remediation was faster. We only used it for off-site computers where we knew having the user do it would restore service faster than we could get onsite.
Wendals87@reddit
Why the heck is this opt in? Just blacklist it for all and push a new update
poorleno111@reddit
Probably because legal / risk is involved in a lot of their decisions at this point.
anna_lynn_fection@reddit
Nothing like worrying about the consequences of playing with fire when you're already fully engulfed.
Unable-Entrance3110@reddit
Yeah, "we better not light a match, we could start a fire" says the company whose house is engulfed in flames.
Turtledonuts@reddit
Because some guys at a government agency are currently panicking about the idea of a random company being allowed to remotely edit critical directories in all their endpoints during startup.
randomdude45678@reddit
If all you have to do is opt in, they’ve had this ability all along and will moving forward
kirashi3@reddit
Ah, yes, the same people who deliberately installed the same software that already has this functionality. The logic behind some institutions will never cease to amaze me.
hotfistdotcom@reddit
why are people JUST hearing about this late on monday? Like that's good, but also, that absolutely fucking sucks for the millions in labor pissed away over the weekend.
barkingcat@reddit
cause you needed the support contract to hear it from your rep.
This is the last chance Crowdstrike has to make money from a big portion of their clientbase, might as well charge people for the fix if they're likely to not renew.
sol217@reddit
...can you even use crowdstrike without a support contract?
hotfistdotcom@reddit
My god, seriously?
I am so happy we turned down CrowdStrike.
hebuddy69@reddit
it's literally a dogshit AV; the only things our security team likes about it are Fusion workflows and the ability to hone in on specific assets with advanced search event queries.
we're moving to Elastic thankfully, won't be until end of year however
Unable-Entrance3110@reddit
If they had this capability the whole time why would they wait 3+ days to offer it? Something is fishy here.
Defeateninc@reddit
Thank GOD!
I am going to call my rep right now. After doing the 2000th machine manually I am DONE!
Dull-Sugar8579@reddit
I hope you're not paid hourly.
ReanimationXP@reddit
Nice of you to post this in sysadmin instead of /r/CrowdStrike..
BattleEfficient2471@reddit
You mean a place where it may well be removed by CrowdStrike employees?
phillymjs@reddit
I don't know about the rest of you, but most of my users act like I'm asking for one of their kidneys when I ask them to connect to a wired network. And the home-based ones seem to bury their routers in the most inaccessible areas of their homes.
hsoj700@reddit
This + company laptops that only have USB-C🤬
AromaOfCoffee@reddit
let's be real, you have to try to find a laptop with ethernet these days.
tisti@reddit
Is it really an issue? Usb-c docks usually have all the required bits.
BalmyGarlic@reddit
I hear you. At a previous job, we refused to assist with numerous issues if the user wasn't wired in once we identified WiFi as a culprit. It was in the remote work agreement so users had to agree to be wired in to work remotely (hello COVID). Had a lot of push back, especially from management, until management did it and realized that this solved so many of their issues. Took months of us holding our ground during the pandemic but we reduced our Incidents by probably 80%. Turns out when you get rid of the numerous RDP issues with working out of an AVD farm over WiFi, things are pretty damn stable.
wrootlt@reddit
In our office, some PCs would reach some sort of working condition after the initial BSOD. On one of them, our security team actually successfully wiped the bad sys file using CrowdStrike EDR. And maybe this cloud solution is mostly for physical machines? Our VM servers would crash so quickly I cannot imagine this solution would have enough time to work.
kuahara@reddit (OP)
You can already load the iso onto your pxe server and net boot all your virtual servers to run that. Server remediation can happen en masse. I actually wrote a tool to do it Saturday, tested and confirmed that it works. Our guys at the agency also confirmed it was working. Microsoft released almost the exact same thing Sunday morning.
Microsoft publication: https://techcommunity.microsoft.com/t5/intune-customer-success/new-recovery-tool-to-help-with-crowdstrike-issue-impacting/ba-p/4196959
Direct download link: https://go.microsoft.com/fwlink/?linkid=2280386
I have not updated mine for bitlocker, but Microsoft's already includes that. If you don't use bitlocker and want to use mine, I can PM a google drive link.
I let this go since MS has the trust and bandwidth to distribute this far more efficiently than I can. My tool is 377MB.
JustInflation1@reddit
Why are you working on a Saturday? I hope you’re in for a big raise.
kuahara@reddit (OP)
I was planning on putting out a blog post on the tool and monetizing it with an ad or two, maybe a donation link. Plus just seeing if I can be the first one to do something like that is enough Saturday motivation for me. I was worried that a gazillion people would download it at 377MB a pop and either take the site down or I'd get billed for bandwidth. No need to bother with it now.
Plus trust is a much smaller issue with it coming from MS.
tisti@reddit
Host it as a torrent/magnet link if bandwidth is a concern.
TaiGlobal@reddit
You should still publish it. I haven't read your solution but it will be useful again in the future. Almost every year for the past few years I've run into issues with updates preventing boot or even login. In my last environment we disabled automatic repair for some reason and that caused a boot loop on like 60 desktops; I've seen a bad Citrix scrub tool prevent login, and a bad update that caused AnyConnect to fubar the NIC and prevent network connectivity. We had no local admin, and in all these scenarios we basically had to reimage 20-40 machines in each incident. If I could just PXE boot and uninstall only the offending app, that would have saved a lot of time compared to a complete reimage. Also, people had data saved locally that I'd have to manually recover before I could reimage, so that took up time too.
JustInflation1@reddit
Good man, you’ve got to be thinking about you and yours because you know damn well the boss isn’t going to be
Nuggetdicks@reddit
The fuck?
JustInflation1@reddit
Yeah, you’re right. Let’s all work for free. Surely the boss will notice. Lol how long you been in this field bud?
Nuggetdicks@reddit
How long you been online? Heard the news? You think the company is gonna wait until Monday and “sit it out”?
JustInflation1@reddit
Alrighty bud, go ahead and be the hero. Don't expect anything from the company. You gotta start playing capitalism better or you're gonna end up losing. You never played Monopoly as a kid?
Ok_Fortune6415@reddit
Talk about yourself, but my “being the hero” has gotten me 50-70% bonuses on top of my already very good 6 figure pay.
JustInflation1@reddit
Well, if you're telling the truth, you really have to understand that that's not most people and that's not the American way. At least not for the past 20 years, so statistically you made a bad move. Actually, many bad moves, from your description. And honestly, going beyond statistics, you've probably worked yourself down to a low amount of money per hour. Not to mention all the stress and late nights that you've put into your body. I really hope you don't have any problems later down the line, but this is no way to treat yourself.
Ok_Fortune6415@reddit
I don’t live in America. For where I live, I’m in the top 5% of earners. I don’t think I made many bad moves at all.
JustSomeBadAdvice@reddit
So, just to confirm based upon 6 responses you've given to other people, you have, in fact, been living under a rock since at least Friday.
Kritchsgau@reddit
If not, you're in for a fun Monday; I call that a resume-generating event.
coreycubed@reddit
Have you been living under a rock for the last week?
fivelargespaces@reddit
I used MSFT's tool. It asks if you have bitlocker or not.
wrootlt@reddit
I know. But we have only around 20 Windows servers under my team and i had to fix 15 or so via Safe Mode with Networking. In total probably took me an hour or so.
Soundcloudlover@reddit
That's a huge improvement.
agentfaux@reddit
Such an amateurish company. No idea why anyone uses them.
KikiITgirl@reddit
This would have been nice to know several days ago, after the on-site remediation nightmare, and then remote users who know crap about tech being talked through safe boot and given command line commands for the first time ever… and then learning some of the machines had domain issues as a result and wouldn't take the local admin creds, bricking the remote ones until figuring out a pin-hole reset would let us proceed. Ahh, ugh, meh, but thanks for the experience on how to hustle, and the overtime…
AnomalyNexus@reddit
Breaking it was opt-out, Fixing it is opt-in
NetworkITBro@reddit
China is really getting annoyed with all the data they’ve been missing the last couple of days.. this CrowdStrike rootkit that pumps it all right to them needs to do better!
Slight-Brain6096@reddit
I'd be really interested to see how this works.
Nnyan@reddit
This has been reported since yesterday. It was effective in almost all the remaining endpoints (less effective on WiFi connections). But there were a small number that had to be re-imaged.
Goetia-@reddit
This should've been published within 24 hours. Great news, but just further demonstrates how hard Crowdstrike dropped the ball here.
anna_lynn_fection@reddit
Exactly my thought. Why are we hearing about this option 4 days later?
Doso777@reddit
The one dude that actually knows his stuff was on holiday?
iamamystery20@reddit
Yeah, we got this too, but couldn't understand how this is different from CS updating the file fast enough during the boot loop, so we skipped this option.
Doso777@reddit
This might be slightly quicker since it doesn't need to get a file from the internet. So in theory it could have a better chance to do its thing than the other method.
LForbesIam@reddit
We have a lot of people where you need a float plane to get to them so this will be good if we can get them engaged.
However my question is why did it take 4 days to come up with this?
I am still thinking it isn’t going to work because it bluescreens the second the network stack kicks in but it is worth a try.
JaMMi01202@reddit
Is it just me that's thinking - this is another potential fuckup waiting to happen.
Ok - so now your machines are checking with this proven-low-quality vendor's cloud service for updates on every boot up, so if they ship something broken, AGAIN, everything dies again - unless they ship a fix AND your users (or you) conduct (potentially multiple if on wifi) reboot(s)?
The long-term fix here is to remove this solution from your machines, surely...
8XtmTP3e@reddit
But to remove it, you need booting systems. If this is a quick way to at least get functional, then why not. Doesn’t mean you can’t immediately have Intune, SCCM, GPO or whatever uninstall Crowdstrike
zxyabcuuu@reddit
Which URL and port are needed for the firewall?
martrinex@reddit
This is good news but their botched update file is effectively a virus, you don't have to opt-in to remove other viruses.
GoodCannoli@reddit
They’re just telling people about this now? Not on Friday?
TransporterError@reddit
10,000 man hours later…
musicman76831@reddit
And trillions of dollars in lost revenue. What a fucking joke.
Cupspac@reddit
Doesn't work with TPM 2.0 and UEFI FSs :)
FourEyesAndThighs@reddit
We are seeing very little success with home users that have crappy WiFi. Apparently their internet connections are not getting established quick enough to be effective. Multiple reboots don’t necessarily help.
BOBCADE@reddit
Wow Crowdstrike to the rescue /s
ecar13@reddit
Conspiracy theorists be like:
- Dr. Norton invented the computer virus so he could sell antivirus.
- The same lab that made Covid made the vaccine.
- The same company that brought down 8.5 million computers made the fix.
KnoBreaks@reddit
If you've ever heard the story of John McAfee, it's not that far off 😂
RobertBiddle@reddit
🤨🧐 What I suspect is coming: "Subscribe to our new 'Cloud Remediation™' , for a small price increase we will handle the hard work and save you time the next time we brick yo shit."
PlsChgMe@reddit
But only if you opt-in and agree not to recoup your losses by suing us. . .
ThatThingAtThePlace@reddit
I bet legal departments would be very interested in what the opt-in agreement states. I wouldn't be shocked to see a clause that states you release crowdstrike from any past or future liability they may have for damages caused by the initial outage or their remote remediation.
tom-slacker@reddit
Huge if true.
Gargantuan if factual.
Titanic if non-fiction
eNomineZerum@reddit
I'd like to opt in, but me and three other folks are locked out of the CrowdStrike support so emailing our TAM is all we can do. Very frustrating as we are a large environment with multiple CIDs and parent-view reporting is broken as well.
I'd be understanding if the 291* file was the extent of the issues, but all the subsequent burden we are dealing with is ridiculous.
It's funny because all I keep getting back is "known issue" when I raise these issues.
alphex@reddit
I’m not on sys admin side of things. But a client of mine today said their IT group was telling everyone to reboot 15 times. That explains it.
Banoo13@reddit
So much power to a Ukrainian
iknowyerbad@reddit
So this whole thing was a ploy to get people to use cloud remediation? 🤣🤣
MadDawgThaKing@reddit
Our systems stopped boot looping about an hour and a half after this incident. Does that mean we had the auto update enabled and it self remediated?
hankhillnsfw@reddit
No reason why this took 3 days to figure out.
jedipiper@reddit
Why they wouldn't just remediate everyone immediately is ridiculous. There's no reason for them to not pull it back immediately and then once the dust has settled, look into sending it back out, fixed this time.
barkingcat@reddit
cause they want to charge money for it, and also for you to notice how nice they're being (provided you sign up for the renewal)
amcannally@reddit
Somebody please tell this man about iDRAC…
littlejob@reddit
Can confirm... had over 50k endpoints start to phone back home within a few hours.
CS also updated a few dashboards in the SIEM component of the tool. You can now easily identify the assets that received the flawed channel file and have not phoned back home since. Given there was a smaller subset of users traveling... it was rather accurate.
StaticR0ute@reddit
This is like 3 days late and shouldn't have been opt-in only.
Even if you turn this on though, what if you have NAC/ISE on your switches at the access layer? Wouldn't that likely prevent CrowdStrike from communicating before the blue screen anyway?
Cmonlightmyire@reddit
What the fuck race condition is going on in their product.
whasf@reddit
We turned that on for our tenant and weren't having much luck with it. It's better to just use the Microsoft tool
lzwzli@reddit
So make a virus to fix an antivirus... Full circle
HJForsythe@reddit
Yeah, this won't work in a lot of cases as the system will crash before the update happens, but at least they are trying.
SavagePeaches@reddit
Sucks there's nothing public stating this as of right now. I'm frontline at my workplace (so as low on the totem pole as can be) and I'd love to tell them about this but I know I'd be asked for a source.
Precision20@reddit
That would be great if I hadn't already gone through all our endpoints. If it happens in the future, great, or for BitLockered computers (if this works on them). But I feel like they took too long to get to this point.
CharcoalGreyWolf@reddit
Something something horse has escaped lock the barn door something something
boftr@reddit
Not knowing anything about CS. The work the driver is doing to load the bad sys file must be quite late in the driver’s startup to allow a user mode process time to reach out and download an ‘update’, be it a file or a cloud lookup to cache some data about the bad update sys file. I can imagine that once the ‘update data’ is fetched, it can configure the same driver to block the data file in question most likely for the next boot.
coolvibes-007@reddit
Does not work in a virtual environment such as VMware.
NotAFakeName59@reddit
Yeah I've read about that "reboot it 15 times" workaround. Problem is it still requires someone to visit each machine, and that's not even factoring in wifi.
PetieG26@reddit
What? This whole thing could've been avoided, or at least not been so full blown? This is crazy talk - why wasn't this made public Friday?
cvsysadmin@reddit
Note that when you're looking at your dashboard you'll probably find computers that were already updated and are working. The bad sys file was left on the computer and replaced with a good one. The quarantine will snag the old one even if a new one is present and the computer has been working fine.
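If you want to spot those leftovers yourself, here's a hedged PowerShell sketch that lists whatever 291 channel files a host still has, with timestamps; the specific cutoff times below are the ones from CrowdStrike's public advisory at the time, so treat them as an assumption and check the current guidance:

    # Per the advisory at the time, the bad copy carried a 2024-07-19 04:09 UTC timestamp,
    # and the fixed one 05:27 UTC or later.
    Get-ChildItem 'C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys' |
        Select-Object Name, @{n='UtcWritten'; e={ $_.LastWriteTimeUtc }}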
Bajiri@reddit
It should be noted that this is incredibly unreliable on wifi. Pretty good chances of it working if wired, but very spotty the more latency you introduce. We also met with crwd, and they basically said it wasn't really possible for most wireless connections.
ecar13@reddit
Microsoft just released a bootable usb that will boot your computer into preboot environment and automatically delete the offending sys file. Simple but I like that they did this. Probably tired or getting blamed for this nightmare. But for a large number of workstations / servers where even this process is cumbersome, the automated solution from CrowdStrike seems promising.
CuriouslyContrasted@reddit
The issue now is the number of servers and workstations that are trashed due to constant BSOD's. Those that just require the file removed are long remediated.
djsyndr0me@reddit
If you're almost 96 hours in and still have broken systems they weren't that critical to begin with.
The_Gadgeteer@reddit
We had remote users that went through 20+ reboots to restore their laptops. It was just a few that were impacted during the bad signature window, fortunately.
purefire@reddit
Did this earlier today, had a positive impact but not a silver bullet. Wired network is much more likely than wireless
flatvaaskaas@reddit
Anyone got more information about how this works? Last weekend it was known that an ethernet connection and rebooting multiple times could help,
but what exactly does this cloud remediation do?
cvsysadmin@reddit
It's leveraging CrowdStrike's quarantine system to quarantine its own bad sys file and can do so just a hair faster than it can push the update to fix it permanently. Just fast enough to beat the bsod/reboot which is all that's needed. Once the file is quarantined and the computer is running properly it can download the actual update.
kuahara@reddit (OP)
Did you read beyond the title?
Wuss912@reddit
so they won't push the update globally without making you jump through hoops?
Michagogo@reddit
The update was pushed within a couple hours, which is why the race was already a thing. This isn’t an update, IIUC this is using a different mechanism entirely (which acts with higher priority than the updates, so a better chance to win the race) for a purpose other than what it’s originally intended for, which might be why they don’t feel comfortable pushing it out unilaterally. Not that I think that makes sense, but that’s the most likely reasoning I can think of.
donkeydickerson@reddit
Anyone else get goosebumps 🤗
Science_Fair@reddit
Worked for about 15 percent of our environment today. We had tried something similar with machine startup scripts and that also worked about 15 percent of the time.
Crowdstrike would do it for you, but you could also do it to your own tenant. Just tell CS to quarantine the offending .SYS file. Might need to turn off tamper protection temporarily.
PessimisticProphet@reddit
WTF? They should have pushed that out immediately the second they thought of it.
SimonGn@reddit
Well, announced it. They should test it so it's not made even worse.
JayFromIT@reddit
Joke (not sure/making this up): they need you to "opt in" because you waive your rights to sue them.
TransporterError@reddit
Prolly, no joke…
Catball-Fun@reddit
How do they make sure they are one of the first kernel extensions to be loaded?
Dracozirion@reddit
https://learn.microsoft.com/en-us/windows-hardware/drivers/install/early-launch-antimalware
Outrageous_Device557@reddit
So this error hit after Windows loaded the network stack and grabbed an IP?
l0st1nP4r4d1ce@reddit
I'd be curious about the language included in the opt in. Does it limit the liability for CS?
JetreL@reddit
https://petri.com/microsoft-crowdstrike-recovery-tool-windows/
PlannedObsolescence_@reddit
Microsoft's tool still uses the approach that requires manual intervention (USB or PXE booting a device), which is relatively complex. Sure, easier than walking someone through deleting a file in system32, absolutely - but all those approaches get more awkward when the endpoint uses Bitlocker, so now 48-digit recovery codes need to be retrieved and shared, etc.
There is a clear win here if it's possible to take the 'reboot many times, maybe even 15' approach (which is not guaranteed to work, of course) and turn it into 'reboot a few times with ethernet and there's a good chance you'll be sorted'.
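On the recovery-code retrieval mentioned above: if keys are escrowed to AD, something like this can pre-stage them before touching machines (hedged sketch; requires the RSAT ActiveDirectory module, read rights on the msFVE-RecoveryInformation objects, and the computer name is a placeholder):

    # Pull the escrowed BitLocker recovery password(s) stored under a computer object in AD
    Import-Module ActiveDirectory
    $computer = Get-ADComputer -Identity 'PC-0123'
    Get-ADObject -SearchBase $computer.DistinguishedName -Filter 'objectClass -eq "msFVE-RecoveryInformation"' -Properties 'msFVE-RecoveryPassword' |
        Select-Object Name, 'msFVE-RecoveryPassword'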
JetreL@reddit
Yeah, luckily I don't have this issue; just trying to share some other ideas.
BoBTheCornCob_@reddit
We implemented it, slow but supposedly gaining numbers.
itwaht@reddit
This is the most promising resolution I've seen so far... Time to buy Crowdstrike stock?
CFOMaterial@reddit
I have a question. I am not a system admin, but somehow my personal computer, an HP all-in-one, got this bug. It's not even connected to a work network, just a regular personal computer. I am guessing HP has some spying app stuff built into their computers "for the customer" and that is why this computer was impacted by the CrowdStrike issue. I cannot get into safe mode even after entering the BitLocker code, and I cannot seem to modify the BIOS hard drive type, which seemed to be the last solution I saw on this subreddit if you can't get into safe mode. Any clue what else I can do?
mgdmw@reddit
Does your personal computer, that is not connected to a work network, actually have CrowdStrike installed? It is an enterprise anti-virus (etc). Unless you actually have CrowdStrike on your computer, your problem is unrelated and is something else and the fact it occurred at the same time as CrowdStrike’s issue is entirely coincidental.
CFOMaterial@reddit
It is not connected to a work network, and we never installed CrowdStrike, but it literally happened Thursday night around 9 or 10 PM EST: downloaded an update, installed, and instant BSOD on reboot. I feel like the coincidence is too strong. What are the odds of that happening to a 2-year-old, barely used computer during an update, at the same time?
mgdmw@reddit
How do you propose a problem affecting CrowdStrike affected your computer without it actually having CrowdStrike installed?
CFOMaterial@reddit
That is the question. My only guess is HP uses crowdstrike somehow to help with their remote connect features to troubleshoot customers PCs. Otherwise, its the biggest coincidence in the world.
mgdmw@reddit
Biggest coincidence in the world it is. HP is not paying CrowdStrike to be on customer computers. It’s a monthly fee per PC. It’s not a free product.
CFOMaterial@reddit
Ok, crazy situation then. Thanks
nullbyte420@reddit
Love your username lmao
CFOMaterial@reddit
Thanks
kuahara@reddit (OP)
If you have access to another computer, you can download the tool from Microsoft to automate the removal. You'll need a thumb drive and the ability to enter the boot menu on your affected computer.
CFOMaterial@reddit
I am able to get to the cmd screen, and couldn't find the file. Which makes sense in that it is a personal computer that never had CrowdStrike installed on it, but then why would it happen at the exact same time this happened?
GMginger@reddit
Simply a coincidence, no more than that. You may get more help if you post to /r/techsupport and mention the issue pointing out that you don't have CS.
CFOMaterial@reddit
Thanks for the advice.
VegaNovus@reddit
Coincidence. Completely unrelated.
stkyrice@reddit
Wasn't this on their blog a couple days ago?
StPaddy81@reddit
We were emailed as a customer mid-day Sunday that this was an available option
reegz@reddit
Yeah every researcher has been looking at it since Friday.
StPaddy81@reddit
We opted in and it seems to be doing its thing. I did notice that some hosts that were not blue screening are showing up as having that particular file quarantined, I’m assuming they do it by sha256 hash and not file name, so I’m wondering why some of these machines were not blue screening if they had the affected channel update file on them.
I reached back out to support for more info.
KaitRaven@reddit
The old version of the 291 channel file is not automatically removed when devices get the update via the normal process; it's just superseded and remains in the folder.
VegaNovus@reddit
It's not done by hash; the hash varies customer-by-customer.
StPaddy81@reddit
Every single file being quarantined so far (75+ instances) has the same sha256 hash. Not sure how else they would do it...
And this is opt-in, so when they created the rules for my CID(s), I'd assume they created the quarantine rules based on the hash for me as a customer.
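If you want to sanity-check that yourself, a one-liner to compute the hashes of whatever 291 channel files a host still has on disk (standard install path assumed; whether CS actually keys the quarantine off this hash is their internal detail, not something visible from outside):

    # Compute SHA256 for any 291 channel files still present, to compare against the quarantine entries
    Get-FileHash -Algorithm SHA256 -Path 'C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys' |
        Select-Object Hash, Path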
Doublestack00@reddit
What if bit locker is enabled?
RideZeLitenin@reddit
Would be nice if it bypassed BitLocker, but I have a feeling the recovery code may be needed to boot into C: for a bit, and then hopefully it fixes itself, removing the need for CMD removal in Windows Recovery... hopefully.
KaitRaven@reddit
This remediation happens during the normal boot process. The drive is already unlocked at this point.
peoplepersonmanguy@reddit
If windows is loading bitlocker is already passed.
Dracozirion@reddit
By the time the boot-start driver loads, the disk is already unlocked. Should make no difference in this case.
TrueStoriesIpromise@reddit
You'd have to type in the password, per normal, but then hopefully the update hits before the BSOD hits.
hiroshima_fish@reddit
Commenting for a response to this as well.
VegaNovus@reddit
You'd just need to deal with this the same way you would if a remote user locked their laptop and it got stuck at the bitlocker screen.
All this method needs is a normal boot (not recovery, not safe) and then to win a race condition.
e0m1@reddit
I personally tried this and like 10 or so boot attempts, too many variables. I can't just keep rebooting and hoping. I hate you crowdstrike, you literally ruined my weekend. I was a huge advocate.
cowprince@reddit
The opt-in method with the reboot is different though.
Before it was 100% luck. Now it's just 50% luck.
crankyinfosec@reddit
The fact this wasn't automatically opted in on Friday for all impacted customers is insane. I appreciate the solution 3 days late, but this is the final nail in the coffin that is making us move away from CrowdStrike. A chunk of our laptop fleet doesn't have an onboard NIC; guess it doesn't work nearly as well in that situation. A ton of them are still fucked even after several reboots.
node808@reddit
Took us about 4 hours last Friday morning to get everything running again, but we're small with only 750 workstations and about 65 vm's.
Top_Outlandishness54@reddit
I want them to release how this happened. I would bet money it was an outsourced contractor that caused it all.
Cley_Faye@reddit
Well, that's nice. That also cements them even more as a prime target for an actual cyberattack if they're able to do that so early.
Asymmetric_Warfare@reddit
Just did this in our tenant to remediate several hundred devices both physical and VM’s with success.
bjc1960@reddit
Thank you to the OP. Odd this has not been communicated more widely by all the experts.
UncleGrimm@reddit
They’re keeping quiet in the press for the most part but they’ve been reaching out directly pretty often. Hearing whispers that CS may foot the bill to send out some additional help for the remaining machines that this doesn’t remediate
illicITparameters@reddit
Good looks. Passed this on to some colleagues at other orgs.
Bro-Science@reddit
Microsoft also released their own automated tool to fix it: https://techcommunity.microsoft.com/t5/intune-customer-success/new-recovery-tool-to-help-with-crowdstrike-issue-impacting/ba-p/4196959
kuahara@reddit (OP)
I released a similar tool a day ahead of MS. This is still good for when you need to remediate manually, but the cloud solution is going to be far more efficient.
Burnerd2023@reddit
Nice