CrowdStrike ex-employees: 'Quality control was not part of our process'

561 points by everybodyknows 5 days ago | 305 comments

> “Speed was the most important thing,” said Jeff Gardner, a senior user experience designer at CrowdStrike who said he was laid off in January 2023 after two years at the company. “Quality control was not really part of our process or our conversation.”

This type of article - built upon disgruntled former employees - is worth about as much as the apology GrubHub gift card.

Look, I think just as poorly about CrowdStrike as anyone else out there... but you can find someone to say anything, especially when they have an axe to grind and a chance at some spotlight. Not to mention this guy was a designer and wouldn't be involved in QC anyway.

> Of the 24 former employees who spoke to Semafor, 10 said they were laid off or fired and 14 said they left on their own. One was at the company as recently as this summer. Three former employees disagreed with the accounts of the others. Joey Victorino, who spent a year at the company before leaving in 2023, said CrowdStrike was “meticulous about everything it was doing.”

So basically we have nothing.

nyc_data_geek1 4 days ago | root | parent | next |

>>So basically we have nothing.

Except the biggest IT outage ever. And a postmortem showing their validation checks were insufficient. And a rollout process that did not stage at all, just rawdogged straight to global prod. And no lab where the new code was actually installed and run prior to global rawdogging.

I'd say there's smoke, and numerous accounts of fire, which this can be taken in the context of.

sundvor 4 days ago | root | parent | next |

"Everyone" piles on Tesla all the time; a worthwhile comparison would be how Tesla roll out vehicle updates.

Sometimes people are up in arms "where's my next version" (eg when adaptive headlights was introduced), yet Tesla prioritise a safe, slow roll out. Sometimes the updates fail (and get resolved individually), but never on a global scale. (None experienced myself, as a TM3 owner on the "advanced" update preference).

I understand the premise of Crowdstrike's model is to have up to date protection everywhere but clearly they didn't think this through enough times, if at all.

kccqzy 4 days ago | root | parent |

You can also say the same thing about Google. Just go look at the release notes on the App Store for the Google Home app. There was a period of more than six months where every single release said "over the next few weeks we're rolling out the totally redesigned Google Home app: new easier to navigate 5-tab layout."

When I read the same release notes so often I begin to question whether this redesign is really taking more than six months to roll out. And then I read the Sonos app disaster and I thought that was the other extreme.

cesarb 4 days ago | root | parent |

> Just go look at the release notes on the App Store for the Google Home app. [...] When I read the same release notes so often I begin to question whether this redesign is really taking more than six months to roll out.

Google is terrible at release notes. Since several years ago, the release notes for the "Google" app on the Android app store always shows the exact same four unchanging entries, loosely translating from Portuguese: "enhanced search page appearance", "new doodles designed for app experience", "offline voice actions (play music, enable Wi-Fi, enable flashlight) - available only in the USA", "web pages opened directly within the app". I heavily doubt it's taking these many years to roll out these changes; they probably simply don't care anymore, and never update these app store release notes.

The sentence you quoted clearly meant, from the context, "clearly we have nothing [to learn from the opinions of these former employees]". Nothing in your comment is really anything to do with that.

tomrod 4 days ago | root | parent |

Triangulation versus new signal.

There definitely was a huge outage, but based on the given information we still can't know for sure how much they invested in testing and quality control.

There's always a chance of failure even for the most meticulous companies.

Now I'm not defending or excusing the company, but a singular event like this can happen to anyone and nothing is 100%.

If thorough investigation revealed poor quality control investment compared to what would be appropriate for a company like this, then we can say for sure.

daedrdev 4 days ago | root | parent | next |

Two things are clear though

Nobody ran this update

The update was pushed globally to all computers

With that alone we know they have failed the simplest of quality control methods for a piece of software as widespread as theirs. This is even excluding that there should have been some kind of error handling to allow the computer to boot if they did push bad code.

hn_throwaway_99 4 days ago | root | parent | next |

While I agree with this, from a software engineering perspective I think it's more useful to look at the lessons learned. I think it's too easy to just throw "Crowdstrike is a bunch of idiots" against the wall, and I don't think that's true.

It's clear to me that CrowdStrike saw this as a data update vs. a code update, and that they had much more stringent QA procedures for code updates that they did data updates. It's very easy for organizations to lull themselves into this false sense of security when they make these kinds of delineations (sometimes even subconsciously at first), and then over time they lose site of the fact that a bad data update can be just as catastrophic as a bad code update. I've seen shades of this issue elsewhere many times.

So all that said, I think your point is valid. I know Crowdstrike had the posture that they wanted to get vulnerability files deployed globally as fast as possible upon a new threat detection in order to protect their clients, but it wouldn't have been that hard to build in some simple checks in their build process (first deploy to a test bed, then deploy globally) even if they felt a slower staged rollout would have left too many of their clients unprotected for too long.

Hindsight is always 20/20, but I think the most important lesson is that this code vs data dichotomy can be dangerous if the implications are not fully understood.

GuB-42 4 days ago | root | parent | next |

It could have been ok to expedite data updates, should the code treat configuration data as untrusted input, as if it could be written by an attacker. It means fuzz testing and all that.

Obviously the system wasn't very robust, as a simple, within specs change could break it. A company like CrowdStrike, which routinely deals with memory exploits and claims to do "zero trust" should know better.

As often, there is a good chance it is an organization problem. The team in charge of the parsing expected that the team in charge of the data did their tests and made sure the files weren't broken, while on the other side, they expected the parser to be robust and at worst, a quick rollback could fix the problem. This may indeed be the sign of a broken company culture, which would give some credit to the ex-employees.

Izkata 4 days ago | root | parent |

> Obviously the system wasn't very robust, as a simple, within specs change could break it.

From my limited understanding, the file was corrupted in some way. Lots of NULL bytes, something like that.

acdha 4 days ago | root | parent | next |

That rumor floated around Twitter but the company quickly disavowed it. The problem was that they added an extra parameter to a common function but never tested it with a non-wildcard value, revealing a gap in their code coverage review:

https://www.crowdstrike.com/wp-content/uploads/2024/08/Chann...

GuB-42 4 days ago | root | parent | prev |

From the report, it seems the problem is that they added a feature that could use 21 arguments, but there was only enough space for 20. Until now, no configuration used all 21 (the last one was a wildcard regex, which apparently didn't count), but when they finally did, it caused a buffer overflow and crashed.

> It's clear to me that CrowdStrike saw this as a data update vs. a code update, and that they had much more stringent QA procedures for code updates that they did data updates.

It cannot have been a surprise to Crowdstrike that pushing bad data had the potential to bork the target computer. So if they had such an attitude that would indicate striking incompetence. So perhaps you are right.

> It's clear to me that CrowdStrike saw this as a data update vs. a code update

> Hindsight is always 20/20, but I think the most important lesson is that this code vs data dichotomy can be dangerous if the implications are not fully understood.

But it's not some new condition that the industry hasn't already been dealing with for many many decades (i.e. code vs config vs data vs any other type of change to system, etc.).

There are known strategies to reduce the risk.

If they weren't idiots they wouldn't be parsing data in the kernel level module

Crowdstrike is a bunch of idiots

llm_trw 4 days ago | root | parent | prev |

I'm sorry but there comes a point where you have to call a spade a spade.

When you have the trifecta of regex, *argv packing and uninitialized memory you're reaching levels of incompetence which require being actively malicious and not just stupid.

Also it's the _second_ time that they had done this in a few short months.

They had previous bricked linux hosts earlier with a similar type of update.

So we also know that they don't learn from their mistakes.

rblatz 4 days ago | root | parent |

The blame for the Linux situation isn’t as clear cut as you make it out to be. Red hat rolled out a breaking change to BPF which was likely a regression. That wasn’t caused directly by a crowdstrike update.

IcyWindows 4 days ago | root | parent | next |

At least one of the incidents involved Debian machines, so I don't understand how Red Hat's change would be related.

rblatz 4 days ago | root | parent |

Sorry, that’s correct it was Debian, but Debian did apply a RHEL specific patch to their kernel. That’s the relationship to red hat.

busterarm 4 days ago | root | parent | prev |

It's not about the blame, it's about how you respond to incidents and what mitigation steps you take. Even if they aren't directly responsible, they clearly didn't take proper mitigation steps when they encountered the problem.

roblabla 4 days ago | root | parent |

How do you mitigate the OS breaking an API below you in an update? Test the updates before they come out? Even if you could, you'd still need to deploy a fix before the OS update hits the customers, and anyone that didn't update would still be affected.

The linux case is just _very_ different from the windows case. The mitigation steps that could have been taken to avoid the linux problem would not have helped for the windows outage anyways, the problems are just too different. The linux update was about an OS update breaking their program, while the windows issue was about a configuration change they made triggering crashes in their driver.

busterarm 4 days ago | root | parent |

You're missing the forest for the trees.

It's: a) an update, b) pushed out globally without proper testing, c) that bricked the OS.

It's an obvious failure mode that if you have a proper incident response process would be revealed from that specific incident and flagged for needing mitigation.

I do this specific thing for a living. You don't just address the exact failure that happened but try to identify classes of risk in your platform.

> Even if you could, you'd still need to deploy a fix before the OS update hits the customers, and anyone that didn't update would still be affected.

And yet the problem would still only affect Crowdstrike's paying customers. No matter how much you blame upstream your paying customers are only ever going to blame their vendor because the vendor had discretion to test and not release the update. As their customers should.

roblabla 2 days ago | root | parent |

Sure, customers are free to blame their vendor. But please, we’re on HN, we aren’t customers, we don’t have beef in this game. So we can do better here, and properly allocate blame, instead of piling on the cs hate for internet clout.

And again, you cannot prevent your vendor breaking you. Sure, you can magic some convoluted process to catch it asap. But that won’t help the poor sods who got caught in-between.

ScottBurson 4 days ago | root | parent | prev |

> there should have been some kind of error handling

This is the point I would emphasize. A kernel module that parses configuration files must defend itself against a failed parse.

> If thorough investigation revealed poor quality control investment compared to what would be appropriate for a company like this, then we can say for sure.

We don't really need that thorough of an investigation. They had no staged deploys when servicing millions of machines. That alone is enough to say they're not running the company correctly.

dartos 4 days ago | root | parent | next |

Totally agree.

I’d consider staggering a rollout to be the absolute basics of due diligence.

Especially when you’re building a critical part of millions of customer machines.

wlonkly 3 days ago | root | parent | next |

I also fall on the side of "stagger the rollout" (or "give customers tools to stagger the rollout"), but at the same time I recognize that a lot of customers would not accept delays on the latest malware data.

Before the incident, if you asked a customer if they would like to get updates faster even if it means that there is a remote chance of a problem with them... I bet they'd still want to get updates faster.

dartos 3 days ago | root | parent |

There must be balance

mewpmewp2 4 days ago | root | parent | prev |

I would say that canary release is an absolute must 100%. Except I can think of cases where it might still not be enough. So, I just don't feel comfortable judging them out of the box. Does all the evidence seem to point against them? For sure. But I just don't feel comfortable giving that final verdict without knowing for sure.

Specifically because this is about fighting against malicious actors, where time can be of essence to deploy some sort of protection against a novel threat.

If there's deadlines that you can go over, and nothing bad happens, for sure. Always have canary releases, and perfect QA, monitoring everything thoroughly, but I'm just saying, there can be cases where damage that could be done if you don't act fast enough, is just so much worse.

And I don't know that it wasn't the case for them. I just don't know.

acdha 4 days ago | root | parent | next |

> Specifically because this is about fighting against malicious actors, where time can be of essence to deploy some sort of protection against a novel threat.

This is severely overstating the problem: an extra few minutes is not going to be the difference between their customers being compromised. Most of the devices they run on are never compromised, because anyone remotely serious has defense in depth.

If it was true, or even close to true, that would make the criticism more rather than less strong. If time is of the essence, you invest in things like reviewing test coverage (their most glaring lapse), fuzz testing, and common reliability engineering techniques like having the system roll back to the last known good configuration after it’s failed to load. We think of progressive rollouts as common now but they got to get that mainstream in large part because the Google Chrome team realized rapid updates are important but then asked what they needed to do to make them safe. CrowdStrike’s report suggests that they wanted rapid but weren’t willing to invest in the implementation because that isn’t a customer-visible feature – until it very painfully became one.

dartos 4 days ago | root | parent | prev |

In this case, they pretty much caused a worst case scenario…

They literally half-assed their deployment process - one part enterprisey, one part "move fast and break things".

Guess which part took down much of the corporate world?

from Preliminary Post Incident Review at https://www.crowdstrike.com/falcon-content-update-remediatio... :

"CrowdStrike delivers security content configuration updates to our sensors in two ways: Sensor Content that is shipped with our sensor directly, and Rapid Response Content that is designed to respond to the changing threat landscape at operational speed.

...

The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing. This culminates in a staged sensor rollout process that starts with dogfooding internally at CrowdStrike, followed by early adopters. It is then made generally available to customers. Customers then have the option of selecting which parts of their fleet should install the latest sensor release (‘N’), or one version older (‘N-1’) or two versions older (‘N-2’) through Sensor Update Policies.

The event of Friday, July 19, 2024 was not triggered by Sensor Content, which is only delivered with the release of an updated Falcon sensor. Customers have complete control over the deployment of the sensor — which includes Sensor Content and Template Types.

...

Rapid Response Content is used to perform a variety of behavioral pattern-matching operations on the sensor using a highly optimized engine.

Newly released Template Types are stress tested across many aspects, such as resource utilization, system performance impact and event volume. For each Template Type, a specific Template Instance is used to stress test the Template Type by matching against any possible value of the associated data fields to identify adverse system interactions.

Template Instances are created and configured through the use of the Content Configuration System, which includes the Content Validator that performs validation checks on the content before it is published.

On July 19, 2024, two additional IPC Template Instances were deployed. Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data.

Based on the testing performed before the initial deployment of the Template Type (on March 05, 2024), trust in the checks performed in the Content Validator, and previous successful IPC Template Instance deployments, these instances were deployed into production."

hello_moto 4 days ago | root | parent |

> one part enterprisey, one part "move fast and break things".

When there's 0day, how enterprisey you would like to catch the 0day?

canucker2016 5 hours ago | root | parent | next |

Crowdstrike exploited their own 0-day. Their market cap went down by several billion dollars.

A patch should, at minimum:

1. Let the app run 2a. Block the offending behaviour 2b. Allow normal behaviour

Part 1. can be assumed if Parts 2a and 2b work correctly.

We know CrowdStrike didn't ensure 2a or 2b since the app caused the machine to reboot when the patch caused a fault in the app.

CrowdStrike's Root Cause Analysis, https://www.crowdstrike.com/wp-content/uploads/2024/08/Chann..., lists what they're going to do:

====

Mitigation: Validate the number of input fields in the Template Type at sensor compile time

Mitigation: Add runtime input array bounds checks to the Content Interpreter for Rapid Response Content in Channel File 291 - An additional check that the size of the input array matches the number of inputs expected by the Rapid Response Content was added at the same time. - We have completed fuzz testing of the Channel 291 Template Type and are expanding it to additional Rapid Response Content handlers in the sensor.

Mitigation: Correct the number of inputs provided by the IPC Template Type

Mitigation: Increase test coverage during Template Type development

Mitigation: Create additional checks in the Content Validator

Mitigation: Prevent the creation of problematic Channel 291 files

Mitigation: Update Content Configuration System test procedures

Mitigation: The Content Configuration System has been updated with additional deployment layers and acceptance checks

Mitigation: Provide customer control over the deployment of Rapid Response Content updates

====

tsimionescu 4 days ago | root | parent | prev |

Not sure, but definitely more enterprisey than "release a patch to the entire world at once before running it on a single machine in-house".

mewpmewp2 4 days ago | root | parent |

So it would be preferable to have your data encrypted, taken hostage unless you pay, and be down for days, instead of 6 hours of just down?

tsimionescu 4 days ago | root | parent | next |

Do you seriously believe that all CrowdStrike on Windows customers were at such imminent risk of ransomware that one-two hours to run this on one internal setup and catch the critical error they released would have been dangerous?

This is a ludicrous position, and has been proven obviously false by the proceedings: all systems that were crashed by this critical failure were not, in fact, attacked with ransomware once the CS agent was un-installed (at great pain).

hello_moto 2 days ago | root | parent |

I'd challenge you to be a CISO :)

You don't want to be in a situation where you're taken hostage and asked hundred mills ransomeware just because you're too slow to mitigate the situation.

xeromal 4 days ago | root | parent | prev |

That's a false dichotomy

Aeolun 4 days ago | root | parent | prev |

Nonsense. You don’t need any staged deploys if you simply make no mistakes.

hulitu 3 days ago | root | parent | prev |

[flagged]

dang a day ago | root | parent |

Could you please stop posting unsubstantive comments and/or flamebait? Posts like this one and https://news.ycombinator.com/item?id=41542151 are definitely not what we're trying for on HN.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

[deleted]

> And no lab where the new code was actually installed and run prior to global rawdogging.

I thought the new code was actually installed, the running part depends on the script input...?

nickcocker 4 days ago | root | parent | prev |

[dead]

I just don't think a company like Crowdstrike has a leg to stand on when leveling the "disgruntled" label in the face of their, let's face it, astoundingly epic fuck up. It's the disgruntled employees that I think would have the most clear picture of what was going on, regardless of them being in QA/QC or not because they, at that point, don't really care any more and will be more forthright with their thoughts. I'd certainly trust their info more than a company yes-man which is probably where some of that opposing messaging came from.

paulcole 4 days ago | root | parent |

Why would you trust a company no-man any more than a company yes-man? They both have agendas and biases. Is it just that you personally prefer one set of biases (anti-company) more than the other (pro-company)?

theideaofcoffee 4 days ago | root | parent | next |

Yes, I am very much biased toward being anti-company and I make no apologies for that. I've been in the corporate world long enough to know first-hand the sins that PR and corporate management commits on the company's behalf and the harm it does. I find information coming from the individual more reliable than having it filtered through corpo PR, legal, ass-covering nonsense, the latter group often wanting to preserve the status quo than getting out actual info.

paulcole 4 days ago | root | parent |

OK just checking. Nice that you at least acknowledge your bias.

Because there is still an off-hand chance that an employee who has been let go isn't speaking out of spite and merely stating the facts - depends on a combination of their honesty and the feeling they harbor about being let go. Everyone who is let go isn't bitter and/or a liar.

However, every company yes-man is paid to be a yes-man and will speak in favor of the company without exception - that literally is the job. Otherwise they will be fired and will join the ranks of the aforementioned people.

So logically it makes more sense for me to believe the former more than the latter. The two-sides are not equivalent (as you may have alluded) in term of trustworthiness.

nullvoxpopuli 4 days ago | root | parent |

Agreed. As a data point, i'm not disgruntled (i'm quoted in this article).

Mostly disappointed.

insane_dreamer 4 days ago | root | parent | prev |

Well, in this case, we know one side (pro-company) fucked up big time. The other side (anti-company) may or may not have fucked up.

That makes it easier to trust one side over another.

paulcole 3 days ago | root | parent |

You’ve kind of set yourself up in a no-lose situation here.

If the employees fucked up then you’ll say the company still fucked up because it wasn’t managing the employees well.

And then in that situation you’ll still believe the lying employees who say its the company’s fault while leaving out their culpability.

> So basically we have nothing.

No, what we have is a publication who is claiming that the people they talked to were credible and had points that were interesting and tended to match one another and/or other evidence.

You can make the claim that Semafor is bad at their jobs, or even that they're malicious. But that's a hard claim to make given that in the paragraph you've quoted they are giving you the contrary evidence that they found.

And this is a process many of us have done informally. When we talk to one ex-employee of a company, well maybe it was just that guy, or just where he was in the company. But when a bunch of people have the same complaint, it's worth taking it much more seriously.

This is like online reviews. If you selectively take positive or negative reviews and somehow censor the rest, the reviews are worthless. Yet, if you report on all the ones you find, it's still useful.

Yes, I'm more likely to leave reviews if I'm unsatisfied. Yes, people are more likely to leave CS if they were unhappy. Biased data, but still useful data.

If design isn’t involved in QC you’re not doing QC very well. If design isn’t plugged into development process enough to understand QC then you’re not doing design very well.

tw04 4 days ago | root | parent |

Why would a UX designer be involved in any way, shape, or form in kernel level code patches? They would literally never ship an update if they had that many hands in the pot for something completely unrelated. Should they also have their sales reps and marketing folks pre-brief before they make any code changes?

sonofhans 4 days ago | root | parent | next |

A UX designer might have told them it was a bad idea to deploy the patch widely without testing a smaller cohort, for instance. That’s an obvious measure that they skipped this time.

newshackr 4 days ago | root | parent |

But that doesn't have anything to do with what UX designers typically do

fzeroracer 3 days ago | root | parent | next |

I can't believe people on HN are posting this stuff over and over again. Either you are holistically disconnected from what proper software development should look like or outright creating the same environments that resulted in the crowdstrike issue.

Software security and quality is the responsibility of everyone on the team. A good UX designer should be thinking of ways a user can escape the typical flow or operate in unintended ways and express that to testers. And in decisions where management is forcing untested patches everyone should chime in.

Not true; UX designers typically are responsible for advocating for a robust, intuitive experience for users. The fact that kernel updates don’t have a user interface doesn’t make them exempt from asking the simple question: how will this affect users? And the subsequent question: is there a chance that deploying this eviscerates the user experience?

Granted, a company that isn’t focused on the user experience as much as it is on other things might not prioritise this as much in the first place.

hello_moto 4 days ago | root | parent | prev |

the person you're replying will not take any sane argument once they decided that UX must be involved in kernel technical decision...

sigseg1v 4 days ago | root | parent | next |

How would it not be related? Jamming untested code down the pipe with no way for users to configure when it's deployed and then rendering their machines inoperable is an extremely bad user experience and I would absolutely expect a UX expert to step in to try to avoid that.

hello_moto 2 days ago | root | parent |

Pick any large company that has a division working on Linux kernel (say Android).

I bet my ass UX is not anywhere close to the low-level OS team.

UX is definitely embedded in the App level team but not in low-level.

sonofhans 4 days ago | root | parent | prev |

Pfft, I never said that at all. I’m not talking about technical decisions. OP was talking about QC, which is verifying software for human use. If you don’t have user-centered people involved (UX or product or proserve) then you end up with user-hostile decisions like these people made.

zipy124 4 days ago | root | parent | prev |

I would agree if it was a UI designer, but a good UX designer designs for the users, which in this case including the system admins who will be updating kernel level code patches. Ensuring they have a good experience e.g no crashes, is their job. A recommendation would likely be for example small roll-outs to minimise the number of people having a bad user experience on a roll-out that goes wrong.

I'm going with principle of least astonishment, where productivity is more highly valued in most companies than quality control.

There are some very specific accusations backed up by non-denials from crowdstrike.

Ex-employees said bugs caused the log monitor to drop entries. Crowdstrike responded the project was never designed to alert in real time. But Crowdstrike's website currently advertises it as working in real time.

Ex-employees said people trained to monitor laptops were assigned to monitor AWS accounts with no extra training. Crowdstrike replied that "there were no experienced ‘cloud threat hunters’ to be had" in 2022 and that optional training was available to the employees.

I feel like crowdstrike is perfectly capable of mounting its own defense

> Quality control was not really part of our process or our conversation.

Is anyone really surprised or learned any new information? For us that have worked for tech companies, this is one of those repeating complaints that you hear across orgs that indicates a less than stellar engineering culture.

I've worked with numerous F500 orgs and I would say 3/5 orgs that I worked in, their code was so bad that it made me wonder how they haven't had a major incident yet.

In principle yes, I agree that former employees' sentiments have an obvious bias, but if they all trend in the same direction - people who worked in different times and functions and didn't know each other while on the job - that points to a likely underlying truth.

Well they certainly don't care about the speed of the endpoints their malware runs on. Shit has ruined my macos laptop's performance.

nullvoxpopuli 4 days ago | root | parent |

All EDR software does (at least on macos)

Source: me, a developer who also codes in free time and notices how bad fs perf is especially.

I've had the CrowdStrike sensor, and my current company is using cyberhaven.

So.. while 2 data points don't technically make a pattern, it does begin to raise suspicion.

> This type of article - built upon disgruntled former employees - is worth about as much as the apology GrubHub gift card

To you and me, maybe. To the insurers and airlines paying out over the problem, maybe not.

I do agree with having to expect bias there, but who else do you really expect to speak out?Any current employee would very quickly become an ex-employee if they speak out with any specifics.

I would expect any contractor that may have worked for CrowdStrike, or done something like a third-party audit, would be under an NDA covering their work.

Who's left to speak out with any meaningful details?

Here's some anecdotal evidence - a friend worked at CrowdStrike and was horrified at how incredibly disorganised the whole place was. They said it was completely unsurprising to them that the outage occurred. More surprising to them was that it hadn't happened more often given what a clusterfrock the place was.

> So basically we have nothing.

Except the fact that CrowdStrike fucked up the one thing they weren't supposed to fuck up.

So yeah, at this point I'm taking the ex-employees' word, because it confirms the results that we already know -- there is no way that update could have gone out had there been proper "safety first" protocols in place and CrowdStrike was "meticulous".

Disgruntled are the Crowdstrike customers that had to deal with the outage. These employees have a lot of reputation to lose for coming forward. Crowstrike is a disgrace of a company and many others like it are doing the same behaviors but they just haven't gotten caught yet. Software development has become a disgrace when the bottom line of squeezing margins to please investors took over.

darby_nine 3 days ago | root | parent |

> These employees have a lot of reputation to lose for coming forward.

Employees don't typically have much reputation to ruin. I am perfectly content putting this on their shoulders.

Aeolun 4 days ago | root | parent | prev |

Honestly, this article describes nearly all companies (from the perspective of the engineers) so I’m not sure I find it hard to believe this one is the same.

hitekker 4 days ago | prev | next |

I was surprised by how dismissive these comments are. Former staff members, engineers included, are claiming that their former company's unsafe development culture contributed to a colossal world-wide outage & other previous outages. These employee's allegations ought to be seen as credible, or at least as informative. Instead, many seem to be attacking the UX designer commenting on 'Quality control was not part of our process'.

My guess is that people are identifying with sentence said just before: "Speed [of shipping] is everything." Aka "Move fast and break things."

The culture described by this article must mirror many of our lived experiences. The pure pleasure of shipping code, putting out fires, making an impact (positive or negative)... and then leaving it to the next engineers & managers to sort out, ignoring the mess until it explodes. Even when it does, no one gets blamed for the outage and soon everyone goes back to building features that get them promoted, regardless of quality.

Through that ZIRP light, these process failures must look like a feature, not a bug. The emphasis on "quality" must also look like annoying roadblocks in the way of having fun on the customer's dime.

ClickedUp 4 days ago | root | parent | next |

This is not a game. I would normally agree but not when it comes to low-level kernel drivers. They're a cyber security company making it even worse.

Not very long ago we had this client who ordered a custom high security solution (using a kernel driver). I can't reveal too much but basically they had this offline computer running this critical database and they needed a way to account for every single system call to guarantee that any data could have not been changed without the security system alerting and logging the exact change. No backups etc were allowed to leave the computer ever. We were even required to check ntdll (this was on Windows) for hooks before installing the driver on-site & other safety precautions. Exceptions, freezes or a deadlock? No way. Any system call missed = disaster.

We took this seriously. Whenever we made a change to the driver code we had to re-test the driver on 7 different computers (in-office) running completely different hardware doing a set test procedure. Last test before release entailed an even more extensive test procedure.

This may sound harsh but CrowdStrike are total amateurs, always been. Besides, what have they contributed to the cyber security community? - Nothing! Their research are at a level of a junior cyber security researcher. They are willing to outright lie and jump to wild conclusions which is very frowned upon in the community. Also heard others comment on how CS really doesn't really fit the mold of a standard cyber security company.

Nah, CS should take a close look at true professional companies like Kaspersky and Checkpoint; industry leaders who've created proven top notch security solutions (software/services) but not least actually contributed their valuable research to the community for free, catching zero-days, reporting them before no one even had a chance of exploiting them.

They deserve some criticism.

musicale 4 days ago | root | parent |

I'm don't Kaspersky and Checkpoint either. But CS should exit the market.

There's folks out there who enjoy putting out proverbial fires? I find rework like that quite frustrating

hitekker 4 days ago | root | parent | next |

Absolutely. Some people are born firefighters. Nothing wrong with that.

I once worked with a senior engineer who loved running incidents. He felt it was real engineering. He loved debugging thorny problems on a strict timeline, getting every engineer in a room and ordering them about, while also communicating widely to the company. Then, there's the rush of the all-clear and the kudos from stakeholders.

Specific to his situation, I think he enjoyed the inflated ownership that the sudden urgency demanded. The system we owned was largely taken for granted by the org; a dead-end for a career. Calling incidents was a good way to get visibility at low-cost, i.e., no one would follow-up on our postmortem action items.

It eventually became a problem, though, when the system we owned was essentially put into maintenance mode, aka zero development velocity. Then I estimate (balancing for other variables) the rate the senior engineer called an incident for not-incidents went up by 3x...

oooyay 4 days ago | root | parent | next |

That's called hero culture and there's definitely something wrong with it.

wesselbindt 3 days ago | root | parent | prev |

I agree that enjoying firefighting is not inherently harmful. However, the situation you describe afterward irks me in some way I can't quite put my finger on. A lot of words (toxic, dishonest, marketing, counterproductive, bus factor) come to mind, but none of them quite fit.

hitekker 2 days ago | root | parent |

I have a word in mind, but I'll save it for a blogpost one day.

To be fair to the senior, it was a bad situation made worse by everyone's self-interest. Myself included.

Some people rise to the occasion during crises and find it rewarding. There's a lot of pop science around COMT (the "warrior gene" associated with stress resilience), which I take with a grain of salt. There does seem to be something there, though, and it overlaps with my personal experience that many great security operations people tend to have ADHD traits.

I've volunteered to fight a share of fires from people who check things in untested, change infrastructure randomly, etc.

What I've learned is that fixing things for these people (and even having entire teams fixing things for weeks) just leads to a continued lax attitude to testing, and leaving the fallout for others to deal with. To them, it all worked out in the end, and they get kudos for rapidly getting a solution in place.

I'm done fixing their work. I'd rather work on my own tasks than fix all the problems with theirs. I'm strongly considering moving on, as this has become an entrenched pattern.

MichaelZuo 4 days ago | root | parent | prev |

Well there are a handful of expert consultants who do, since they charge an eye watering price per hour for putting out fires.

righthand 4 days ago | root | parent | prev |

Former QA engineer here, and can confirm quality is seen as an annoying roadblock in the way of self-interested workers, disguised as in the way of having fun on the customers dime.

My favorite repeated reorg strategy over the years is “that we will train everyone in engineering to be hot swappable in their domains”. Talk about spinning wheels.

addled 4 days ago | prev | next |

Yesterday morning I learned that someone I was acquainted with had just passed away and the funeral is scheduled for next week.

They recently had a stroke at home just days after spending over a month in the hospital.

Then I remembered that they were originally supposed to be getting an important surgery, but it was delayed because of the CrowdStrike outage. It took weeks for the stars to align again and the surgery to happen.

It makes me wonder what the outcome would have been if they had gotten the surgery done that day, and not spent those extra weeks in the hospital with their condition and stressing about their future?

oehpr 4 days ago | root | parent | next |

I appreciate your post here and I'm glad you shared, because it's an example of a distributed harm. One of millions to shake out of this incident, that doesn't have a dollar figure, so it doesn't really "count".

To illustrate:

If I were to do something horrible like kick a 3 year olds knee out and cripple them for life, I would be rightly labeled a monster.

But If I were to say... advocate for education reform to push American Sign Language out of schools, so that deaf children grow up without a developmental language? We don't have words for that, and if we did, none of them would get near the cumulative scope and harm of that act.

We simply do not address distributed harms correctly. And a big part of it is that we don't, we can't, see all the tangible harms it causes.

namdnay 4 days ago | root | parent | prev |

Not to defend Crowdstrike in any way, but it’s a bit unfair to only look at the downside. What if his hospital hadn’t bought an antivirus, and got hit by ransomware?

addled 3 days ago | root | parent | next |

Sure, and even if the surgery happened on time, they still might have had a stroke once they got home and had the same outcome.

But as other posts on HN have discussed, anecdotes, especially your own, hit differently.

It makes me thankful the software I work on isn't involved in life and death situations... But then again, it causes me to better consider the things my work could be responsible for (banking). Rushed work that causes a loan application to fail or transaction to be held unnecessarily shouldn't kill someone outright, but there can be real consequences that affect real people just like Rita.

NeutralCrane 2 days ago | root | parent | prev |

Who needs ransomware when your antivirus platform is perfectly capable of bricking your computer systems?

0xbadcafebee 4 days ago | prev | next |

Critical software infrastructure should be regulated the way critical physical infrastructure is. We don't trust the people who make buildings and bridges to "do the right thing" - we mandate it with regulations and inspections. (When your software not working strands millions of people around the globe, it's critical) And this was just a regular old "accident"; imagine the future, when a war has threat actors trying to knock things out.

owl57 4 days ago | root | parent | next |

Did you notice that the piece of software in question was apparently installed mostly in companies where regulations and inspections already override sysadmins' common sense? Are you sure the answer is simply more of the same?

0xbadcafebee 4 days ago | root | parent | next |

I've worked in these enterprise organizations for a long time. They don't run on common sense, or even what one might consider "business sense". Their existing incentives create bizarre behavior.

For example, you might think "if a big security exploit happens, the stock price might tank". So if they value the stock price, they'll focus on security, right?. In reality what they do is focus on burying the evidence of security exploits. Because if nobody finds out, the stock price won't tank. Much easier than doing the work of actually securing things. And apparently it's often legal.

And when it's not a bizarre incentive, often people just ignore risks, or even low-level failures, until it's too late. Four-way intersections can pile up accidents for years until a school bus full of kids gets T-boned by a dump truck. We can't expect people to do the right thing even if they notice a problem. Something has to force the right thing.

The only thing I have ever seen force an executive to do the right thing is a law that says they will be held liable if they don't. That's still not a guarantee it will actually happen correctly, course. But they will put pressure on their underlings to at least try to make it happen.

On top of that, I would have standards that they are required to follow, the way building codes specify the standard tolerances, sizes, engineering diagrams, etc that need to be followed and inspected before someone is allowed into the building. This would enforce the quality control (and someone impartial to check it) that was lacking recently.

This will have similar results as building codes - increased bureaucracy, cost, complexity, time... but also, more safety. I think for critical things, we really do need it. Industrial controls, like those used for water, power (nuclear...), gas, etc, need it. Tanker and container ships, trains/subways, airlines, elevators, fire suppressants, military/defense, etc. The few, but very, very important, systems.

If somebody else has better ideas, believe me, I am happy to hear them....

chii 4 days ago | root | parent | next |

While good, those ideas will all increase costs.

Would you pay 10x (or more, even) for these systems? That means 10x the price of water, utilities, transport etc, which then accumulate up the chain to make other things which don't have criticality but do depend on the ones that do.

The thing is, what exists today exists because it's the path of least resistence.

Vegenoid 4 days ago | root | parent | next |

Consumer costs would not go up 10x to put more care into ensuring the continuous operation of critical IT infrastructure. Things like "an update to the software or configuration of critical systems must first be performed on a test system".

Cars without seat belts were the path of least resistance for a long time. I wonder how that changed.

You're right (not sure about the exact factor though) - and there's also additional costs when those systems fail. Someone, somewhere lost money when all those planes were grounded and services suspended.

At some point - maybe it already happened, I don't know - spending more on preventive measures and maintenance will be the path of least resistance.

No, it exists because of all must bow to the deity of increasing shareholder value. Remember that good product is not necessarily equal or even a subset of the easy to sell product. Only once the incentives are aligned towards building quality software that lasts will we see change.

insane_dreamer 4 days ago | root | parent | prev |

> Would you pay 10x (or more, even) for these systems?

if it's critical to your business, then yes; but you quickly find out whether or not it's actually critical to your business or whether it's something you can do without

abbadadda 4 days ago | root | parent | prev |

Probably there should be an independent body that oversees postmortems on tech issues, with the ability to suggest changes. This is what airlines face during crash investigations and often new rules are put in place (e.g., don’t let the shift manager self-certify his own work in the incident where the pilot’s window popped off). How this would look like with software companies, and what the bar is for being subject to this rigor I don’t know (I suspect not for a Candy Crush outage though).

In general, the biggest problem I see with late stage capitalism, and a lack of accountability in general, is that given the right incentives people will “fuck things up” faster than you can stop them. For example, say CrowdStrike was skirting QA - what’s my incentive as an individual employee versus the incentive of an executive at the company? If the exec can’t tell the difference between good QA and bad QA, but can visually see the accounting numbers go up when QA is underfunded, he’s going to optimize for stock price. And as an IC there’s not much you can do unless you’re willing to fight the good fight day in and day out. But when management repeatedly communicates they do not reward that behavior, and indeed may not care at all about software quality over a 5 year time horizon, what do you do? The key lies in finding ways to convince executives or short of that holding them to account like you say.

theideaofcoffee 3 days ago | root | parent |

I've commented on this before, but in this case I think it starts to fall onto the laps of the individual employees themselves by way of licensing, or at least some sort of certification system. Sure, you could skirt a test here or there, but then you'd only be shorting yourself when shit hits the fan. It'd be your license and essentially your livelihood on the line.

"Proper" engineering disciplines have similar systems like the Professional Engineer cert via the NSPE that requires designs be signed off. If you had the requirement that all software engineers (now with the certification actually bestowing them the proper title of 'engineer') sign off on their design, you could prevent the company from just finding someone else more unscrupulous to push that update or whatever through. If the entirety of the department or company is employing properly certificated people, they'd be stuck actually doing it the right way.

That's their incentive to do it correctly: sign your name to it, or lose your license, and just for drama's sake, don't collect $200, directly to jail. For the companies, employ properly licensed engineers, or risk unlimited downside liability when shit goes sideways, similar to what might happen if an engineering firm built a shoddy bridge.

Would a firm that peddles some sort of CRUD app need to go through all of this? If it handles toxic data like payments or health data or other PII, sure. Otherwise, probably not, just like you have small contracting outfits that build garden sheds or whatever being a bit different than those that maintain, say, cooling systems for nuclear plants. Perhaps a law might be written to include companies that work in certain industries or business lines to compel them to do this.

It’s not true that “common sense” is being overridden: most companies and sysadmins do need that baseline to avoid “forgetting” about things which aren’t trivial to implement (if you didn’t work in the field 10+ years ago, it was common to see systems getting patched annually or worse, people opening up SSH/Remote Desktop to the internet for convenience, shared/short passwords even for privileged accounts, vendors would require horribly insecure configuration because they didn’t want to hire anyone who knew how to do things better, etc.). There are drawbacks to compliance security but it has been useful for flushing all of that mess out.

Even if it wasn’t wrong, that’s still the wrong reaction. We’re in this situation because so many companies were negligent in the past and the status quo was obviously untenable. If there is a problem with a given standard the solution is to make a better system (e.g. like Apple did) rather than to say one of the most important industries in the world can’t be improved because that’d require a small fraction of its budget.

sitkack 4 days ago | root | parent | prev |

I sure noticed how much snark you packed into two sentences!

The regulations were the reason the companies were running Crowdstrike in the first place.

0xbadcafebee 4 days ago | root | parent |

I'm saying that a (different) regulation, standard, and inspection, should apply to the whole software bill of materials, as it relates to the critical-ness of the product. Like, if security is important, the security-critical components should be inspected/tested. That's how you build a building safely: the nails are built to a certain specification and the nail vendor signs off on that.

"We can't regulate the industry because then the US loses to China" or "regulation will kill the US competitive advantage!" responses I've had to suggesting the same and I just can't. But I agree with you 100%. If it's safety critical, it should be under even more scrutiny than other things, it shouldn't be left to self-regulating QA-like processes in profit seeking companies and has to have a bit more scrutiny before the big button gets pressed.

Edit: Disclaimer: The quotes aren't mine, just retorts I've received from others when I suggest the R-word.

janalsncm 4 days ago | root | parent | next |

> then the US loses to China

Yeah it makes no sense. Was the US not losing to China when we own-goaled the biggest cybersecurity incident in history?

worik 4 days ago | root | parent |

> then the US loses to China

Such a silly meme, too. Economics 101 China and USA would both benefit by halting the conflict and trading with each other

Zigurd 4 days ago | root | parent | prev |

Not to mention humans going extinct because regulators are to blame for there being no city on Mars. Because that's definitely the reason there's no city on Mars.

tedk-42 4 days ago | root | parent | prev |

Like everything, cheap, quick or good rule applies (pick 2).

Software is pretty much always made cheaply and quickly. Even NASA will have b software blunders and have rockets explode mid flight.

avree 4 days ago | prev | next |

"“Speed was the most important thing,” said Jeff Gardner, a senior user experience designer at CrowdStrike who said he was laid off in January 2023 after two years at the company. “Quality control was not really part of our process or our conversation.”

Their 'expert' on engineering process is a senior UX designer? Somehow, I doubt they were very close to the kernel patch deployment process.

acdha 4 days ago | root | parent |

They probably weren’t, but that still speaks to their general culture and is compatible with what we know about their kernel engineering culture (limited testing, no review, no use of common fail safe mechanisms).

esperent 4 days ago | root | parent | next |

> is compatible with what we know

In other words, it confirms our biases and we're willing to accept it at face value despite there being only a single anecdotal piece of evidence.

acdha 4 days ago | root | parent |

It sounds like you might want to read their technical report. That’s neither anecdotal nor a single point, and it showed a pretty large gap in engineering leadership with numerous areas well behind the state of the art.

That’s why I said it was compatible: both these former employees and their own report showed an emphasis on shipping rapidly but not the willingness to invest serious money in the safeguards needed to do so safely. If you want to construct another theory, feel free to do so.

hello_moto 4 days ago | root | parent | prev |

A company can have different business units with different culture/mentality.

I bet my ass anyone working in low-level code don't ship the way you do in Cloud.

acdha 4 days ago | root | parent |

> I bet my ass anyone working in low-level code don't ship the way you do in Cloud.

Their technical report says otherwise – and we know they didn’t adopt the common cloud practices of doing real testing before shipping or having a progressive deployment.

Cyclone_ 4 days ago | prev | next |

Not justifying what they did with qc, but qc is missing from quite a few places in software development that I've been apart of. People might get the impression from the article that every software project is well tested, whereas in my experience most are rushed out.

padjo 4 days ago | root | parent | next |

I’ve worked for several multi billion dollar software companies. None of them had a dedicated QA function by design. Everything is about moving fast. That culture is ok if you’re making entertainment software or low criticality business software. It’s a very bad idea for critical software. Unfortunately the “move fast” attitude has metastasised to places where it has no place .

Borborygymus 4 days ago | root | parent | prev |

Exactly.

Much of the discourse around this topic has described ideal testing and deployment practise. Maybe it's different in Silicon Valley or investment banks, but for the sorts of companies I work for (telco mostly) things are very far from that ideal.

My view of he industry is one of shocking technical ineptitude from all but a minority of very competent people who actually keep things running... Of management who prioritize short term cost reduction over quality at every opportunity, leading to appalling technical debt and demoralized, over-worked staff who rapidly stop giving a damn about quality, because speaking out about quality problems is penalized.

insane_dreamer 4 days ago | prev | next |

> CrowdStrike disputed much of Semafor’s reporting

I expect some ex-employees to be disgruntled and present things in a way that makes CroudStrike look bad. That happens with every company.

BUT, CrowdStrike has ZERO credibility at this point. I don't believe a word they say.

Zigurd 4 days ago | root | parent |

At some companies, like Boeing, the shorter list would be the gruntled employees.

insane_dreamer 4 days ago | root | parent |

> gruntled

have never heard that word used is a non-negative way

tsimionescu 4 days ago | root | parent | next |

Fun linguistics fact, but gruntled as the antonym of disgruntled is a back-formation. The word disgruntled is a bit strange, in that it uses "dis-" not as a reversal prefix (such as in dissatisfied or dissimilar), but as an intensifier. The original "gruntle" was related to grunt, grunting, it was similar to "grumble", denoting the sounds an annoyed crowd might make. But this old sense of gruntle, gruntling, gruntled has not been used since the 16th century. And in the past century, people have started back-forming a new "gruntle" by analyzing "dis-gruntled" as using the more common meaning of "dis-".

A similar use of dis- as an intensifier apparently happened in "dismayed" (here from an Old French verb, esmaier , which meant to trouble, to disturb), and in "disturbed" (from Latin a word, turba, meaning turmoil). I haven't heard any one say they are "mayed" or "turbed", but people would probably see the same as "gruntled" if you used them.

Off-Topic, but do I have a story for you

https://www.ling.upenn.edu/~beatrice/humor/how-i-met-my-wife...

dbattaglia 4 days ago | root | parent | prev |

I’ve only heard it from Michael Scott: “Everyone here is extremely gruntled”.

sersi 4 days ago | prev | next |

Crowdstrike was heavily pushed on us at a previous company both for compliance reason by some of our clients (BCG were the ones pushing us to use crowdstrike) and from our liability insurance company.

It was really an uphill battle to convince everyone not to use Crowdstrike. Eventually I managed to but after many meetings where I had to spend a significant amount of time convincing different shareholders. I'm sure a lot of people just fold and go with them.

mikeocool 4 days ago | root | parent | next |

Curious — did you go with a different EDR solution? Or were you able to convince people not to roll one out at all?

wesselbindt 4 days ago | root | parent | prev |

What made you unwilling to use CS at the time?

pclmulqdq 4 days ago | prev | next |

Everything that we know about CrowdStrike stinks of Knight Capital to me. A minor culture problem snowballed into complete dysfunction, eventually resulting in a company-ending bug.

ForOldHack 4 days ago | root | parent |

Knight Capitol:

"$10 million a minute.

That’s about how much the trading problem that set off turmoil on the stock market on Wednesday morning is already costing the trading firm.

The Knight Capital Group announced on Thursday that it lost $440 million when it sold all the stocks it accidentally bought Wednesday morning because a computer glitch. "

Glitch. Oh...

https://en.wikipedia.org/wiki/Therac-25

0cf8612b2e1e 4 days ago | root | parent |

I do not work in finance, but surely every trading company has had an algorithm go wild at some point. Just becomes a matter of how fast someone can pull the circuit breaker before the expensive failure becomes public.

bitcharmer 4 days ago | root | parent | next |

We have circuit breakers for that very purpose. Everyone on the street does. It's just that theirs seems to have failed for some reason.

pclmulqdq 4 days ago | root | parent |

Theirs didn't fail, and they did have one. The circuit breaker they had that would have worked was a big red button that killed all of their trading processes, which would have meant spending the rest of the day figuring out and unwinding their positions.

Ihey were unwilling to push that button in the short time they had. If you read the reports to the SEC or the articles about it, you will note that. The follow-ups recommended that all firms adopt a big red button that is less catastrophic.

bitcharmer 4 days ago | root | parent |

Gotcha, thanks for correcting me, I need to read up more about the incident.

> surely every trading company has had an algorithm go wild at some point.

You would think so.

Cynical me.

But no. When money is at stake much more care is taken than when lives are at stake.

pclmulqdq 4 days ago | root | parent | prev |

Shamelessly plugging my own blog post on this: https://specbranch.com/posts/knight-capital/

The TL;DR of Knight is that Knight had several things go wrong at the same time, and had no circuit breaker for the problem that did not stop trading for the whole firm for the day. Most trading firms have had things go badly, but the holes in the Swiss cheese aligned for Knight (and they were larger than many other firms). This all comes from a sort of culture of carelessness.

odyssey7 4 days ago | root | parent |

I always thought the Swiss cheese model was used to suggest that no one party could possibly be responsible for a bad thing that happened. Interesting to see the company’s culture blamed for the cheese itself.

pclmulqdq 4 days ago | root | parent |

Personally, I think there are too many things in modern American society that involve diffusion of responsibility, presumably so that people avoid negative consequences. If you're going to suggest that a system gives 1/10th of the responsibility to 10 different people, the one who made the system is the enabler of that and IMO should suffer the consequences.

odyssey7 4 days ago | root | parent |

The Swiss cheese model fits better as a rebuttal when the cheese comprises both the finger-pointer and the finger-pointee. Think: sure, our software had a bug that said up was down, but what about all of your own employees who used the software, had certifications, and should have known better than to accept its conclusions?

Your usage, in assigning blame rather than diffusing it, was novel to me.

MichaelRo 3 days ago | prev | next |

This "security" thing is getting ridiculous. It's become the Gestapo of information technology, they can do anything they want when they want to your computer, cannot resist it and there's absolutely no transparency on what they do to you and why.

I've recently changed jobs and the new employer, a large company, obviously has to have an IT compliance / security update policy because everyone else has it so if they stand out from the crowd and don't do it and somehow get hacked, it's 100x worse than constantly annoying employees and top of the line computers working like a 1970s terminal.

It's rarely that a week passes without the obligatory update + restart. And at least once a month they update THE FUCKING BIOS! What the fuck can be so broken in those laptops that the BIOS is a constant security hazard?! And why would you buy software from someone who week after week after week tell you all you had so far was a hazardous piece of shit that cannot possibly function without constant pampering?

Ahh and of course they botch it. Had to have the OS completely wiped out and reinstalled after the laptop started to behave more and more erratically, 100% caused by faulty updates on top of faulty patches trying to patch the faulty updates. Worked OK for a while afterwards then updates started piling up and so far I only lost use of the web camera (before it was Wifi then display adapter).

There's literally no words how much I hate "the system" and the constant security update take it up the ass we're forced to put up with.

chaps 4 days ago | prev | next |

Worked on a team that deployed crowdstrike agents to organize and... Yeah. One of the biggest problems we had was that the daemon would log a massive amount of stuff... But had no config for it to stop or reduce it.

bb88 4 days ago | prev | next |

Most interesting quote in the article:

    “It was hard to get people to do sufficient testing sometimes,” said Preston
    Sego, who worked at CrowdStrike from 2019 to 2023. His job was to review the
    tests completed by user experience developers that alerted engineers to bugs
    before proposed coding changes were released to customers. Sego said he was 
    fired in February 2023 as an “insider threat” after he criticized the
    company’s return to-work policy on an internal Slack channel.

Okay clearly that company has a culture issue. Imagine criticizing a policy and then getting labeled "insider threat".

nullvoxpopuli 4 days ago | root | parent | next |

I'd like to clarify: that my job was also to educate, modernize, and improve developer velocity through tooling and framework updates / changes (impacting every team in my department (UX / frontend engineering)).

Reviewing tests is part of PR review.

--- and before anyone asks, this is my statement on CrowdStrike calling everyone disgruntled:

"I'm not disgruntled.

But as a shareholder (and probably more primarily, someone who cares about coworkers), I am disappointed.

For the most part, I'm still mourning the loss of working with the UX/Platform team."

bb88 4 days ago | root | parent |

I mourn the fact that your ex co-workers are still working for a shitty company.

nullvoxpopuli 4 days ago | root | parent |

The market for jobs isn't great, so i don't blame them.

At the same time, i feel like big profit-chasing software companies are all like how CrowdStrike is.

Many may be in the same type of company, but situations have not arisen that reveal how leadership really feels about employees.

> Imagine criticizing a policy and then getting labeled "insider threat".

Especially because that’s incredibly dumb. A true insider threat would play nice while you find all your confidential data leaking.

bb88 4 days ago | root | parent |

I mean, that's just insanely true. I think this is maybe the most dystopian company I've ever heard of so far.

wesselbindt 4 days ago | root | parent | prev |

> return to work

I know you're just quoting the phrase, but what a gross and dishonest way of phrasing "return to office". Implies working remotely doesn't count as work. Smacks of PR. Yuck.

panic 4 days ago | prev | next |

Why would it matter? The absolute worst case scenario happened and their stock is still up 50% YoY, beating the S&P 500.

0cf8612b2e1e 4 days ago | root | parent | next |

I thought you were joking. The stock market is incredible.

Everyone must realize that crowdstrike has a captive audience with no alternatives that can meet corporate compliance.

intelVISA 4 days ago | root | parent |

Can't think of a bigger flex of how locked-in their market share is.

On the plus side this should spur some disruptors into gear, assuming VCs are willing to pivot from wasting money funding LLM wrappers.

hyperpape 4 days ago | root | parent | prev |

It’s down 30% since the incident, and flat since 3 years ago.

If it runs up a huge amount in the first half of the year and then the incident knocks off 30% of their market, that still means the incident was really bad.

hello_moto 4 days ago | root | parent |

Their stock has always been volatile but you can't ignore the fact that it hasn't been that bad after the incident.

nine_zeros 4 days ago | prev | next |

Typical of tech companies these days. Quality is considered immaterial - or worse - put on low level managers and engineers who don't have the time to clearly examine quality and good roll out practices.

C-Suite and investors don't seem to want to spend on quality. They should just price in that their stock investment could collapse any day.

dgfitz 4 days ago | root | parent |

[flagged]

neverrroot 4 days ago | root | parent |

This ^^

ricardobayes 4 days ago | prev | next |

I believe one of the biggest bad trends of the software industry as a whole is cutting down on QA/testing effort. A buggy product is almost always an unsuccessful one.

breadwinner 4 days ago | root | parent |

Blame Facebook and Google for that. They became successful without QA engineers, so the rest of the industry decided to follow suit in an effort to stay modern.

xyst 4 days ago | prev | next |

Switch off CrowdStrike junk. Those companies renewing contracts with them have idiots for leaders.

Many competing platforms that can be a drop in placement for ClownStrike.

hinkley 4 days ago | prev | next |

I have only just begun to consider this question: when does risk taking become thrill seeking?

At some point you go past questions of laziness or discipline and it becomes a neurosis. Like an addiction.

bmitc 4 days ago | prev | next |

Has anyone actually worked at a place where quality control was treated as important? I wouldn't consider this exactly surprising.

m3047 4 days ago | root | parent | next |

Yes. It was a manufacturing facility and since the products were photosensitive the entire line operated in total darkness. It was two months before they turned the lights on and I could see what I was programming for.

This was the first place I saw standups. [Edit: this was the 1990s] They were run by and for the "meat", the people running the line. "Level 2" only got to speak if we were blocked, or to briefly describe any new investigations we would be undertaking.

Weirdly (maybe?) they didn't drug test. I thought of all the places I've worked, they would. But they didn't. They were firmly committed to the "no SPOFs" doctrine and had a "tap out" policy: if anyone felt you were distracted, they could "tap you out" for the day. It was no fault. I was there for six months and three or four times I was tapped out and (after the first time, because they asked what I did with my time off the first time) told to "go climb a rock". I tapped somebody out once, for what later gossip suggested was a family issue.

bmitc 3 days ago | root | parent |

That sounds ... intense, to say the least.

m3047 3 days ago | root | parent |

It was a machine. At first it was kind of creepy to have the feeling that when you entered the building you were part of a machine. But after a couple of weeks it was addictive and I have never looked forward to going to work somewhere as much as I did while working there. Even climbing the rocks on my enforced days off gained a mental narrative that "I'm climbing this rock to be the best part of the machine I can be".

Sure most of the times I was tapped out I was distracted by personal thoughts. But one time I was just thinking about the problem. I protested "but I was thinking about the problem!" and they said "go think somewhere else!".

Yes, at a trading company, where important central systems had a multiweek testing process (unless the change was marked as urgent, in which case it was faster) with a dedicated team and a full replica environment which would replay historical functions 1:1 (or in some cases live), and every change needed to have an automated rollback process. Unsurprising since it directly affects the bottom line.

bmitc 3 days ago | root | parent |

Very interesting. Thanks for sharing.

> every change needed to have an automated rollback process

How did you accomplish that?

sudosysgen 3 days ago | root | parent |

We had a state management and deployment system through which all changes were effected that would automatically rollback changes if the smoke test failed, or if one of the ops staff found an issue.

I haven't worked there but I would presume that systems running nuclear reactors or ICBM launchers have a strong emphasis on QC.

6h6n56 4 days ago | root | parent | prev |

Nope. Did everyone forget the tech motto "move fast and break things"? Where is the room for quality control in that philosophy?

Corps won't even put resource into anti-fraud efforts if they believe the millions being stolen from their bottom line isn't worth the effort. I have seen this attitude working in FAANGS.

None of this will change until tech workers stop being sadists and actually unionize.

jrm4 4 days ago | prev | next |

Does anyone have a logical reason why this company should not be sued into oblivion?

superposeur 4 days ago | root | parent |

Yes, because in point of fact this company is the best at what it does — preventing security breaches. The outage — disruptive as it was — was not a breach. This elemental fact is lost amidst all the knee jerk HN hate, but goes a long way toward explaining why the stock only took a modest hit.

hun3 4 days ago | root | parent |

That's a somewhat narrow definition of "security."

The 3rd component of the CIA triad is often overlooked, yet the availability is what makes the protected asset—and, transitively, the protection itself—useful at the first place.

The disruption is effectively a Denial of Service.

ramesh31 4 days ago | prev | next |

If their (or your) shop is anything like mine, its' been a constant whittling of ancillary support roles (SDET, QA, SRE) and a shoving of all of the above into the sole responsibility of devs over the last few years. None of this is surprising at all.

noisy_boy 4 days ago | prev | next |

Would be interesting to know from their employees if there have been any tangible changes in the blind pursuit of velocity, better QA etc in the aftermath of this fiasco.

Timber-6539 4 days ago | prev | next |

Doesn't matter now. CRWD didn't go to zero. Meaning they get the chance to do this again.

mrjin 3 days ago | prev | next |

Wasn't that obvious? If any tests were performed at all, how anyone can manage to caused an outage at such scale?

mattfrommars 4 days ago | prev | next |

Side effect of the old adage, "move fast, fail fast"?

4 days ago | root | parent |

[deleted]

nailer 4 days ago | prev | next |

It’s a UX designer. I don’t particularly like crowdstrike, but this person will know very little about their kernel Drivers.

paulcole 4 days ago | prev | next |

Well if they say that QA was part of the process then they’ll look like idiots because they sucked at the process.

Don’t find this particularly interesting news.

manvillej 4 days ago | prev | next |

anyone feel like this and Boeing sound remarkably similar?

Its almost like there is a lesson for executives here. hmmmm

bitcharmer 4 days ago | root | parent |

The only lesson for these people is loss of bonuses. This will keep happening for as long as golden parachutes are a thing.

wesselbindt 4 days ago | root | parent |

How can we get rid of golden parachutes?

goralph 4 days ago | prev | next |

What are some alternatives to CrowdStrike?

taspeotis 4 days ago | root | parent | next |

Personal: Nothing - Windows Defender is built into Windows.

Business: Nothing - Windows Defender Advanced Threat Protection is built into the higher Microsoft 365 license tiers.

It amazes me people chose to pay money to have all their PCs bluescreen.

qaq 4 days ago | root | parent | next |

large orgs want something that will run across all of their fleet so linux servers, Macs etc.

taspeotis 4 days ago | root | parent |

Linux: https://learn.microsoft.com/en-us/defender-endpoint/microsof...

macOS: https://learn.microsoft.com/en-us/defender-endpoint/microsof...

It does iOS and Android too.

Again, if you're an organisation big enough to care about single-pane-of-glass-monitoring you probably already have access to this via the Microsoft 365 license tier you're on.

if you had used 'some' before 'people' i could agree but some industries have to use a siem or they can be fined, so, i mean if there's a list of siems that are definitely not going to ever crash by messing around in the kernel lets get a list going

willy_k 2 days ago | root | parent | next |

Luckily the concern isn’t simply whether they could make a mistake and cause a crash by easing around in the kernel, it’s whether they’re likely to, and I’d argue that CrowdStrike is particularly likely to do so given their testing and rollout processes, and the culture that encompasses those failures

taspeotis 4 days ago | root | parent | prev |

Microsoft Sentinel seems like a pretty unlikely candidate for SIEM to crash every machine it’s receiving data from.

mdatp is also a virus. So slow…

taspeotis 4 days ago | root | parent |

It can record some telemetry to help you understand why it's slow: https://learn.microsoft.com/en-us/defender-endpoint/troubles...

neverrroot 4 days ago | root | parent | prev |

This is a good example of very limited thinking.

> What are some alternatives to CrowdStrike?

In house competence

duckmysick 4 days ago | root | parent | next |

Insurers often require to have Endpoint Detection and Response for all the devices, from a third-party. In-house often won't cut it, even if it makes more practical sense.

rnts08 4 days ago | root | parent | prev |

But then you can't blame anyone else when shit hits the fan! Isn't that what you're really paying for with EDR? No one is safe from a targeted attack, regardless of software.

Everything that describes itself as "endpoint security".

Carbon Black was, though now they're owned by Broadcom and folded into Symantec

iamhamm 4 days ago | root | parent | prev |

SentinelOne

SlightlyLeftPad 3 days ago | prev | next |

Just another example of technical leadership being completely irresponsible and another example of tech companies prioritizing the wrong things. As a security company, this completely blows their credibility. i’m not convinced they learned anything from this and don’t expect this effect to change anything. This is a culture issue, not a technical one. One RCA isn’t going to change this.

Reliability is a critical facet of security from a business continuity standpoint. Any business still using crowdstrike is out of their mind.

tamimio 4 days ago | prev | next |

I think the whole world knew that already.

seanw444 4 days ago | prev | next |

And everybody gasped in surprise.

nittanymount 4 days ago | prev | next |

does it have competitors ?

Sarkie 5 days ago | prev | next |

It was shown in the RCA that their QA processes were shit

st3fan 4 days ago | prev | next |

Found out that the CrowdStrike Mac agent (Falcon) sends all your secrets from environment variables to their cloud hosted SIEM. In plain text.

Anyone with access to your CS SIEM can search for GitHub, aws, etc creds. Anything your devs, ops and sec teams use on their Macs.

Only the Mac version does this. There is no way to disable this behaviour or a way to redact things.

Another really odd design decision. They probably have many many thousands of plain text secrets from their customers stored in their SIEM.

philshem 4 days ago | root | parent | next |

SIEM = Security information and event management

https://en.wikipedia.org/wiki/Security_information_and_event...

Having worked for a SIEM vendor, I can say that all security software is extremely invasive, and most security people can probably track every action you make on company-issued devices, and that includes HTTPS decryption.

firtoz 4 days ago | root | parent | next |

Reminds me of a guy I know openly bragging that he can watch all of his customers who installed his company's security cameras. I won't reveal his details but just imagine any cloud security camera company doing the same and you would probably be right.

I guess it's pretty much the same principle.

blablabla123 4 days ago | root | parent | prev |

Yeah the question is always if the cure is better than the disease. I'm quite ambivalent on this. On the one hand I tend to agree with the "Anti AV camp" that a sufficiently maintained machine can do well when following best practices. Of course that includes SIEM which can also be run on-premise and doesn't necessarily have to decrypt traffic if it just consumes properly formatted logs.

On the other hand there was e.g. WannaCry in 2017 where 200,000 systems across 150 countries running Windows XP and other unsupported Windows Server versions had crypto miners installed. It shows that companies world-wide had trouble properly maintaining the life cycle of their systems. I think it's too easy to only accuse security vendors of quality problems.

Can you provide some more info on this? How do you know? Is this documented somewhere?

I'm sure this is going to raise red-flags in my IT department.

skewer99 4 days ago | root | parent | next |

AKIDs... ugh. They'll be there if you use AWS + Mac.

Again, the plaintext is the problem.

These environment variables get loaded from the command line, scripts, etc. - CrowdStrike and all of the best EDRs also collect and send home all of that, but probably in an encrypted stream?

zxexz 4 days ago | root | parent |

I usually remote dev on an instance in a VPC because of crap like this. If you like terrible ideas (I don't use this except for debugging IAM stuff, occasionally), you can use the IMDS like you were an AWS instance by giving a local loopback device the link-local ipv4 address 169.254.169.254/32 and binding traffic on the instance's 169.254.169.254/32 port 80 to your lo's port 80, and a local AWS SDK will use the IAM instance profile of the instance you're connected to. I'll repeat, this is not a good idea.

st3fan 4 days ago | root | parent | prev |

Ask them to search for the usual env var names like GITHUB_TOKEN or AWS_ACCESS_KEY_ID.

The monitoring and collection isn't the problem, that's what modern EDR does - collect, analyze, compare, and do statistics on all of the things.

The plaintext part is not okay.

notepad0x90 4 days ago | root | parent |

Thank you, that's a sound perspective, but it is the responsibility of the security staff who deploy EDRs like Crowdstrike to scrub any data at ingestion time into their SIEM. but within CS's platform, it makes little sense to talk about scrubbing, since CS doesn't know what you want scrubbed unless it is standardized data forms (like SSNs,credit cards,etc..).

Another way to look at it is, the CS cloud environment is effectively part of your environment. the secrets can get scrubbed, but CS still has access to your devices, they can remotely access them and get those secrets at any time without your knowledge. that is the product. The security boundary of OP's mac is inclusive of the CS cloud.

st3fan 4 days ago | root | parent |

Unfortunately the software doesn’t allow for scrubbing or redacting to be configured. Those features simply do not exist.

notepad0x90 3 days ago | root | parent |

for their own cloud, yeah, you basically accept their cloud as an extension of your devices. but the back-end they use(d?), Splunk, does have scrubbing capability they can expose to customers, if actual customers requested it.

In reality, you can take steps to prevent PII from being logged by Crowdstrike, but credentials are too non-standard to meaningfully scrub. It would be an exercise in futility. If you trust them to have unrestricted access to the credential, the fact that they're inadvertently logging it because of the way your applications work should not be considered an increase in risk.

It is a common in the world of SIEM. Logs with secrets and PII data is often sent and stays in the SIEM for years until an incident occurs.

Does it also monitor the contents of your copy/paste buffer? It would scoop up a ton of privileged data if so.

Anyone with the right level of access to your Falcon instance can run commands on your endpoints (using RTR) and collect any data not already being collected.

that's what EDRs do. anyone with access to your SIEM or CS data should also be trusted with response access (i.e.: remotely access those machines).

If you want this redacted, it is a SIEM functionality not Crowdstrike's. Depends on the SIEM but even older generation SIEMs have a data scrubbing feature.

This isn't a Crowdstrike design decision as you've put it. any endpoint monitoring too, including the free and open source ones behave just as you described. You won't just see env vars from macs but things like domain admin creds and PKI root signing private keys. If you give someone access to an EDR, or they are incident responders with SIEM access, you've trusted them with full -- yet, auditable and monitored -- access to that deployment.

4 days ago | root | parent | next |

[deleted]

Sure, storage. Networking though? SIEMs receive and send data unencrypted? They should not. By sending the data in plain text you open up an attack surface to anyone sniffing the network.

notepad0x90 3 days ago | root | parent |

Crowdstrike like many EDRs uses mutually authenticated TLS to send the data over the network to their cloud.

pmlnr 4 days ago | root | parent | prev |

Don't downvote this, this is the sad truth.

This kind of information seems like it should have a CVE and a responsible disclosure process.

Kidding, mostly, but wow that's a hell of a vulnerability.

notepad0x90 4 days ago | root | parent |

It is not a vulnerability, you literally pay for this feature. I really don't want to defend Crowdstrike but HN keeps making it hard not to.

hiddencost 4 days ago | root | parent |

Storing secrets in unsecured environments in plaintext is literally a vulnerability.

One of the most famous examples can be seen in the NSA slide at the top of this article:

https://www.washingtonpost.com/world/national-security/nsa-i...

notepad0x90 4 days ago | root | parent |

the security tools' storage system is always considered a secured environment.

j4coh 4 days ago | root | parent |

Without even having to secure it?

throw_a_grenade 4 days ago | root | parent |

Yes, but also No.

So there's this thing called "Threat model" and it includes some assumptions about some moving parts of the infra, and it very often includes assertion that a particular environment (like IDS log, signing infra surrounding HSM etc.) is "secure" (they mean outside of the scope of that particular threat model). So it often gets papered over, and it takes some reflex to say "hey, how we will secure that other part". There needs to be some conciousnes about it, because it's not part of this model under discussuon, so not part of the agenda of this meeting...

And it gets lost.

That's how shit happens in compliance-oriented security.

Do you have a source?

Secrets in clear text in environment variables is never a good idea though.

dchftcs 4 days ago | root | parent |

There are secrets like passwords, but there are also secrets like "these are the parameters for running a server for our assembly line for X big corp".

Did somebody say GDPR?

raverbashing 4 days ago | root | parent | next |

Not applicable. It is not related to personal data

Companies believe GDPR doesn't apply to their human resources.

riedel 4 days ago | root | parent |

They have IT policies to make sure it largely does not apply. Even in our policy officially any personal use is forbidden. Funnily there is also agreement with our employee board, that any personal use will not be sanctioned. So guess what happens. This done to circumvent not only GPR but also TTDSG in germany (which is harsher on 'spying' as it applies to telecoms. For any 'officially' gathered personal information though typical very specific agreements with our employee board exist though (reporting of illness, etc). Wonder how such information which is also sensitive in a workplace is handled. Also I see those systems used in hospitals etc, if other peoples data is pumped through this systems GDPR definitively applies and auditors may find it (I only know such auditing in finance though). In the future NIS2 will also apply so exactly the people that use such systems will be put under additional scrutiny. Hope this triggers also some auditing of the systems used and not just the use of more of such systems.

unilynx 4 days ago | root | parent | prev |

What would you expect the GDPR to say? This is allowed as long as the GDPRs requirements are followed

[deleted]

apimade 4 days ago | root | parent | prev |

Is this really a criticism? Because this has been the case forever with all security and SIEM tools. It’s one of the reasons why the SIEM is the most locked down pieces of software in the business.

Realistically, secrets alone shouldn’t allow an attacker access - they should need access to infrastructure or a certificates in machines as well. But unfortunately that’s not the case for many SaaS vendors.

Aeolun 4 days ago | root | parent | next |

If my security software exfiltrates my secrets by design, I’m just going to give up on keeping anything secure now.

Ideally secrets never leave secure enclaves and humans at the organization can't even access them.

It's totally insane to send them to a remote service controlled by another organization.

cj 4 days ago | root | parent | next |

Essentially, it’s straddling two extremes:

1) employees are trusted with secrets, so we have to audit that employees are treating those secrets securely (via tracking, monitoring, etc)

2) we don’t allow employees to have access to secrets whatsoever, therefore we don’t need any auditing or monitoring

userbinator 4 days ago | root | parent | next |

employees are trusted with secrets, so we have to audit that employees are treating those secrets securely

IMHO needing to be monitored constantly is not being "trusted" by any sense of the word.

fragmede 4 days ago | root | parent |

I can trust you enough to let you borrow my car and not crash it, but still want to know where my car is with an Airtag.

Similarly employees can be trusted enough with access to prod, while the company wants to protect itself from someone getting phished or from running the wrong "curl | bash" command, so the company doesn't get pwned.

Exporting to a SIEM does not correlate to either of those extremes. It’s stupidity and makes auditing worse

cj 4 days ago | root | parent |

SIEM = Security Information & Event Management

Factually, it is necessary for auditing and absolutely correlates with the extreme of needing to monitor the “usage” of “secrets”.

In a highly auditable/“secure” environment, you can’t give secrets to employees with no tracking of when the secrets are used.

halayli 4 days ago | root | parent | next |

That's far from factual and you are making things up. You don't need to send the actual keys to a siem service to monitor the usage of those secrets. You can use a cryptographic hash and send the hash instead. And they definitely don't need to dump env values and send them all.

Sending env vars of all your employees to one place doesn't improve anything. In fact, one can argue the company is now more vulnerable.

It feels like a decision made by a clueless school principle, instead of a security expert.

A secure environment doesn't involve software exfiltrating secrets to a 3rd party. It shouldn't even centralize secrets in plaintext. The thing to collect and monitor is behavior: so-and-so logged into a dashboard using credentials user+passhash and spun up a server which connected to X Y and Z over ports whatever... And those monitored barriers should be integral to an architecture, such that every behavior in need of auditing is provably recorded.

If you lean in the direction of keylogging all your employees, that's not only lazy but ineffective on account of the unnecessary noise collected, and it's counterproductive in that it creates a juicy central target that you can hardly trust anyone with. Good auditing is minimally useful to an adversary, IMO.

> In a highly auditable/“secure” environment, you can’t give secrets to employees with no tracking of when the secrets are used.

This does not seem to require regularly exporting secrets form the employee's machines though. Which is the main complaint I am reading. You would log when the secret is used to access something, presumably remote to the users machine.

In a highly secure environment, don't use long lived secrets in the first place. You use 2FA and only give out short lived tokens. The IdP (ID Provider) refreshing the token for you provides the audit trail.

Repeat after me: Security is not a bolt on tool.

defrost 4 days ago | root | parent |

More like a triple lock steel core reinforced door laying on its side in an open field?

Good start, might need a little more work around the edges.

Aeolun 3 days ago | root | parent | prev |

> In a highly auditable/“secure” environment, you can’t give secrets to employees with no tracking of when the secrets are used.

Yeah. So you track them when they are used (which also gives you a nice timestamp). Not when they’re just sitting in the env.

ants_everywhere 4 days ago | root | parent | prev |

You give employees the ability to use the secrets, and that usage is tracked and audited.

It works the same way for biometrics like face unlock on mobile phones

> Ideally secrets never leave secure enclaves and humans at the organization can't even access them.

Right, but doesn't that mean there is no risk from sending employee laptop ENV variables, since they shouldn't have any secrets on their laptops?

Natsu 4 days ago | root | parent | prev |

I mean it's right there in the name. They're not really secrets any longer if you're sharing them in plaintext with another company.

Keeping secrets and other sensitive data out of your SIEM is a very important part of SIEM design. Depending on what you’re dealing with you might want to tokenize it, or redact it, but you absolutely don’t want to don’t want to just ingest them in plaintext.

If you’re a PCI company then ending up with a credit card number in your SIEM can be a massive disaster. Because you’re never allowed to store that in plaintext, and your SIEM data is supposed to be immutable. In theory that puts you out of compliance for a minimum of one year with no way to fix it, in reality your QSAs will spend some time debating what to do about it and then require you to figure out some way to delete it, which might be incredibly onerous. But I have no idea what they’d do if your SIEM somehow became full of credit card numbers, that probably is unfixable…

ronsor 4 days ago | root | parent |

> But I have no idea what they’d do if your SIEM somehow became full of credit card numbers, that probably is unfixable…

You'd get rid of it.

AmericanChopper 4 days ago | root | parent |

If that’s straightforward then congratulations, you’ve failed your assessment for not having immutable log retention.

They certainly wouldn’t let you keep it there, but if your SIEM was absolutely full of cardholder data, I imagine they’d require you to extract ALL of it, redact the cardholder data, and the import it to a new instance, nuking the old one. But for a QSA to sign off on that they’d be expecting to see a lot of evidence that removing the cardholder data was the only thing you changed.

> Realistically, secrets alone shouldn’t allow an attacker access - they should need access to infrastructure or a certificates in machines as well.

This isn't realistic, it's idealistic. In the real world secrets are enough to grant access, and even if they weren't, exposing one half of the equation in clear text by design is still really bad for security.

Two factor auth with one factor known to be compromised is actually only one factor. The same applies here.

But why only forced on MacOS?

I think some configurability would be great. I would like to provide an allow list or the ability to redact. Or exclude specific host groups.

We all have different levels of acceptable risk

btilly 4 days ago | root | parent |

Conspiracy theory time. Because Apple is the only OS company that has reliably proven that it won't decrypt hard drives at government request.

iml7 4 days ago | root | parent | next |

It depends on the country it is in, it rejects the US government's request. But it fully complies with any request from the Chinese government

throwaway48476 4 days ago | root | parent | next |

The venn diagram of users who don't want the government to access their data and crowdstrike customers is two circles in different galaxies.

EE84M3i 4 days ago | root | parent | prev |

I'd be interested to learn more about that.

My mental model was that Apple provides backdoor decryption keys to China in advance for devices sold in China/Chinese iCloud accounts, but that they cannot/will not bypass device encryption for China for devices sold outside of the country/foreign iCloud accounts.

It's probably being run on an enterprise-managed mac. The only person who can be locked out via encryption is the user.

vagrantJin 4 days ago | root | parent | prev |

This is a true conspiracy .

jordanb 4 days ago | root | parent |

Seriously? Crowdstrike is obviously NSA just like Kaspersky is obviously KGB and Wiz is obviously Mossad. Why else are counties so anxious about local businesses not using agents made by foreign actors?

smolder 4 days ago | root | parent |

KGB is not even a thing. Modern equivalent is FSB, no? I'm skeptical. I don't think it's obvious that these are all basically fronts, as much as I'm willing to believe that IC tentacles reach wide and deep.

All SIEM instances certainly contain a lot of sensitive data in events, but I'm not sure if most agents forward all environment variables to a SIEM.

hello_moto 4 days ago | root | parent |

Agents don't just read env vars and send them to SIEM.

There's a triggering action that caused the env vars to be used by another ... ehem... Process ... that any EDR software in this beautiful planet would have tracked.

st3fan 4 days ago | root | parent |

No it logs every command macOS runs or that you type in a terminal. Either directly or indirectly. From macOS internal periodic tasks to you running “ls”.

[deleted]

The certificate private key is also a secret.

> Because this has been the case forever with all security and SIEM tools.

Why?

There is no need to send your environment variables.

gruez 4 days ago | root | parent |

Otherwise malware can hide in environment variables

llm_trw 4 days ago | root | parent | next |

Ok, suppose you're right.

Why are they only doing it for macs then?

batch12 4 days ago | root | parent | next |

I don't think this is limited to just Macs based on my experience with the tool. It also sends command line arguments for processes which sometimes contain secrets. The client can see everything and run commands on the endpoints. What isn't sent automatically can be collected for review as needed.

st3fan 4 days ago | root | parent |

It does redact secrets passed as command line arguments. This is what makes it so inconsistent. It does recognize a GitHub token as an argument and blanks it out before sending it. But then it doesn’t do that if the GitHub token appears in an env var.

st3fan 4 days ago | root | parent | prev |

It may depend a bit on your organization but I bet most folks using an EDR solution can tell you that Macs are probably very low on the list when it comes to malware. You can guess which OS you will spend time on every day ...

llm_trw 4 days ago | root | parent |

So because macs are not the targets of malware ... we're locking them down tighter than any other system?

namaria 4 days ago | root | parent |

No, see, they're leveling the playing field by storing all secrets they find on macs in plaintext

Malware can hide in the frame buffer at modern resolutions. They could keep a full copy of it and each frame transition too.

worik 4 days ago | root | parent | prev |

They do not need to take the data off the computer to do that

What do you think grants the access to the infra or ability to get a certificate?

Arbitrary bad practices as status quo without criticism, far from absolving more of the same, demand scrutiny.

Arbitrarily high levels of market penetration by sloppy vendors in high-stakes activities, far from being an argument for functioning markets, demand regulation.

Arbitrarily high profile failures of the previous two, far from indicating a tolerable norm, demand criminal prosecution.

It is recently that this seemingly ubiquitous vendor, with zero-day access to a critical kernel space that any red team adversary would kill for, said “lgtm shipit” instead of running a test suite with consequences and costs (depending on who you listen to) ranging from billions in lost treasure to loss of innocent life.

We know who fucked up, have an idea of how much corrupt-ass market failure crony capitalism could admit such a thing.

The only thing we don’t know is how much worse it would have to be before anyone involved suffers any consequences.

Most sane SIEM engineers would implement masking for this. Not sure if CS still uses Splunk but they did at one point. No excuse really.

kmacdough 4 days ago | root | parent | prev |

"Oh, but our system is so secure, you don't need other layers."

bitcharmer 4 days ago | prev | next |

Another company that got MBA-ified

jokoon 4 days ago | prev | next |

We need laws and regulations on software the same way we have for toys, cars, airplanes, boats, buildings.

This silicon valley libertarian non sense needs to stop.

monksy 4 days ago | prev | next |

No shit.

tonetheman 4 days ago | prev | next |

[dead]

known 4 days ago | prev |

[dead]