+1, however, from what I read, the vulnerability can only be exploited if the attacker has network access to the salt master's port, which should never occur. The people that got compromised had Salt exposed to the Internet, which is obviously ridiculous.
Not trying to downplay the critical nature of the vulnerability, but the ones that were compromised by this issue have deeper security issues to deal with.
> has network access to the salt master's port, which should never occur
You seem to subscribe to the "hard shell, soft gooey center" network security philosophy. Should people expose an Oracle server to the internet? Absolutely not. Does moving it behind a firewall change the fact that every mildly skilled exploit developer is sitting on an Oracle 0day? Absolutely not.
People have legitimate reasons for exposing Salt to the internet. I do. It's how I bootstrap random VMs and bare metal from the internet. But in my case the attack was mitigated by the fact that Salt cascades changes in a bunch of other systems and re-masters minions to a host only reachable over a tunnel. I blew away the internet master, restored from a backup, and patched.
> the ones that were compromised by this issue have deeper security issues to deal with
Or it was just another Monday. When you become sufficiently large you deal with incidents on a daily basis. Kudos to the people who publicly postmortem and talk about what went well and what didn't.
(For the record, I've already been working for a few months on a move to Ansible for non-security reasons)
> People have legitimate reasons for exposing Salt to the internet. I do. It's how I bootstrap random VMs and bare metal from the internet.
I question whether that is a legitimate reason to expose it to the internet.
Defense in depth is a thing, and putting the keys to the kingdom at layer 0 doesn't seem wise even if a VPN or bastion doesn't offer perfect protection.
Read the sentence after the ones you quoted. The internet connected salt master is used to provision accepted hosts in to the tunneled (VPN) network where the real master lives.
Twice I encountered breaking changes between versions that required manually upgrading minions. I also got the overall feeling Salt was built by developers, Ansible by sysadmins - and I fit into the latter bucket.
Ansible (originally known as "Fedora Unified Network Controller" or "func") was made to solve the problems of automating Fedora Infrastructure.
Puppet did not make Fedora Infrastructure administrators happy. So func was designed around solving their problems, and expanded its scope as people found it useful. Then it was renamed to Ansible, the developers left Red Hat to create AnsibleWorks, and the rest is history!
That is exactly what the internet connected Salt master does. It bootstraps enough control that I can get the tunnels and keys properly configured, and the other 95% takes place once it is switched to a protected Salt master.
+1, agreed, but exposing Salt to the internet is not the problem. A simple IP-whitelist ingress firewall rule on the salt master port would have helped; blocking access on this port entirely is also possible. With cloud services it has become trivial to group server resources so that resources in the same group can communicate with each other. I don't use Salt; however, I am not a proponent of network isolation as a form of security.
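For illustration, the rule could be as simple as this sketch (assuming AWS and boto3; the security group ID and CIDR are placeholders, and 4505/4506 are the salt master's publish and request ports):

    import boto3

    ec2 = boto3.client("ec2")

    # Allow the salt master ports only from networks we already know about;
    # everything else stays blocked by the security group's default deny.
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",   # hypothetical security group for the salt master
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 4505,             # publish port
            "ToPort": 4506,               # request/return port
            "IpRanges": [{"CidrIp": "203.0.113.0/24",
                          "Description": "known minion networks only"}],
        }],
    )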
I was in this situation; I went with “salt master exposed to the internet” because it’s the only service on that box - if I’d wrapped it in a VPN, then I’m replacing one exposed service with a different exposed service, and VPNs aren’t immune to exploits either (plus an extra layer of configuration means an extra layer of things that can go wrong)
If they wrote software which should never be visible to the internet, they should have made that clearer.
It's far too easy to make something internet-visible. They could have set up a simple check to see whether the service is internet-facing, and refused to work if it was.
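Something along these lines, as a rough sketch of the idea (the flag and function names are made up, not anything Salt actually ships):

    import ipaddress
    import os
    import sys

    def refuse_if_internet_facing(bind_addr: str) -> None:
        """Exit unless the operator explicitly opted in to a public bind address."""
        addr = ipaddress.ip_address(bind_addr)
        # 0.0.0.0/:: (bind everywhere) or any globally routable address counts
        # as internet-visible for this check.
        if addr.is_unspecified or addr.is_global:
            if os.environ.get("I_REALLY_WANT_PUBLIC_EXPOSURE") != "1":
                sys.exit(f"refusing to bind to {bind_addr}: it looks internet-facing")

    refuse_if_internet_facing("0.0.0.0")   # would refuse by default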
SSH bastions and VPN are two standard ways to allow external clients access into an internal network, meaning salt is never exposed publicly.
I read this as a guideline that the salt master must not be exposed to the internet, though it could be better worded for a developer audience that doesn't understand bastions or VPNs well.
> one week's notice between the initial announcement and the patch coming out. The patch being released is basically a disclosure of the vulnerability
While your other points may be valid, one week should be plenty of time between announcement and patch. Any longer and I would call the timetable problematic.
That sounds like an old corporation problem. The world should not pay the price of old corporations inflexibility.
If someone hacks your system you certainly won't have a week to respond. The longer a vendor sits on a vuln, the more likely it is to leak or to be rediscovered by a malicious party.
The intruders had root access to every server in a salt deployment for who knows how long, and yet everyone is claiming there's no evidence that any data or secrets (customers' or otherwise) were exfiltrated from the network. This is a very dangerous assumption. Nobody has any idea what was run on the servers, since it seems that once the initial attack script was deployed it downloaded and executed new scripts every 60s and then removed themselves. Pretty standard C&C ops. It may have started as a mining operation, but that doesn't mean it was the only thing it was doing.
> ... and yet everyone is claiming there's no evidence that any data or secrets (customers' or otherwise) were exfiltrated from the network.
A number of people have carefully reviewed the payload that was deployed to servers, especially during what we're calling v1-v4 of the attack. (v5 onwards got more complex, but that wasn't until Monday, with variability for timezone.)
> Nobody has any idea what was run on the servers ...
Well, that's not true - there are a number of victims with useful IDS tools, including auditd, plus the review of the binaries and shell scripts deployed, etc.
Some of us also have netflow collection at the edge, and can review connections initiated from within our networks.
> ... once the initial attack script was deployed it downloaded and executed new scripts every 60s and then removed themselves.
I don't think any of us have found scripts that removed themselves. While that may sound naive, there are a few researchers who have been analysing these tools, including via large honeypot networks, and this just hasn't (at least for the first 2-3 days) been the profile of the attack.
Thankfully - and I appreciate it's very weird to say this - the initial attacks were very much vanilla cryptocurrency mining opportunism. It could have been a lot worse, and Algolia's assessment matches a lot of other independent assessments on this front.
I hope for everyone's sake that it was just a naive crypto mining operation. But given the length of time this vulnerability was available, and the extent of access it allowed, I just find it very hard to say with any certainty that we know everything that it was doing. Exploits like this get passed around in nefarious circles pretty regularly. One of the scripts I saw went to great lengths to eliminate competing crypto miners from the systems so they could run their own. That tells me there were multiple people (or groups) exploiting this in competition with each other.
You said v5 of the attack got more sophisticated. How do we know there wasn't a "v0" that was even more sophisticated and innocuous? You can't trust the server logs. Firewall tables were flushed, SELinux was disabled. It's just really hard to say the full extent of the damage.
You're absolutely right that we can't be 100% confident, and best practice dictates a full rebuild from known sources, as usual after IOCs, especially ones of this magnitude.
However, the number of public, unpatched Salt servers might be considered a small enough pool for bad actors to have investigated; who can say why it took so long to see genuinely malign attacks.
> One of the scripts I saw went to great lengths to eliminate competing crypto miners from the systems so they could run their own. That tells me there were multiple people (or groups) exploiting this in competition with each other.
It wasn't very sophisticated - just a series of kill statements. This tells me that the author of that script picked up an existing script that's probably been around for years and adjusted it to their needs.
The script also tried to kill Confluence, amongst a handful of other large, relatively rare applications, which further suggests this was old-fashioned copy-pasting by unsophisticated script kiddies ... or someone just wanting to do a PSA and draw attention to this exploit while making a few BTC for their troubles. Who can say.
We don't know there wasn't a 'v0' - but we're fairly confident. Unless it was disabled as soon as 'v1' popped up, you'd expect honeypot systems to identify non-benign variants - and honeypot systems were identifying modest, reversible changes and nothing in the way of data exfiltration.
By Tuesday or Wednesday of this week I expect there were more (and worse) exploits than could be tracked, though, and some people are really going to suffer as a result.
I'll try to give you some insight as I'm a security engineer at Algolia.
Your concern is valid, and it's true, we cannot know for sure. That's the reason why, as explained in the blog post, we are reinstalling all impacted servers and rotating our secrets. If our assumption is false, this should contain the issue.
That being said, we have good reasons to make that assumption.
- Our analysis of the incident and of how the malware behaved on our systems didn't find any evidence of access to or transfer of data.
- There are other public analyses of the malware. Other companies that were hit reached the same conclusions as us, and you can have a look at https://saltexploit.com/ which maintains an interesting list of what is known about the attack, how it behaved, and how fast it's evolving to adapt.
I agree. I would like to see more details of how they determined it was only crypto mining. Finding only mining scripts in your logs doesn't mean they were not running other code once they had root.
It seems bizarre to me that a crypto miner got in. It wouldn't make much money on regular CPUs, and the high processor usage would immediately draw attention. So it looks like a low-effort botnet, which is embarrassing to get pwned by.
(The coin mining could be a cover like you mention, but it seems unlikely since it naturally draws attention.)
It's weird that these salt masters are reachable from the internet and people can sleep well with that.
Even with zero-trust networking or the BeyondCorp idea, I still find the extra layer of protection a VPC gives to be great. A few years ago there was an issue with the K8s API server, and updating K8s isn't a walk in the park. I felt relaxed back then because we had everything inside a VPC.
You can use SSH or a VPN to access services inside the VPC. But any tool that has permission to manage your infrastructure should never be exposed to the internet.
Same thing with Jenkins: if you are using Jenkins to manage Terraform or trigger Ansible/Salt/Chef runs, make sure Jenkins is not reachable from the internet. Use a different method to route webhooks into it.
I never understood the current trend of saying the VPN is a thing of the past. Redundancy in security layers is how you don't get affected by every CVE out there.
Imo this is THE lesson to learn from this story.
Secondary: Salt and Ansible are not very mature yet.
Salt is definitely immature (I've been using it for 5 years and the situation has actually gotten worse in that time), but Ansible is a weird thing to group it with.
Yeah, I completely agree and really don't see the point of having a configuration management server facing the Internet and basically having all your servers connect to it through the Internet! The BeyondCorp idea of eliminating the road-warrior concept is one thing; having your infra management exposed to CVEs in the wild is another!
For Jenkins it's a bit more complicated because of GitHub webhooks, although GitHub does publish its IPs in a programmatic form so you can whitelist them.
> I used to rely on the Github IP whitelist but one day I realized anyone can hit my Jenkins using Github.
That's a really good point, but I guess you are talking about Actions egress, right? Webhooks in theory have dedicated IP ranges [1] and I think they are not shared with Actions egress, although TBH I haven't tested it.
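For what it's worth, here's a quick sketch of pulling the published ranges, assuming the GitHub meta API is the programmatic form in question; it exposes a "hooks" list for webhook deliveries and, as far as I can tell, a separate "actions" list for Actions egress, so the two can be compared:

    import requests

    # GitHub publishes its IP ranges as JSON; no authentication needed.
    meta = requests.get("https://api.github.com/meta", timeout=10).json()

    webhook_cidrs = meta["hooks"]            # ranges used to deliver webhooks
    actions_cidrs = meta.get("actions", [])  # ranges used by Actions runners, if listed

    print("webhook ranges:", webhook_cidrs)
    print("overlap with Actions:", set(webhook_cidrs) & set(actions_cidrs))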
> Why would anyone have salt ports open/exposed to public/internet?
If you're bootstrapping random servers, this is a fine approach.
The whole Salt connection methodology is 'trust on first connect' (a bit like default SSH), with a manual stage for accepting an incoming request, and the connection stream is encrypted.
If you're using salt to bootstrap your VPN servers or network appliances then it's understandable that you'd have it exposed to a more public network, and the documentation was clear that this was fine.
Not everything is a virtual machine on a cloud provider.
Kind of a tough situation. I personally wouldn't be ready to accept this is the last such vulnerability that will be found.
In light of this attack, maybe going forward have a setup script that creates an SSH tunnel back to a machine that can talk to the salt-master for you. You could use a VPN instead, but if it's at all flaky, it could cost you the ability to update machines.
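A rough sketch of what such a setup script might do (hostnames are placeholders, and this assumes a bastion that can reach the real salt-master): forward the master's two ports through the bastion and point the minion at localhost.

    import subprocess

    BASTION = "bastion.example.com"          # hypothetical jump host reachable over SSH
    SALT_MASTER = "salt.internal.example"    # only reachable from the bastion

    # Forward the salt master's publish (4505) and request (4506) ports locally.
    subprocess.Popen([
        "ssh", "-N",
        "-L", f"4505:{SALT_MASTER}:4505",
        "-L", f"4506:{SALT_MASTER}:4506",
        BASTION,
    ])

    # /etc/salt/minion on this machine would then set:  master: 127.0.0.1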
Or perhaps (and I say this as a saltstack user) ansible really is the more secure model for those scenarios.
> If you're bootstrapping random servers, this is a fine approach.
Define "random". I think there is an alternative method not involving exposing you CM server on the Internet for almost any definition of random. In the Algolia case it's pretty sure because they now filter the access by IP (so they KNOW the IPs)
"Random" can mean "I don't know before I start my instance".
If you're multi-cloud (Vultr, DO, AWS and GCP) you almost certainly will not know your instance's IP before it's provisioned, and you can't make use of nice features like network tags or security labels.
If you're producing test environments then bootstrapping those is going to be significantly more painful than just opening up your salt-master and running an authenticated API request to allow those new machines.
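That authenticated request can be a single call to salt-api, assuming the rest_cherrypy netapi module with external auth is enabled; a rough sketch (host, credentials, and minion ID are placeholders):

    import requests

    # Accept a newly provisioned minion's key via salt-api's /run endpoint.
    resp = requests.post(
        "https://salt.example.com:8000/run",
        json=[{
            "client": "wheel",
            "fun": "key.accept",
            "match": "new-minion-id",
            "eauth": "pam",
            "username": "provisioner",
            "password": "...",              # placeholder credential
        }],
        verify="/etc/pki/salt-api-ca.crt",  # pin the API's CA instead of trusting broadly
    )
    resp.raise_for_status()
    print(resp.json())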
As other people have mentioned, this was always supposed to be /possible/; it's akin to SSH. Sure, you can avoid some log spam and potential issues by firewalling it off - but it's meant to be possible to run it publicly, it has always been marketed this way, so it's not "insane" that people did it.
> As other people have mentioned, this was always supposed to be /possible/; it's akin to SSH. Sure, you can avoid some log spam and potential issues by firewalling it off - but it's meant to be possible to run it publicly, it has always been marketed this way, so it's not "insane" that people did it.
I'm not blaming anyone, I'm just saying that if you put well-known software facing the Internet you are exposing yourself to more risk than if you don't. And for core infra software such as SaltStack I don't really see a good reason to justify it. I don't think putting SSH publicly accessible is justified either, unless you are a really, really small company or an individual.
In a multi-cloud setup, all the clouds are joined together with site-to-site VPNs. One doesn't just do a setup where they're public and connect to one another's databases over the public internet.
That's easier said than done. There are no simple cross-cloud-provider solutions for private networking other than ZeroTier, which has its own issues.
Trusting a central control server is the fundamental mistake here.
It creates a very high value target that is difficult to secure.
I prefer a model where the management commands are signed at a management workstation and those commands are pushed by the server and authenticated at the managed node against a security policy.
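A minimal sketch of that model (not any particular tool, just Ed25519 via the Python cryptography library): the workstation signs the command, the central server only relays it, and the managed node verifies the signature and checks its own policy before acting.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
        Ed25519PublicKey,
    )
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    # --- management workstation ---
    signing_key = Ed25519PrivateKey.generate()      # in practice, kept offline or in an HSM
    trusted_pubkey = signing_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

    command = b"service restart nginx"
    signature = signing_key.sign(command)           # (command, signature) is relayed by the server

    # --- managed node ---
    ALLOWED_PREFIXES = (b"service ", b"pkg ")       # the node's local security policy

    def authorize(cmd: bytes, sig: bytes, pubkey_bytes: bytes) -> bool:
        pubkey = Ed25519PublicKey.from_public_bytes(pubkey_bytes)
        try:
            pubkey.verify(sig, cmd)                 # rejects anything the workstation didn't sign
        except InvalidSignature:
            return False
        return cmd.startswith(ALLOWED_PREFIXES)     # even signed commands must pass local policy

    assert authorize(command, signature, trusted_pubkey)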
Both this and the Ghost CMS updates seem to hint that the only reason this was discovered was that loud crypto miners were exhausting resources. What are the chances a quieter attacker hadn't thoroughly ploughed through the entire infrastructure days ahead of them?
Also think about how many years this vuln has been present and exposed. Who's to know blackhats haven't sat on this 0day for years, quietly compromising private keys and other data? Spooky.
I've seen various "deployment" tools (or "configuration management" tools, if you will) called "insecure" or "immature" in the comments, or one claimed to be better than another; however, I think this is a good opportunity to talk about a deeper problem, namely the architectural choices each tool has made.
These choices all impact the reliability and security of the resulting system, especially the following:
* do they rely on SSH, or have they implemented their own authentication / authorization techniques? (personally I would be very reluctant to trust anything that just listens on a network port for deployment commands and isn't SSH;)
* do the agents run with full `root` privileges, or is there a builtin mechanism that allows the agent to act only in a limited capacity, within the confines of a set of whitelisted actions? (perhaps even requiring a secondary authentication mechanism for certain "sensitive" actions, for example something integrated with `sudo`, that provides a sort of 2-factor-authentication with a human in the loop;)
* do the operators have enough "visibility" into what is happening during the deployments? (more specifically, are the deployment scripts easily auditable or are they a spaghetti of dependencies? are the concrete actions to be taken clearly described, or are they hidden in the source code of the tool?)
* are there builtin mechanisms to "verify" the results of the deployments?
* and building upon the previous item, are there mechanisms to continuously "verify" if the deployment hasn't changed behind the scenes?
I understand that some of these features wouldn't have directly helped to prevent this particular case; however, they would have helped with alerting and diagnosis.
Can anyone describe the business benefits of an algolia implementation (vs Elasticsearch?) for a company that doesn't heavily rely on content searches? It seems expensive and something that I'd build on my own.
(Disclaimer: long-time operator and fledgling programmer)
IMHO the two main advantages in favor of Algolia are the sane defaults for relevancy and speed, and the fact that the service is hosted and can grow with your business without needing dedicated engineers to manage both the configuration and the infrastructure.
Also, on top of the Algolia services per se (search, analytics, recommendation, etc.), we provide a lot of backend and frontend libraries which one would otherwise need to reimplement when using an Elastic- or Solr-based implementation.
Search is hard to get right and the cost of Algolia is negligible vs. doing it yourself. As a programmer, every line of code you write is a line of code you own: the less code you own in production, the better off you are. Algolia has saved us hundreds of hours which translates to tens of thousands of dollars.
As a point of comparison, you can also expose Puppet masters to the public Internet but Puppet is using HTTP/HTTPS as a transport, so it is trivial to put a reverse proxy in front of it, requiring a valid certificate (managed and signed by Puppet) to contact the service. This way, no need to maintain a whitelist of legitimate clients.
- the notification was a week ago to a small mailing list, which is tucked away on their site
- no notification to those registered when you go to download Salt (at least I never received an email, but I still get plenty of marketing spam)
- no posts on social media as far as I can tell, I couldn't find a tweet, anything on reddit, or anything on hn.
- they only blogged about it on their official site yesterday, way after damage had been done
- one week's notice between the initial announcement and the patch coming out. The patch being released is basically a disclosure of the vulnerability
- the patch was released late Thursday early Friday depending on your timezone, giving attackers the weekend head start
- the official salt docker images were only patched yesterday
- You can't get a patch for older versions without filling out a form and supplying details
- Ubuntu and other repositories are still vulnerable