> I am genuinely interested in hearing the opinions of more security experts on this (turning off speculative execution mitigations). If this is your area of expertise, feel free to leave a comment
Are these generally safe if you have a machine that does not have multi-user access and is in a security boundary?
For regulatory compliance, that’s still not acceptable because it opens the door to cross user data access or privilege escalation.
If someone exploited a process running as a dedicated unprivileged user, had legit but limited access, or got into a container on a physical host, they might be able to leverage it for the forces of evil.
There’s really no such practical thing as single user Linux. If you’re running a network exposed app without dropping privileges, that’s a much bigger security risk than speculative execution.
Now, if you were skipping an OS and going full bare metal, then that could be different. But an audit for that would be a nightmare :).
There are definitely sneakernet Linux boxes. I’ve worked at a bunch of places with random Linux boxes running weird crap, totally off the actual network, because nobody was particularly sure how to get those programs updated. Technical debt is a PITA!
I suspect you are conflating security regulations for Unix users with regulations targeting users of the system.
Why would a regulatory framework care if a Linux box running one process was vulnerable to attacks that involve switching UIDs?
Conversely, why would that same regulatory framework not care if users of that network service were able to impersonate each other / access each other’s data?
Most of the controls are about auditability and data access.
But the control frameworks are silly sometimes. Then add in that they’re enforced by 3rd party auditor consultants looking for any reason to drag it out.
And yeah, I tried to get this past them for an old singleton system to avoid having to buy a bigger non-standard server.
The worst is the fake regulation that is PCI. But I tried in a SOX audit to avoid buying new gear.
I hate auditors. Making sense doesn’t matter. Their interpretation of the control does. I do have a playbook of “how I meet X but not the way you want me to”, but lost that one. Probably spent more $ arguing than the HW cost.
You kind of have to consider the whole system. Who has access, including hypothetical attackers that get in through network facing vulnerabilities, and what privilege escalations they could do with the mitigations on or off.
I've run production systems with mitigations off. All of the intended (login) users had authorized privilege escalation, so I wasn't worried about them. There was a single primary service daemon per machine, and if you broke into that, you wouldn't get anything really useful by breaking into root from there. And the systems where I made a particularly intentional decision were more or less syscall limited; enabling mitigations significantly reduced capacity, so mitigations were disabled. (This was in line with guidance from the dedicated security team where I was working.)
Browser tabs from different origins are the equivalent of multi-user access here, at least in cases where the speculative execution vulnerability can be exploited from JS. Same for other workloads where untrusted parties have a sufficient degree of control over what code executes.
Generally... it depends on what you mean by generally. In the casual sense of the word, speculative execution attacks are not very common, so it can be said that most people are mostly safe from them independent of mitigations. Someone might also use "generally safe" to mean proven security against a whole class of attacks, in which case the answer would be no.
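For what it's worth, you can ask the kernel what it thinks is mitigated on a given box before debating whether to turn anything off. A minimal sketch (function name is mine), assuming a Linux kernel new enough to expose the sysfs vulnerabilities directory; it returns an empty dict elsewhere:

```python
# Sketch: report the kernel's own view of speculative-execution
# mitigations by reading /sys/devices/system/cpu/vulnerabilities.
# Each file is named after a vulnerability (meltdown, spectre_v2, ...)
# and contains "Not affected", "Vulnerable", or "Mitigation: ...".
from pathlib import Path

def mitigation_status(base="/sys/devices/system/cpu/vulnerabilities"):
    status = {}
    root = Path(base)
    if not root.is_dir():
        return status  # non-Linux, or a kernel too old to report this
    for entry in sorted(root.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "(unreadable)"
    return status

if __name__ == "__main__":
    for name, state in mitigation_status().items():
        print(f"{name}: {state}")
```

Handy because "mitigations on" isn't one switch — some entries may say "Not affected" on your hardware anyway, which changes the cost/benefit math.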
Anytime the discussion around turning off mitigations comes up, it gets trumped by the "why don't we just leave them on and buy more computers" card.
An easier solution than turning mitigations off is to just upgrade your AWS instance size.
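And for anyone who does go the other way after weighing the boundaries above: the usual mechanism is a boot parameter, not a runtime toggle. A hypothetical Debian/Ubuntu-style fragment, using the kernel's generic `mitigations=off` switch (Linux 5.2+), which disables the optional CPU vulnerability mitigations in one go:

```shell
# /etc/default/grub -- illustrative only; regenerate the grub config
# (update-grub / grub2-mkconfig) and reboot for this to take effect.
GRUB_CMDLINE_LINux_DEFAULT="quiet mitigations=off"
```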
As far as I am aware, the capability required to exploit Meltdown is compute and timers. For example, an attacker running a binary directly on the host, or a VM/interpreter (js, wasm, python, etc) executing the attacker's code that exposes a timer in some way.
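The "timers" part is worth dwelling on: these attacks work by telling cache hits from cache misses, a difference on the order of tens of nanoseconds, so the quality of the clock the attacker can reach matters a lot (it's why browsers coarsened `performance.now()` after Spectre). A sketch of measuring a timer's effective resolution — the function name is mine, for illustration:

```python
# Sketch: measure the smallest time delta a clock can actually
# distinguish. An attacker needs resolution fine enough to separate
# cache hits from misses; coarse or jittered timers raise the bar.
import time

def effective_resolution_ns(samples=100_000):
    """Smallest nonzero delta observed between consecutive reads."""
    best = None
    last = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        delta = now - last
        if delta > 0 and (best is None or delta < best):
            best = delta
        last = now
    return best

if __name__ == "__main__":
    print(f"~{effective_resolution_ns()} ns between distinguishable reads")
```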
If you just have a dumb "request/response" service you may not have to worry.
A database is an interesting case. Scylla uses CQL, which is a very limited language compared to something like JavaScript or SQL — there's no way to loop, as far as I know, for example. I would probably recommend not exposing your database directly to an attacker anyway; that seems like a niche scenario.
If you're just providing, say, a gRPC API that takes some arguments, places them (safely) into a Scylla query, and gives you some result, I don't think any of those mitigations are necessary and you'll probably see a really nice win if you disable them.
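To be concrete about the "places them (safely) into a query" step: the point is that request arguments are bound as parameters, never spliced into query text, so the attacker controls data but not code. The CQL drivers express this as prepared statements with bound values; here's the same idea sketched with stdlib sqlite3 as a stand-in (table and function names are mine):

```python
# Sketch: bound parameters keep attacker input out of the query text.
# The "?" placeholder binds user_id as a value; it is never
# interpolated into the SQL string itself.
import sqlite3

def get_user(conn, user_id):
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    print(get_user(conn, 1))           # a real lookup
    print(get_user(conn, "1 OR 1=1"))  # treated as data, matches nothing
```

A service shaped like this gives the attacker no way to run a loop, take a timing measurement, or otherwise steer speculation — which is the whole argument for why the mitigations buy little there.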
This is my understanding as a security professional who is not an expert on those attacks, because I frankly don't have the time. I'll defer to someone who has done the work.
Separately, (quoting the article)
> Let's suppose, on the one hand, that you have a multi-user system that relies solely on Linux user permissions and namespaces to establish security boundaries. You should probably leave the mitigations enabled for that system.
Please don't ever rely on Linux user permissions/namespaces for a multi-tenant system. It is not even close to being sufficient, with or without those mitigations. It might be OK in situations where you also have strong auth (like ssh with fido2 mfa), but if your scenario is "run untrusted code" you can't trust the kernel to do isolation.
> On the other hand, suppose you are running an API server all by itself on a single purpose EC2 instance. Let's also assume that it doesn't run untrusted code, and that the instance uses Nitro Enclaves to protect extra sensitive information. If the instance is the security boundary and the Nitro Enclave provides defense in depth, then does that put mitigations=off back on the table?
Yeah that seems fine.
> Most people don't disable Spectre mitigations, so solutions that work with them enabled are important. I am not 100% sure that all of the mitigation overhead comes from syscalls,
This has been my experience and, based on how the mitigation works, I think that's going to be the case. The mitigations have been pretty brutal for syscalls - though I think the blame should fall on intel, not the mitigation that has had to be placed on top of their mistake.
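If you want to see this on your own hardware, a crude microbenchmark is enough: `getpid()` has been a real syscall on modern glibc (the userspace cache was removed in 2.25), so comparing it against a pure-Python no-op gives a rough per-syscall cost you can compare with mitigations on vs. off. A sketch, with names of my own choosing:

```python
# Sketch: rough per-syscall overhead, the thing KPTI and friends made
# more expensive. os.getpid() goes through a real syscall on modern
# glibc; noop() measures pure interpreter call overhead as a baseline.
import os
import timeit

def noop():
    pass

def syscall_overhead(n=200_000):
    sys_t = timeit.timeit(os.getpid, number=n) / n
    py_t = timeit.timeit(noop, number=n) / n
    return sys_t * 1e9, py_t * 1e9  # nanoseconds per call

if __name__ == "__main__":
    sys_ns, py_ns = syscall_overhead()
    print(f"getpid(): ~{sys_ns:.0f} ns/call vs noop(): ~{py_ns:.0f} ns/call")
```

Run it twice, once booted with `mitigations=off`, and the gap in the syscall column is the tax being discussed here.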
Presumably io_uring is the solution, although that has its own security issues... like an entirely new syscall interface with its own bugs, lack of auditing, no ability to seccomp io_uring calls, no meaningful LSM hooks, etc. It'll be a while before I'm comfortable exposing io_uring to untrusted code.
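(If I remember right, newer kernels at least added a blunt knob for exactly this worry — `kernel.io_uring_disabled`, around 6.6. Not a substitute for seccomp/LSM coverage, but it lets you switch the interface off where untrusted code runs. An illustrative sysctl fragment:)

```shell
# Hypothetical /etc/sysctl.d/ fragment (Linux 6.6+).
# 0 = io_uring enabled, 1 = restricted to privileged processes,
# 2 = disabled entirely.
kernel.io_uring_disabled = 2
```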