>but would RISC-V help in this case No. All the instructions get turned into uop...

snvzz · on April 20, 2023

>Supporting x86 is an advantage.

Compatibility-wise, there's no doubt. The software ecosystem is broadest on x86.

But x86 is really bad at reliability / high assurance due to its complexity and due to obtrusive (SMM mode), non-auditable firmware.

charcircuit · on April 20, 2023

>really bad at reliability

Yet, the most reliable services people use every day run on x86. It's not "really bad".

>due to its complexity

I don't see the complexity of the instruction set causing downtime. Compilers abstract 99% of it away.

>due to obtrusive (SMM mode), non-auditable firmware.

I've never seen SMM mode cause down time, nor have I had the non-auditability of the firmware cause me down time. These are low-level things that are part of how the CPU works. When you are working at a high level you can just ignore them for the most part. For reliability, once you are operating at scale you will always need to handle bad chips caused by manufacturing defects.

pietrushnic · on April 20, 2023

> I've never seen SMM mode cause down time, nor have I had the non-auditability of the firmware cause me down time.

Hyperscalers (OCP) pushing for coreboot and LinuxBoot probably have a different experience. AFAIK they hate SMM, especially with SMI handlers coming from unknown sources.

Not to mention that SMI latency is a huge problem in industrial applications like CNC.

snvzz · on April 20, 2023

>Yet, the most reliable services people use every day run on x86. It's not "really bad".

When reliability is paramount (e.g. pacemakers), x86 is naturally avoided.

>I don't see the complexity of the instruction set causing downtime. Compilers abstract 99% of it away.

Complexity of the ISA affects the whole system. It breeds bugs not just in hardware, but also in operating systems and toolchains. These can affect reliability.

>I've never seen SMM mode cause down time

I have. With SMM, the rug is pulled out from under the OS's feet. Your CPU can be taken away at any time, for any unexplained reason, without prior warning, and for an undetermined amount of time.

I have seen SMM cause spikes of latency breaking pro audio pipelines, and I have seen SMM grab a CPU and not return it.
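
For what it's worth, a rough way to look for such spikes from user space is to spin on a monotonic clock and log any unusually large gap between consecutive reads. Below is a minimal Python sketch; the function name `find_gaps`, the 100 µs default threshold, and the injectable `clock` parameter are all illustrative assumptions. A gap can just as well come from scheduler preemption or ordinary interrupts, so this only hints at SMIs; kernel-side tools such as the Linux hwlat tracer do the same kind of spinning with interrupts disabled to narrow it down.

```python
import time


def find_gaps(duration_s=1.0, threshold_us=100.0, clock=time.perf_counter_ns):
    """Spin on a monotonic clock and record gaps between consecutive reads.

    Returns a list of (offset_ns_from_start, gap_ns) tuples for every gap
    that is at least threshold_us microseconds long. `clock` must return
    nanoseconds; it is a parameter only so the loop can be tested with a
    fake clock (an assumption of this sketch, not a standard API).
    """
    threshold_ns = int(threshold_us * 1000)
    gaps = []
    start = prev = clock()
    deadline = start + int(duration_s * 1e9)
    while prev < deadline:
        now = clock()
        delta = now - prev
        if delta >= threshold_ns:
            # Something took the CPU away between two back-to-back reads.
            gaps.append((prev - start, delta))
        prev = now
    return gaps


if __name__ == "__main__":
    for offset_ns, gap_ns in find_gaps():
        print(f"t+{offset_ns / 1e6:.3f} ms: stalled for {gap_ns / 1e3:.1f} us")
```

Pinning the process to one core and running it with a real-time scheduling policy would cut down on scheduler noise, but user-space timing alone cannot prove that a given stall was an SMI.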

pietrushnic · on April 20, 2023

> due to obtrusive (SMM mode), non-auditable firmware

Let's be honest: every architecture has the same problem of a parallel "trusted" execution environment, and some have more than one.

- ARM TrustZone

- POWER SBE

- RISC-V SBI

snvzz · on April 21, 2023

I wouldn't say SBI there belongs with the others.

For starters, it is a fairly simple, open specification of an interface[0], with an open implementation, OpenSBI, that so far everybody uses.

Furthermore, it is not "hidden" in any way from the OS, which can take over its roles, partially or completely.

0. https://github.com/riscv-non-isa/riscv-sbi-doc/releases

pietrushnic · on April 21, 2023

I'm not a RISC-V expert; I rely primarily on open-source firmware community knowledge and the opinions of figures such as Ron Minnich. But AFAIK, most firmware for production RISC-V deployments is closed-source, so we can't say whether vendors use the OpenSBI implementation or some modified version for malicious purposes. If I need to be corrected, please point me to products that state how they transparently use SBI. There is evidence of transparent use of SMM in coreboot. SBI is also not the only RISC-V TEE. There are other concepts, like MultiZone [1] and PMP-based Keystone [2], advertised as trusted execution environments.

The openness of the specification only matters a little here: the UEFI specification for management mode is also open. But IBVs have screwed us many times [3] by implementing low-quality SMI handlers that could be used to elevate privileges, and a spec does not guarantee that implementers follow it. Lack of tooling makes compliance difficult, though that is slowly changing as various startups see it as an opportunity (e.g., Binarly, Eclypsium?).

My point is that SMM as a TEE is not a unique x86 feature; other significant architectures have similar mechanisms. The OS has a means of figuring out that an SMI occurred, so SMM is not entirely hidden, but it still has superpowers. In the CNC world, SMI latency is measured from the OS [4].
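
As one concrete example of the OS noticing SMIs: Intel CPUs since roughly the Nehalem generation expose a running SMI counter in MSR 0x34 (MSR_SMI_COUNT), and Linux makes MSRs readable through /dev/cpu/N/msr when the msr module is loaded. A minimal Python sketch, assuming an Intel CPU, root privileges, and the msr driver; the function names are made up for illustration, and other vendors do not necessarily offer this register.

```python
import os
import struct

MSR_SMI_COUNT = 0x34  # Intel-specific MSR; assumed present on the target CPU


def decode_smi_count(raw: bytes) -> int:
    """Decode the 8-byte little-endian MSR value; the count is the low 32 bits."""
    (value,) = struct.unpack("<Q", raw)
    return value & 0xFFFFFFFF


def read_smi_count(cpu: int = 0) -> int:
    """Read the SMI counter for one CPU via the Linux msr character device."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects which MSR is read.
        return decode_smi_count(os.pread(fd, 8, MSR_SMI_COUNT))
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        print("SMIs since reset on CPU 0:", read_smi_count(0))
    except OSError as exc:
        print("could not read MSR (need root and the msr module?):", exc)
```

Sampling the counter before and after a workload shows how many SMIs fired in between, though not how long each one held the CPU.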

Trusted execution environments are hammers: they can be used transparently for a good purpose, or for malicious purposes. Which one you get typically depends on the trustworthiness of the mechanisms implemented and used by vendors, but the fact remains that TEEs and peripheral MCUs are everywhere, which may extend the attack surface.

Why do we see those TEEs and peripheral MCUs everywhere? I like the explanation in this lecture [5]. No architecture can quickly fix that problem.

[1]: https://hex-five.com/multizone-security-tee-riscv/

[2]: http://docs.keystone-enclave.org/en/latest/Getting-Started/H...

[3]: https://research.nccgroup.com/2023/04/11/stepping-insyde-sys...

[4]: http://wiki.linuxcnc.org/cgi-bin/wiki.pl?FixingSMIIssues

[5]: https://youtu.be/36myc8wQhLo

snvzz · on April 21, 2023

There's little RISC-V could do to prevent bad SMIs.

It already did well enough by standardizing SBI and by providing a high-quality open implementation of it.

This minimizes (or even removes) incentives to provide a proprietary solution.

Some vendors will of course do whatever they want, instead of just using OpenSBI.

But they'll be opting out of being compliant with the platform specs, and of benefiting from the support for SBI present in operating systems and embedded toolchains. With such an implementation they would just make themselves and everybody else miserable.

In an ideal world, the market will avoid non-compliant SoCs. In practice, there will be some of these to point at as examples of how not to operate.

okanat · on April 20, 2023

You're mixing up implementations and the ISA. A RISC-V chip can be as daunting as any x86 chip, and I can assure you they will be.

Those modes are one of the ways to enforce export controls. Any competitive chip has something similar, say ARM TrustZone or whatever x86 manufacturers name their security mode.

If you don't include such features, the Asian manufacturers will instantly copy your IC.

transpute · on April 20, 2023

> non-auditable firmware

APU2 ships with open-source coreboot firmware and DRTM-capable silicon for measured OS launch with TPM.

charcircuit · on April 20, 2023

The firmware etched into the CPU is not open source, nor is the firmware loaded by coreboot or your operating system.

transpute · on April 20, 2023

What's a commercial RISC-V SoC that enables an OS to run with zero dependencies on binary blobs?

charcircuit · on April 20, 2023

I don't know, nor do I care about running an OS without binary blobs.