
This argument has held progressively less weight since 1995, when Intel released the Pentium Pro and set the precedent of decoding x86 CISC instructions into the micro-ops which are actually executed. ARM is a respectable architecture and Apple has shipped some very competitive chips, but it’s not like Intel’s engineers have been in a coma for forty years.
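
As a rough sketch of what that decoding means in practice (the exact μop split varies by microarchitecture), a single read-modify-write x86 instruction gets cracked into separate load, ALU and store micro-ops:

    /* Hypothetical illustration: at -O2 the statement below typically
       compiles to one x86 instruction such as "add [rdi], rsi", which the
       front end then cracks into load + add + store micro-ops. */
    void rmw_add(long *p, long x) {
        *p += x;  /* one architectural instruction, roughly three micro-ops */
    }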


All the moving pieces in Intel x86-to-RISC decoding (instruction decoder, μop cache, Microcode Sequencer ROM...) use up a non-trivial amount of silicon and power.


Yes, it’s more than zero, but you have to look at it as part of the entire chip budget. For example, this USENIX paper estimates the overhead at 3-10% of power:

https://www.usenix.org/system/files/conference/cooldc16/cool...

That’s not nothing, but usually when people talk about this the rhetoric assumes it’s much greater and that e.g. ARM doesn’t have similar issues supporting its older instructions, albeit at a smaller scale. If you look at the results, and a couple of decades where everyone else was struggling to match x86 on either performance or non-embedded power efficiency, it clearly wasn’t holding them back that much. Even Intel’s huge moon-shot clean-slate architecture failed to outperform despite starting with considerable experience and no legacy baggage.


I'm not sure. If that were the case, why wouldn't Intel expose a native instruction set that better fits the uops, in addition to the legacy, difficult-to-decode one, and let apps use the newer instructions (or the subset of existing ones which do map well)?


My guess is (1) inertia, and (2) not wanting to commit to a specific instruction set because they tweak the internal μop set with every release.

And (3), this would add complexity: even though one is much simpler, you now need two completely separate decoding pipelines and a mechanism for switching between them.


Can't argue with 1. For 2, it's still an instruction set separate from uops, so you might not be as sensitive to changes; you still get a level of indirection. For 3, it depends. You might gain power if the newer set is used more, and you could make the older decoder simpler and accept maybe 90% of the speed. But maybe the instruction decoding that counts isn't that expensive compared to out-of-order machinery, branch predictors and 512-bit ALUs.


Micro-ops aren't RISC. Instructions like VFMADD132PS perform a dozen math operations, combined with a memory access, yet decode into a single micro-op.
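
As a hedged sketch of how much work that packs into one instruction (AVX/FMA intrinsics; whether it really stays a single fused micro-op depends on the particular core):

    #include <immintrin.h>

    /* One vfmadd...ps with a memory source operand: a load plus eight
       single-precision fused multiply-adds per 256-bit vector. Compilers
       can usually fold the load into the FMA's memory operand. */
    __m256 fma_from_memory(__m256 a, __m256 b, const float *p) {
        __m256 c = _mm256_loadu_ps(p);    /* may be folded into the FMA */
        return _mm256_fmadd_ps(a, b, c);  /* a*b + c across 8 lanes */
    }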


Note that I wasn’t arguing that Intel had switched from one textbook architecture to another: only that it isn’t really an accurate way to discuss modern chips after decades of large teams of smart people borrowing each other’s ideas.

As to your specific example, ARM has instructions which do complex operations as well. Does that mean it’s not a RISC CPU, or just that some engineers made a pragmatic decision to support heavily-used operations like AES or SHA?
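
For instance, here’s a small sketch with ARMv8 crypto intrinsics (assuming a core with the crypto extension, built with -march=armv8-a+crypto): a full AES round boils down to two instructions, which is hardly “reduced” in the textbook sense.

    #include <arm_neon.h>

    /* AESE performs AddRoundKey + SubBytes + ShiftRows, and AESMC performs
       MixColumns, so one AES round is essentially two instructions. */
    uint8x16_t aes_round(uint8x16_t state, uint8x16_t round_key) {
        return vaesmcq_u8(vaeseq_u8(state, round_key));
    }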


> it isn’t really an accurate way to discuss modern chips

I think it’s still mostly accurate: CISC/RISC is about the public API, i.e. the instruction set. What’s inside a core is an implementation detail, a very interesting one that can matter for performance, but an implementation detail nonetheless.

> ARM has instructions which do complex operations as well

True, but I don’t think they combine these complex math operations with a memory access in a single instruction?
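
A rough way to see the difference (actual code generation varies by compiler and flags): from the same C loop, an x86-64 compiler can emit a single FMA with a memory source operand, while AArch64, being a load/store architecture, has to issue a separate load before its register-only FMLA.

    /* Same source for both targets: x86-64 can fold the load of b[i] into
       vfmadd...ps, whereas AArch64 emits a separate ldr followed by an
       fmla that only takes register operands. */
    void fma_loop(float *acc, const float *a, const float *b, int n) {
        for (int i = 0; i < n; i++)
            acc[i] += a[i] * b[i];
    }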




