The Great CPU Stagnation (databasearchitects.blogspot.com)
246 points by greghn on May 18, 2023 | 219 comments


This "stagnation" is nothing like the stagnation during AMD's poorly performing Bulldozer era (the post-Athlon era), where they were consistently beaten by Intel's offerings and there was a general lack of innovation in the prosumer space.

During that era, Intel's prosumer i7 CPUs started at 4 cores with the Bloomfield Nehalem chips in 2008 (which at the time were awesome and a game changer) and were still at 4 cores with Kaby Lake-S in 2017. That only changed in 2017, when AMD Ryzen forced Intel to actually increase core counts.

2008 Nehalem benchmark: https://cpu.userbenchmark.com/SpeedTest/778/IntelR-CoreTM-i7...

2017 Kaby Lake-S benchmark: https://cpu.userbenchmark.com/Intel-Core-i7-7700/Rating/3887

When I compare the two, it shows an effective 20% speed increase, although microbenchmarks show a 50% increase. That is stagnation.

During that era it felt like a lost decade. I don't miss it.


Not saying you're wrong, because the Intel stagnation was very real, but please don't use UserBenchmark as a source of numbers for anything. They're a terrible source with biased benchmarks and reporting. Just look at the drivel they write for basically any AMD product on their site (https://cpu.userbenchmark.com/SpeedTest/1817839/AMD-Ryzen-7-..., or this https://cpu.userbenchmark.com/SpeedTest/2081998/AMD-Ryzen-7-... which just find-and-replaces half the 5800X3D comments).


Or when AMD started adding more cores to desktop chips and they changed the "overall score" calculation from 30% single, 60% quad, 10% multi to 40% single, 58% quad, 2% multi.

Not only was that the opposite of computing trends (and seemingly just there to spite AMD), the new system had results like an i3-9350KF coming out overall faster than an i9-9980XE.
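
To make the effect concrete, here's the weighted-score arithmetic with made-up per-category scores (the numbers below are purely illustrative, not actual UserBenchmark results):

    # Hypothetical per-category scores: (single, quad, multi)
    i3_9350kf = (105, 100, 30)    # strong per-core, weak multi-core
    i9_9980xe = (95, 95, 200)     # slightly weaker per-core, huge multi-core

    def overall(scores, weights):
        return sum(s * w for s, w in zip(scores, weights))

    old_weights = (0.30, 0.60, 0.10)
    new_weights = (0.40, 0.58, 0.02)

    for name, cpu in [("i3-9350KF", i3_9350kf), ("i9-9980XE", i9_9980xe)]:
        print(name, "old:", overall(cpu, old_weights), "new:", overall(cpu, new_weights))
    # old weighting: i9 wins (105.5 vs 94.5); new weighting: i3 "wins" (100.6 vs 97.1)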


Likewise, please don't use Cinebench. It's a poor general-purpose CPU benchmark that first favored the Zen architecture (many slow cores) and now favors Intel's Raptor Lake. It's hand-optimized for x86 instructions and is highly parallel, which is not how most applications run.

Use Geekbench instead of Cinebench or Userbenchmark.


>It's a poor general purpose CPU benchmark that first favored Zen architecture (many slow cores) and now favors Intel's Raptor Lake.

Would you advise the same if the colors were reversed?

The majority of people here are AMD fanbois, so it's sometimes hard to discern whether something is rooted in objectivity or subjectivity.


If you're an AMD fanboy, you'd only care about Cinebench despite the fact that its results do not correlate with gaming or other common applications such as browsing, Excel, video editing, etc.

There's a reason why x86 CPU users will only want to compare Apple Silicon using Cinebench. It's because Cinebench uses Intel Embree, which is hand-optimized for x86 instructions. It's like testing Ryzen or Core CPUs on software optimized for ARM instructions, and then drawing a conclusion about how fast they are for x86 applications.

Use Geekbench.


Ideally you would run a suite of various benchmarks and workloads and pass judgment on the overall results, rather than basing it on any one specific test. They're all going to be biased one way or another.

You also didn't answer my question: Would you likewise advise against Cinebench if the colors were reversed? As it is, you're advising against Cinebench just because it favors Intel.


I’m advising against Cinebench for any general-purpose CPU testing. Far too many people, YouTubers, and hardware websites draw conclusions about the speed of a CPU and its perf per watt using only Cinebench.


I can drink to that. No singular test is going to give a complete picture.


Geekbench generally lines up with SPEC, the industry standard. (at least for consumer stuff)


What the hell is that write-up? I never noticed that before, but it makes UserBenchmark look really bad.


Can confirm. I bought a pair of desktops for work in 2012 with Ivy Bridge i7s that could "turbo boost" to 3.9 GHz indefinitely without overclocking. I did not feel a real need to upgrade until the 32-core Threadripper machines came out in 2018.


I only just upgraded from my i5 3570K to a Ryzen 7 5800X3D.

Not because anything I was currently doing with my computer was becoming too slow, but because I wanted to do new things (VR). A shame, I wanted to run that thing into the ground.

I've got my old system sitting spare; I'm not sure what to do with it.


My i7-4790 served me nicely for 8 years. I only upgraded because most games had become CPU-limited, even on "only" a GTX 1070. But I went all in on "I don't want to upgrade for quite a long time" with a 7800X3D. Maybe a GPU upgrade in 2-3 years...


Did something similar a bit ago with the 5900X and a 6900XT. Wonderful setup, low wattage, quiet, performant.

Just lost it to a fire and not sure what to replace it with, alas. Too busy in the aftermath to justify building and picking parts independently, so that probably leaves me with something off the shelf or a workstation + GPU combo.


Yikes. What started the fire?


Still under investigation, alas. First guess was a popped lithium battery downstairs but the fire dept has ruled that out now.


I’m sorry you went/are going through that.

I know you said it’s been ruled out, but I sometimes wonder if I should worry more about aging lithium batteries in older devices… the oldest ones (e.g. 2000s handhelds) are now old enough to be retro-cool and therefore worth keeping for nostalgia, but they’re also a bit scary.


Have you seen the recent Gamers Nexus videos on CPU overvolting and catching on fire? You should update your BIOS to prevent potential further damage.


That issue currently appears to be relevant only to the AM5 series boards and CPUs. The 5900X is an AM4 part.


I was replying to a comment mentioning a 7800X3D.


Nice processor choice... vrchat in mind? Haha


Yeah, but mostly Neos which loves all that L3 cache.


This is true. The Intel 4-core era lasted a staggering 10 years due to no competition.

Today, it seems like CPU competition is alive again with Intel, AMD, Apple, Ampere, ARM, Graviton, RISC-V, Qualcomm, etc.


Userbenchmark is fairly useless for CPU performance, though. Their results are often completely nonsensical.


the one good thing to come out of that era was the "moar cores" meme



he's wearing a silly hat because it's predated by this https://www.reddit.com/r/gaming/comments/tc4aw/make_more_hat...


Checking in as the awesome game changer.


The perfect time to shed ourselves of the idea that "optimisation is a waste of dev-time". Mobile computing was the last godsend that made us actually rethink performance a little bit, but we still have a lot of relatively low-hanging fruit. I sometimes dream about a month-of-no-new-features, where everyone would just have a bit of time to clean up and improve on existing stuff.


A more extreme version of that is “permacomputing”—https://permacomputing.net/projects/

I still regularly use old systems for fun. I grew up with Macs, so that’s the point of reference. When I use these old systems, I pine for a few specific things I’m used to on newer systems—but it almost feels like nitpicking.

The things that I really want from a computer are pretty basic. Like good, consistent copy/paste and drag ’n’ drop, good autosave, good file browser, that sort of thing. It seems like new, half-baked stuff got dropped in our laps before the basics really got perfected.


I am always more than happy to see Permacomputing mentioned. I really think it is a wave of the future that will not be fully realized for another decade or so. Similar to solarpunk, it is still trying to find its feet, and a lot of the most optimistic propositions will look silly in retrospect, but the potential is there.

It also kind of drives me up the wall when you see what was being done on computers in the 80's and 90's on sub-100 MHz processors and realize just how much efficiency has been lost in the name of ease. Excel shouldn't need to use all 12 threads of my CPU and visibly take time to sort less than 1 MB of data, but here we are.

I have been working on an essay regarding Permacomputing for the last few weeks and it can be kind of difficult to summarize at times. The closest I can get is that it is part retrospective about what worked in the past, but with the direct goal of building long-term sustainable systems that do not require large external inputs.


2+ decade-old word processors were fantastic at their jobs, and did 98+% of whatever needs to be done today. It would be great if old codebases could be liberated, cleaned-up, and turned into all-but-perfect tools for the job. Older spreadsheets, databases, etc., could also be renewed this way — but I doubt the corporations holding the rights to them would be interested in making this sort of thing happen. Too little in it for them, especially as (ideally) spy/tracking hooks would not be included in the package.


Absolutely. This is why I feel like projects like LibreOffice are disappointing to both optimists and pessimists. They are free and open, but the performance and some capabilities are not so stellar.

Maybe there is the possibility of trimming down a package like that into something akin to what we had in the 90's but, funnily enough, a lot has been stacked onto that 90's legacy; it looks like there is just too much legacy in it to make that a viable path.


It's not as easy as it sounds. Making those apps work fast enough on hardware they were originally written on often involved highly convoluted optimized code, and sometimes even handwritten assembly. There is a valid point here, but I don't think reviving old codebases is a good starting point in many cases.


You're right about the difficulty of porting that code. It's a shame that so much has been made over the years, and made well, and it all seems to just disappear into the ether.


It's been a little frustrating watching iOS rip out literally all file management at launch and then slowly reintroduce it all feature by feature one at a time on iPad. Recently you can even mount network shares and use USB drives again.


Make Forth Great Again


It would be nice if every developer had to run their latest build on 10 year old hardware while testing it out.

Instead of their high-powered development machine with the latest CPU, tons of high speed memory, and the fastest SSD; they would get to experience what many of their customers have to endure on slower hardware with capacity constraints.

Nothing spurs optimization like seeing first hand how your code creeps along on slow hardware.


Don't say it too loud; a friend of mine worked at a company where everyone had the same computer.

i5 Gen 3, 500 GB HDD, and 8 GB of RAM.

Yes, Outlook runs fine for the secretary; Visual Studio, not so fine for debugging.


I wasn't suggesting that all the developers have to develop and compile their code on antiquated hardware. They should still have fast development machines.

What I was saying is that if the developers ONLY run their software on their high-powered computers and never try it on slow hardware, they generally resort to the 'it runs fine on my machine' response when customers start complaining about performance.


I worked at a place like that; I brought in RAM and an SSD from home.


*and* slow-ish internet.

We had a few surprises when some newbie dev noticed that the site doesn't work quite as well outside of a 1 Gbit connection with 1 ms ping to the app server...


>"It would be nice if every developer had to run their latest build on 10 year old hardware while testing it out."

That is exactly what I do with my desktop product. I test it on really crappy hardware first.


Supposedly the Office for Mac team had to run their builds on the original 60MHz Power Mac


There is that moment in video game development when the feature set is locked down and it basically turns into speed and stability optimization. Some people love it, others hate it.

I guess you would get a similar state in some embedded systems. The folks at NASA working on the Mars rovers have a set target and are usually targeting a mid-90's MIPS or PPC processor, so they have to be fixated on speed and performance.


Smart contracts get there too. There is a huge monetary incentive for optimization and correctness at launch.


At least no one is using those.


This is why I loved every console pre-PS4, from a technical standpoint. The amount of performance (think God of War 2 or Black) devs were able to squeeze out of the PS2 was nothing short of staggering.


Because they are dirt cheap I recently bought an Xbox 360 and a PS3, and I agree: it is astounding, even up to that point, just how much was squeezed out of those machines.

On the PS3 side, I had completely forgotten just how hamstrung it was in terms of memory; the OS and the load times on games are SLOOOOOW! And yet, they got Uncharted 3 and Crysis 3 out of that thing.

One of the most impressive feats I have seen would have to be Daytona USA 2 in the arcades. That thing is running on a single 166 MHz PowerPC 603 and a Real3D GPU that still needed all the polygon setup done on the CPU. How they got that out of that hardware is just beyond me. It is optimised to the max!

https://www.youtube.com/watch?v=ttAz1mYTuv0


More like a year - the performance rot runs deep.


Considering that the NeXTStep OS ran on a 1120 × 832, 12-bit display in 1990 with 12 MB of archaic memory and 1.5 MB of VRAM, I'd say more than a year, judging by modern app memory usage.

Even with the latest and greatest 4K, HDR, wide-colour display, no app should ideally use more than 256 MB of memory by those standards, unless it's doing something far more complex.


Recently, watching a Gamers Nexus video where they tested a newly built Voodoo 5 6000, I was impressed watching them install some ancient edition of Windows on which to do the testing, and how snappy and responsive the interface was.


My first computer was a '95 Packard Bell. I was 12 years old at the time, so I might be misremembering, but I swear the interface responsiveness felt immediate.


Dan Luu did various tests on keyboard and terminal latency. It's not pretty. However, a modern desktop has a compositor (Windows since Vista, Linux since the XFree86 -> X.org transition, and macOS since forever). This essentially means the screen gets rendered twice.


We might need to block all software features for the next decade so we can figure out what's going on.


The C memory model underlying all modern processors does not scale. Most of the time your CPU is sitting idle.


My favourite comparison is that the A16 chip can do 1 million times more operations per second than the top-of-the-line System/360 mainframe.

That is to say, a theoretically ideal OS running on the A16 could run every mainframe program in existence until the late 1970s, simultaneously.


Ask your boss for that! Float the idea at a team standup or something a couple of times. Advocate for having an engineering roadmap to address tech debt and increase performance. Many times you'll get it if you can articulate the need for it well enough.


Rather than a one-month focused effort, I think this should be an ongoing practice, much like security.


OS and device manufacturers don't necessarily want to optimize. Optimization would reduce the hardware replacement cycle, cutting into their profits.

Microsoft gets pressure from their hardware partners to keep up the hardware replacement cycle, and they themselves of course get a cut of that via OS licenses.

Phone manufacturers seemingly invented "Always On Displays" to also cause people to start to think they needed a new phone or battery just a year/18 months into owning their device. Christ what a waste of power for such little value delivered! My wife said her new S23+ would barely last a day - so I turned off the always on display option, and now it lasts 2.5/3 daysish of normal usage.

It's a slightly different story with laptops, where battery life is an important feature. You'd think battery life would be important for phones too.


Huh, the always on display on my S23 non-plus was only a 10% battery hit over a day or so. Still disabled it, of course.

Also nice that they've finally added an option on Samsung phones to only charge to 85%, but it would be nice if it had smarter options like on iOS. Eg, an option to charge up to 100% just before you expect to wake up, so you have the full charge but it doesn't sit at 100% all night degrading the battery.


iOS has an Optimized Battery Charging option that does something like that. You can't set a schedule, though - it's based on your usage patterns.


I could be wrong on the model. It's one of the big ones with a pen. There's so many Samsung models...


I suspect this is why companies like MS are locking down OS releases to specific hardware in the name of security. Apple moving to Apple Silicon also means they can potentially lock down the hardware as they see fit.

Upgrades will be forced via software obsolescence rather than hardware performance.

This is why I think the Linux/Free-Libre software folks would do well to focus a bit more on optimizing performance as there is a lot of room for improvement that will not need to be forced onto people.


That's still the case, though.

Like it or not, computation is cheap and developers are expensive. Code isn't worth optimizing until it either becomes a bottleneck, or you are running it on thousands of machines.


> Code isn't worth optimizing until it either becomes a bottleneck

This is a well-known fallacy. There's no guarantee your performance problems have a single bottleneck. In fact, more often than not, your entire program is poorly thought out and the only way to fix it is a full rewrite, with the associated risks.

> When I was teaching, I often used this metaphor: suppose you’re writing some system, you decide that you should avoid premature optimization, so you take the usual advice and build something simple that works. In this metaphor let’s pretend that your whole program is a sort. So you choose a simple sort that works. Bubble Sort. You try it out and it functions perfectly. Now remember Bubble Sort is a metaphor for your whole program. Now we all know that Bubble Sort is crap, so you have to eventually change to Quicksort. Hoare likes you more now. So how do you get there? Do you just, you know, “tune” the Bubble Sort? Of course not, you’re screwed, you have to throw it all out and do it over. OK, except the greater-than test, you can keep that. The rest is going in the trash.

> But you got valuable experience, right? No, you didn’t. Anything you learned about the Bubble Sort is worthless. Quicksort has entirely different considerations.

> The point here is that a small bit of analysis up front could have told you that you needed a O(n*lg(n)) sort and you would have been better served doing that up front. This does not mean you have to microtune the Quicksort up front. Maybe down the road you’ll discover that part of the sort (remember this is a metaphor) should be written in ASM because it’s just that important. Maybe you won’t. There will be time for that. But getting the right key choices up front was not premature. There is a suitable amount of analysis that is appropriate at each stage of your product.

https://ricomariani.medium.com/hotspots-premature-optimizati...
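
To put a rough number on the metaphor, here's a toy comparison of an O(n^2) bubble sort against Python's built-in O(n log n) sort (timings will vary wildly by machine; the point is the gap, not the exact figures):

    import random, time

    def bubble_sort(a):
        a = a[:]
        for i in range(len(a)):
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    data = [random.random() for _ in range(5_000)]
    t0 = time.perf_counter()
    bubble_sort(data)
    t1 = time.perf_counter()
    sorted(data)
    t2 = time.perf_counter()
    print(f"bubble sort: {t1 - t0:.2f}s, built-in sort: {t2 - t1:.4f}s")
    # no amount of "tuning" the inner loop closes an asymptotic gap like this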


You're not describing the real problem there. It's perfectly fine for a company to throw a hundred servers at something to save dev time.

The problem is programs that are running on thousands or millions of machines but don't get optimized, because the customer runs the program and pays the costs and it's hard for them to blame any program in particular.


Exactly this bothers me a lot. I do bare-metal performance optimizations, but I worry I'll never find a job outside of academia in that field, as I don't want to do high-frequency trading or develop games.


You could look lower in the stack, e.g. at compilers, drivers, and libraries for GPUs and other accelerators. That's not a huge field, but still a decent number of jobs to go around.


Depends on how you look at it. It’s the mindset. The way you worded it sounds like you default to writing poor code and then have to reach a certain bar before it requires optimization.

Can't you write closer-to-optimal code to begin with? With experience, can't you start with the learnings of the past? A lot of things are easy and convenient to do now, but people like you just parrot the line about not needing to optimize so as to not do it at all.

Maybe instead of thinking if it’s expensive or cheap think about actually wanting to do something decent? Or is everything fake and just a transaction?


We've known about this for a long time. Everyone expected it to happen. There are some key upcoming technologies that have the potential to cause a step in scaling (CFETs, backside power delivery) but it's still not going to be anywhere near Moore's law levels. I think this is part of why GPU power is skyrocketing and why Apple, Qualcomm, and the like are trying to shift towards services.


> I think this is part of why GPU power is skyrocketing and why Apple, Qualcomm, and the like are trying to shift towards services.

IMHO it's only a very small part of why GPU power consumption is going up. The main reason is the completely unnecessary chase for the performance crown.

From personal testing: my GPU manages to get 95% of its peak performance while being power limited to 80%. So in order to squeeze the last 5% of performance out of the device, 20% more power is pushed through it. It stays above 99% of peak performance while being power limited to ~87%.

But even just looking at the raw numbers paints a different picture. About 12 years ago, a high-end GPU (e.g. GTX 480) had a power draw of 250W at a theoretical peak FP32 performance of 1,345 GFLOPS. This year's RTX 4070 has a theoretical peak performance of 29.15 TFLOPS at 200W, so we went from 5.38 GFLOPS/W to 145.75 GFLOPS/W in 12 years - a 27x improvement in efficiency and a ~22x improvement in raw performance.

Now let's compare that to the numbers from a decade ago: a GTX 580 from 2010 had a power rating of 244W at 49.41 GTexel/s. A Geforce2 Ultra from 2000 used about 10W at 2.0 GTexel/s. So we went from 0.2 GTexel/s/W to - you guessed it - 0.2 GTexel/s/W, so same efficiency with a ~25x increase in performance over a decade, though the efficiency is only a guess, since neither GFLOPS nor official power draw figures are readily available for 2000-era hardware.

Fast forward a few years so we can get reliable power draw numbers and comparable performance in GFLOPS, we have the high end GeForce 8800 GTX at 155W for 345.6 GFLOPS in 2006. Ten years later, the comparable model would have been the GTX 1080 from 2016 with 180W at 8.873 TFLOPS. So 2.2 GFLOPS/W versus 49.3 GFLOPS/W or a 22x increase in efficiency and a ~26x increase in performance over the course of a decade.

So during the past 23 years, power efficiency steadily improved, while raw performance increases also showed no signs of slowing down in the GPU space. This is given the same generous time frames, to account for the occasional generational leap.
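
For what it's worth, redoing the GFLOPS-per-watt arithmetic from the figures quoted above (peak FP32 and TDP numbers as cited in this comment, not independently re-verified):

    cards = {
        "8800 GTX (2006)": (345.6, 155),    # (peak GFLOPS, watts)
        "GTX 480 (2010)":  (1345, 250),
        "GTX 1080 (2016)": (8873, 180),
        "RTX 4070 (2023)": (29150, 200),
    }
    for name, (gflops, watts) in cards.items():
        print(f"{name}: {gflops / watts:6.1f} GFLOPS/W")
    # 2.2 -> 5.4 -> 49.3 -> 145.8, i.e. efficiency kept improving by roughly 20-30x per decade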


I would also think that GPU workloads have something to do with it. Almost none are serial workloads; they are instead highly parallel work.

GPU workloads will eventually run into the same scaling limits. That is, we will be unable to speed up each execution unit any further, or the primary work we give the GPU will not be able to be split into more threads and still accomplish useful work.


Good point about the workload. On the other hand, typical GPU tasks seem to be scalable basically ad infinitum as graphics moved from fixed-function pipelines to per-pixel shaders to raytracing and path tracing.

So maybe GPUs still have some room until they run into the same problem as CPUs.


Multi-GPU nodes with fast coherent interconnects exist.


> We've known about this for a long time. Everyone expected it to happen.

Absolutely. The 2006 "A View from Berkeley" is still a great paper [1]. And we still have a long ways to go on this recommendation:

> To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism

We are still stuck in the winner-take-all mindset when it comes to software development.

[1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...


> GPU power is skyrocketing

Not quite. NVidia's entry level price/performance has not improved much since 2016. What's skyrocketing is the price of the top of the line models.


When the Voodoo 2 card launched, it was $299. Adjusted for inflation, that would be about $550. The latest RTX 4090 has an RRP of $1,599.

The high end is now becoming a case of throwing as much money at the problem as possible.


Well, the main problem is resistance, isn't it? Most of the power "used" goes into getting electrons to flow fast enough for the logic gates to settle at a specific clock frequency, and into resistive losses as heat.

The only real way forward that isn't a temporary workaround seems to be finding a new type of semiconductor that has lower overall resistance than silicon. Whoever figures out how to dope graphene and produce wafers without defects will probably make trillions.


tl;dr New materials can help, but "resistive losses" aren't really the driving factor.

The energy is a mix of leakage current and active current. Leakage current can be thought of as resistance - it's how much current flows through a transistor that's off. This can be improved with better materials, but gets harder with smaller transistors. (Thinking about quantum tunneling as a resistance is good for intuition, but not good enough to help solve the problem. A material with a lower bulk resistivity will not help here.)

Active current is based on capacitance. Each FET has a little capacitor that needs to be charged and discharged every time the logic is switched - that adds up. Lowering the capacitance of each FET would reduce the energy required to switch it, but generally comes with bad tradeoffs. High-k dielectrics increase the capacitance, all other things being equal. But all other things are not equal, and they are used to create better performing FETs with lower power leakage.


I thought leakage current would be the "DC" loss that is independent of frequency, like we had in old bipolar logic. Isn't it fair to characterize the cmos/fet switching losses as resistance to moving the charges around?

I understand leakage will go up if we increase voltages to support higher switching speeds, but aren't there still a lot of losses that happen with logic transitions and reduce when the states are stable, even if voltages are held constant?

I realize we can't move charges around for free, but in some fantasy superconducting-FET logic circuit, wouldn't the power consumption be reduced? I.e. much of the waste is resistive losses while charging and discharging those gates.


> Isn't it fair to characterize the cmos/fet switching losses as resistance to moving the charges around?

Not really. It makes more sense to think about it as filling and emptying capacitors. You are charging the gate capacitance up to the supply voltage, then dumping that charge to discharge the gate to 0 again. The energy of each capacitance that gets charged and dumped is CV^2/2, which happens for each logic transition.

> I realize it we can't move charges around for free, but in some fantasy superconducting-fet logic circuit, wouldn't the power consumption be reduced?

If there was no resistance when distributing charge, it would help a bit, but not enough to change the clock frequency by more than 20%, assuming that the fantasy superconducting-fet had normal leakage and gate capacitance.
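
Plugging toy numbers into the usual dynamic-power rule of thumb implied above (P = a * C * V^2 * f, with a an activity factor; every value here is invented for scale, not a measurement of any real chip):

    alpha = 0.2        # fraction of the switched capacitance toggling each cycle (guess)
    C = 1e-9           # ~1 nF of total gate capacitance on the chip (made up)
    V = 1.0            # supply voltage in volts
    f = 4e9            # 4 GHz clock

    P_dynamic = alpha * C * V**2 * f
    print(f"{P_dynamic:.1f} W")   # ~0.8 W for these toy numbers
    # note the V**2 term: halving the voltage cuts switching power to a quarter,
    # which is why voltage scaling mattered so much while Dennard scaling lasted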


So the charge is work and the discharge is waste?

I guess I am entertaining the idea of an idealized Maxwell-demon CMOS circuit, if we could bounce the charge between gates with very little work to just pump the charge back and forth.


That's a reasonable way to think about it - you take energy from the supply voltage to charge the gate capacitor when the logic line goes high, then dump it when the logic line goes low.

If you had a lossless bidirectional voltage converter circuit for each gate capacitance, then you could charge the capacitor from the supply and discharge it back into the supply, removing any switching losses.


They're both waste. Charging a capacitor to 1 volt from a 1 volt supply means the average voltage across the capacitor during charging is 0.5 volts, so half your energy goes to heat. Discharging to a ground line wastes the other half.

As the sibling comment says, you would need voltage converters running both ways to avoid this waste.


I think two major revolutions would be optical and reversible computing. The former would significantly shrink the heat generated, which is a huge bottleneck, but it is very hard and expensive to build general-purpose computing out of. The latter would basically give computing a new, lower theoretical bound on the energy required, but it is purely research with no known approaches for actually building the things.

Asynchronous clockless designs might also drastically cut the power budget but those have failed to find adoption for some reason.


Reversible computing needs a place to store waste entropy, ie a rather large memory that's initialized to known values and filled with junk as the computation runs. Venting entropy out of a system is not reversible.

Clockless designs mean that you compute readiness information on the fly instead of having it precomputed at design time. This additional run-time computation is not free, and tooling for clocked designs is good enough that the extra slack they need is often cheaper.


Clockless designs did find their use, just not for the entire chips. Certain parts of modern CPUs are asynchronous.


Yeah, I'm just a bit surprised it didn't go further. Do you know what the reasons were that they couldn't make the entire thing clockless?


It makes reasoning about behavior very, very difficult. There is no tooling support for it.


I don't think clockless makes anything easier


What parts are clockless?


From what I understand, while these two do contribute a lot to power usage, they don't really contribute that much to heating by themselves? Leakage should happen all the same in a processor that's completely idle, and those typically don't heat up much. For higher clock speeds specifically, I still don't see how lower resistance isn't key.


> Leakage should happen all the same

Modern processors are very careful about this, and actively turn off the supply voltage to large parts of the die to prevent extra leakage current. The funny-but-appropriate name for this is "dark silicon" https://en.wikipedia.org/wiki/Dark_silicon


> at 8 nm technology nodes, the amount of dark silicon may reach up to 50–80%

Damn TIL, I never would've expected that. But I guess it makes sense to use a few of the older, larger transistors that don't leak as much to power off a section of the smaller leaky ones while they're not performing any operations.


Hey, keep in mind that things like large caches, neural cores, SIMD units, and DSP circuitry all count as dark silicon. The proliferation of these kinds of components was predicted 15-20 years ago as a necessary response to the breakdown of Dennard scaling.


>while these two do contribute a lot to power usage, they don't really contribute that much to heating

Those are the same thing. Or at least close enough as makes no practical difference. Only an extremely tiny fraction of the power used by a CPU is becoming anything other than heat.


Then we actually agree? You don't get heating without resistance, ergo resistance is the main problem. MRIs don't have any problems sending a thousand amps through their coils.


You're right that the resistance is where the heat is dissipated, but lowering the resistance does not actually change the amount of heat. Transistor switching can be modeled as a step input to an RC circuit [1]. If you integrate the power through the resistor to infinity, you'll see that the value of the resistor drops out.

Intuitively, you might think of it like this: to charge a capacitor (or transistor) up to a certain voltage, you need a fixed number of electrons. That number of electrons will always pass through the resistor and generate heat based on their energy. Even if you change the resistor value, it's still the same number of electrons, and the same amount of energy.

What does change with resistance, though, is the time over which the power is dissipated. In practice, you have to make sure the resistors are small enough such that you can achieve your desired clock speed.

There are actual resistive losses too, but they're mainly related to power delivery.

[1] https://en.wikipedia.org/wiki/RC_circuit#Time-domain_conside...
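
A quick symbolic check of the "the resistor drops out" claim, for the step-input RC charging case described above (a small sketch using sympy):

    import sympy as sp

    t, R, C, V = sp.symbols("t R C V", positive=True)
    i = (V / R) * sp.exp(-t / (R * C))            # charging current through the resistor
    E_R = sp.integrate(i**2 * R, (t, 0, sp.oo))   # total energy dissipated in R
    print(sp.simplify(E_R))                       # -> C*V**2/2, with no R in the result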


Ok but again, why do we need the resistors at all? That's what limits the speed at which the capacitor can discharge, so with theoretically 0 resistance you'd get immediate discharge and could go to infinite frequencies, or more realistically as far as the speed of electrons allows for consistent gate switching.

To add a bit of troll physics here (but I'm told computers using this sort of principle actually exist), why not then channel those electrons to a boost converter that pipes them back into VCC, recycling most of the current? Theoretically a 99% power usage improvement, minus what the converter loses to heat, and that can be as low as 10%.


It's not like the resistors are a component that is explicitly added. Every conductor has some resistance - the tiny wires that connect transistors together, the transistor itself, etc. Ideally, yes, you want those resistances to be as low as possible. But in practice there are design trade offs that happen if you do so; it's a balancing act. As the other commenter mentioned, there are no superconducting semiconductors, so the transistor is out. There has been some research into super conductors for the wires, but for the time being there is nothing that's easily integrated into existing manufacturing processes.

Re: channeling electrons - what you've described doesn't quite make sense. Fundamentally, if you're taking an electron at ground or 0V potential and changing its potential to VCC, it requires energy that comes from somewhere. The battery (or power supply) is doing exactly that. As the electrons flow back to ground, the battery "recharges" them up to VCC potential.

What you can do, though, is put circuits in series between supply and ground. That way the electrons flow through the "top" circuit, do their thing, then flow through the "bottom" circuit. There's no free lunch though, as the voltage across each circuit will be reduced. Nonetheless, this is a common technique for low power analog circuits, and one I've used in the past. It's just not practical or worth it in digital circuits like a CPU.


There's no such thing as superconducting semiconductors.


When talking about resistance and materials, it's also important to note that silicon has a relatively low optimal operating temperature compared to some of the other semiconductors available. This limits the amount of voltage you can pump into it (because the resistance means higher V leads to more heat), and voltage correlates with clock frequency. GaN has already seen success in chargers, and silicon carbide is another promising material. We can't achieve the low level of defects needed for small process nodes yet, though.

Disclaimer: I'm not a material scientist, so this is probably only partly correct.


It helps that those services are a great recurring revenue stream, too.


Well, and photonics, quantum computing, etc. But they're not there yet.


There were multiple mentions in this thread contrasting GPUs with CPUs... I think there's some conceptual gap here. GPUs are made of the same stuff as CPUs; they have all the same problems CPUs have. They emphasize vectorized operations, and some less important (for this thread) stuff, like video encoding / decoding, but, by and large, the "PU" part is not accidentally the same in both acronyms.

The difference comes from usage. CPUs are shared by processes and threads that are designed to be unaware of each other, or to be even hostile. At the same time, a lot of programs are built in such a way that they don't exploit the parallelism available to them through CPU, or, even if they do, they do it in a very clumsy way (through a bunch of wrappers with their own limitations).

To contrast this, GPU programs typically use the whole GPU at once, and are written with parallelism in mind, with little to no wrappers.

Similarly, because the basic unit of CPU usage is a process, and the model of using CPUs is that processes aren't allowed to know about each other by default, the memory use becomes more involved, inter-process communication becomes more involved, permissions, access to network etc. -- all this complicates and slows down programs which want to use CPUs.

But, if, somehow, there was an OS that could use GPU to run processes on it, use VRAM for code / data of those processes etc. -- we'd have the same problems.


> GPU programs typically use the whole GPU at once

I don't think this is really true anymore? I mean, on a composited desktop, pretty much any UI app is a "GPU program". If you run a video game and it's not full screen, it's sharing the GPU (and it's increasingly common for "full screen" to actually mean a borderless window, too). Video players offload decoding. And then there's all the stuff that's using GPU to accelerate generic compute.

And once you have such sharing, it's just as adversarial as processes sharing CPU, when it comes to resource allocation, security etc.


Indeed, GPUs are not much better. For example, the NVIDIA H100 SXM has roughly the same price per transistor as the A100 SXM. The gains between generations were roughly equally split between (1) better chip design, (2) higher clock speed / power consumption, and (3) more transistors. Cost per transistor did not meaningfully improve.


GPUs have a much higher typical utilization than CPUs, I think that's what the previous poster wanted to say.


In fact the CPU stagnation is worse. In contrast to the shrinking chip sizes and higher frequencies of past scaling laws, CPU core count does not easily translate to performance gains. It needs specialized and carefully designed software to tap the capability.

In fact, if CPU core count did translate more easily into performance gains, I think we'd already have a fairly significant one-time boost with the existing CPUs.

Maybe somebody has statistical survey of how much of the existing deployed CPU core count is typically used?


> It needs specialized and careful designed software to tap the capability.

Ever since I saw my first Xeon Phi, I've been suggesting engineers get more, lower-speed cores to gain insight into what will be performant a few years down the road.

It's been a while since clock speeds got higher (IBM has been pushing 5 GHz in their highest end for the past couple of years now and it doesn't seem likely they'll cross 6 anytime soon), but we get more cores every year. We now have 4-core entry-level machines and 2-core/4-thread ones are the bottom of the barrel, with a decent one being 8-core. Ampere just announced a 192-core server beast.

And then we have another thing: performance for most users has been "good enough" for the past couple decades. I haven't gotten a new computer just because it had a faster CPU since the early 2000's - they usually turn to dust well before they become too slow to use. My wife will need to upgrade her Macbook soon-ish for regulatory reasons (when Apple EOLs and stops patching macOS 12) and her laptop is still going strong. Considering that, there is little advantage in making all but the most demanding software more parallel.

This leaves the high-end, the stuff that needs a POWER10 or a Telum to run at acceptable speeds, and the cloud vendors, who'd kill to be able to serve 1% more VMs per kilowatt because 1% of their revenue is the GDP of a small country.


> and the cloud vendors, who'd kill to be able to serve 1% more VMs per kilowatt

If Google's only expense were electricity, and profit margin was 50%, then saving 1% on power bill would increase profits by 1%.

I skimmed the Alphabet annual report: saw roughly $260B revenue, $80B profit, $110B operating expenses (electricity and staff). Say $10B of that is power; then 1% of the power bill is $0.1B, improving profits by a bit over 0.1%.
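
Spelling out that back-of-the-envelope (all figures in $bn, as guessed above rather than taken from the actual filing):

    revenue, profit, power_bill = 260, 80, 10
    saving = 0.01 * power_bill                    # a 1% cut in the power bill
    print(f"${saving:.1f}bn saved, {100 * saving / profit:.3f}% more profit")
    # -> $0.1bn, roughly a 0.125% bump in profit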

Anyone know how many $/year Google spends on power?


>> Maybe somebody has statistical survey of how much of the existing deployed CPU core count is typically used?

My guess is very few cores are used on average. I did some testing with Solvespace to see which build options contributed most to performance:

https://github.com/solvespace/solvespace/issues/972

Obviously using OpenMP for multi-core was the big win. But what's not shown is that in typical usage (not the test I ran) if you're dragging some geometry around it will use all cores (in my case 4 cores / 8 threads) at about 50 percent utilization. That percentage probably drops as more cores are thrown at it due to Amdahl's Law. In other words, throwing double the cores at it will give a good boost to a lot of code that is already taking less than half the time (wall clock time, not CPU time).

We added OpenMP to a number of functions for significant performance gains. And in fact, any remaining single-thread operation that gets the parallel treatment is likely to have a significant impact on overall performance, since that is where most of the time is spent now. At this point we're more focused on features and bugs.

Algorithmic improvements are possible and I'd like to do those in the future, but they are much harder to do than sprinkling some #pragmas around critical loops. That will improve the scalability though, where multithreading really did not.
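
The Amdahl's law ceiling mentioned above is easy to eyeball numerically; the 0.9 parallel fraction below is just an illustrative guess, not a measurement of Solvespace:

    def amdahl_speedup(p, n):
        """Speedup on n cores when a fraction p of the work parallelizes perfectly."""
        return 1 / ((1 - p) + p / n)

    for n in (2, 4, 8, 16, 64):
        print(f"{n:2d} cores: {amdahl_speedup(0.9, n):.2f}x")
    # with p = 0.9 the speedup creeps toward 10x and no core count gets past it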


I'm not convinced that this is the case.

A lot of computation-heavy workloads do in fact scale rather well with an increased core count. That's why GPUs are essentially using thousands of cores, after all: nontrivial computation virtually always implies a sizeable amount of data, and a lot of data usually implies that you can split it up into multiple sections to compute in parallel.

A lot of desktop code is still single-core, that is true. However desktop CPUs are idle >90% of the time, only waking up to do a small burst of computation every once in a while. You're not going to notice some event loop finishing in 0.8ms instead of 1ms. Even then it'll probably be running multiple tasks from multiple processes in parallel, making good use of the available cores.

CPUs have simply gotten fast enough that they aren't a bottleneck for desktop use anymore!


I was looking for this comment and I think you're exactly right.

Over the years, Moore's Law became a household term for computer performance doubling every couple of years. Under that definition, Moore's Law died in 2005 with Dennard Scaling so for most intents and purposes, Moore's Law has been dead for a long time.

It only held under the more restrictive definition of performance for tasks that were able to be parallelized perfectly, but even that has now been broken.

You could also argue that Moore's Law died in 2005 because the term CPU used to refer to what we now know as a CPU 'core' and the term was redefined.

Ultimately, what matters is that the performance the end user experiences hasn't been doubling every 2 years since 2005.


> CPU core count does not easily translate to performance gains.

In theory, the speedup from adding parallel processors saturates rather than growing without bound (Amdahl's law).

In practice, the limit is (and has always been) memory and I/O. That's why Apple silicon kicks everyone's ass.

If we want faster computers, the biggest gains are not to be found in making processors do more work. It's in designing systems (not just CPUs) that don't let the CPU wait around to do work.


> That's why Apple silicon kicks everyone's ass.

??? But it doesn't: https://browser.geekbench.com/processor-benchmarks https://browser.geekbench.com/mac-benchmarks


Doesn’t Geekbench overwhelmingly measure raw CPU performance, whereas GP was talking about overall system performance? Isn’t this chalk and cheese?


Well, what other benchmark results say otherwise?


*kicks ass per-watt


The parent was talking about unified memory.


> In practice, the limit is (and has always been) memory and i/o

Absolutely not. In practice the limit is (1) how many cores are actually _used_ by programs and (2) how much work is put into making anything fast at all, ever. We're using web frontends powered by Python backends over a network. The vast majority of programs use nowhere near the resources available to them.


Those backends are spending most of their time waiting to do work, which is my point.


But even the amount of time they're doing "work", they're mostly managing refcounts and walking MRO chains and formatting strings and doing hashtable lookups to resolve names at runtime. Python code is 20-40x slower than C code. That other 19/20ths of the time is the bottleneck, not the memory bandwidth of the actual work. Same with your Electron frontend.


CPU core count does translate to performance gains but the popular software architecture idioms most people use are incapable of taking advantage of large numbers of cores. The gap between the performance you typically see and what is possible with proper software architecture and performance engineering is orders of magnitude in scale. A lot of performance and scalability is left on the table.

We've known how to scale software on large silicon for a long time, but as an industry we mostly can't be bothered (or lack the skills) to do it.


It will hardly change as long as scripting languages keep being used for full blown applications.


The simple reason is devs are more expensive than servers for most applications.


multi-core architectures have been optimized for web servers i.e. tons of clients hitting relatively simple backend processes. also, my day-to-day gui workstation (ancient xeon mac) gets far more core usage than my dev server (threadripper) which is honestly kind of a waste of money, but it's shiny so i bought it anyway. it should last me a decade+ just like my gui workstation has.

now that single-core GPU/CPU/TPU/whatever performance is back on the front burner i think we'll see some horsepower and compiler improvements over the next few years. luckily the i/o problem has made great strides in the meantime so network/memory/storage will be there to support it, unlike in the past. ecc ram is also plummeting in cost, so that's good.


> It needs specialized and careful designed software to tap the capability.

I wouldn't exactly call Golang[1] "specialized", but it does make multiprocessing easier than most languages.

1. Or Erlang or Elixir


I only use Go through Hugo, but it's fast enough that I never checked how many cores it uses :-)


It does require software to take advantage of it, but this was a chicken-and-egg problem caused by Intel's stagnation. When most CPUs had only 4 cores, developers had an easy excuse: two of the cores were "fake" HT cores so they didn't count, they'd use one core, and the other core was for the OS or other apps, so there was basically no point complicating software for so little gain.

Only recently has AMD opened the floodgates, and we got consumer systems with 16 cores, where it's finally really hard to find an excuse to leave 15 cores idle, and there's so much raw power that even suboptimal scaling can give a big performance boost.

Also, Rust became a thing, and it makes it much easier to write reliable multi-core software, so I'm optimistic about software catching up.


I mean Microsoft did just brag about cutting the Teams launch time to 9 seconds. Progress, baby!


Imagine if something like a digital camera booted up in 9 seconds.


Or it needs multiple processes running at the same time. Notably running lots of C++ compilers simultaneously works great and really likes the high core count machines.


I'm not convinced this is the right criteria to compare these processors (core count and purchase cost).

When looking at these high-core-count processors, the typical use case is a server in a data centre, and these sorts of applications run 24/7, where the cost of power is a massive part of the TCO. I think you have to address power per GFLOP when evaluating performance for these parts, as this is the criterion they were designed against.

I think the processors are costed in consideration of the TCO of a 2U dual socket machine with a 2-3 year expected lifespan. They will be designed and costed to show year on year improvements.

Oh, and I'm not sure inflation was included, as it will be relevant over the timescales involved.


The cores/money comparison doesn't look like it accounts for inflation.


*The cores/money comparison doesn't look like it accounts for inflation.*

More to the point, the comparison also represents the period of time in which AMD, the one-time "lesser" player nipping at monopolist Intel's heels by competing on price, became the technology leader and started commanding a premium for their products. Meanwhile, Intel has not been forced to cede its position in the market; their existing contracts and business model are based on premium products, not cost leadership. It will take a while for this to sort out.


Indeed. I was also wondering why people would buy newer chips if they don’t show performance increases per dollar.


There are overheads in server workloads that scale with the number of machines (network traffic, serializing/deserializing requests). There are also fixed costs per server that don't scale with core count, or at least scale sublinearly (storage, physical data center space, motherboard, ease of maintenance). So running 10 machines with 100 cores can be cheaper and more performant than running 1,000 machines with 1 core even if $/core is higher. And of course individual cores can be beefier: wider SIMD units, application-specific extensions like bfloat support for ML workloads, etc.

Of course Moore's law is slowing down, but cores/$ is an extremely silly metric to use


I think the main reason people get a new CPU, regardless of the performance, is because the device that it came in has gotten too old (wear and tear: screen, batteries, etc).


I'm pretty sure that when you adjust for inflation, you get more performance per dollar, but I'd need to do the math and I don't have the numbers in my head.

Does Google Sheets provide an "inflation-adjusted dollar" function?
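
Not that I know of a built-in, but a CPI ratio gets you most of the way there; the index values below are rough placeholders you'd swap for real BLS figures:

    cpi_2017, cpi_2023 = 245.1, 304.7       # approximate annual CPI-U averages (placeholders)
    price_2017 = 999                        # hypothetical launch price in 2017 dollars
    price_in_2023_dollars = price_2017 * cpi_2023 / cpi_2017
    print(f"${price_in_2023_dollars:.0f}")  # roughly $1242 in 2023 dollars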


I’m not sure really.

But anyway, I'm not sure it makes sense to expect performance/$ to always increase. I mean, I know this started out by talking about multicore, but think about single-threaded performance. They've already grabbed all the low-hanging fruit; the challenge now is finding increasingly hard-to-hunt-down tweaks… a small improvement might require massive engineering effort.


Intel may be able to squeeze a little more juice out of x86 by dropping backwards compatibility. This is something I suggested they could have done for Apple because Macs were never supposed to boot MS-DOS anyway.

A simple and regular ISA is what makes ARM easier to implement, which also means lower power because of fewer transistors doing thankless work like decoding instructions and reordering them.


Performance/$ is still going down when adjusting for inflation. Performance/W goes down even more.


Because the vendor will no longer give you the old chips? That, and a lack of support around security.


This analysis is wrong because it is focusing on a limited number of high-end chips.

Today you can buy CPUs that cost a fraction of any of those listed and that will absolutely demolish even the best chips from 6 years ago, all while consuming drastically less power for that performance.

Even mobile chips over 2 years old are within margin-of-error distance of the entry-level Naples chips listed, at a fraction of the cost and power consumption.

I would not call that stagnation.


Thanks, I was hoping someone would comment as to why the article might be wrong. I also wonder what the trends look like in terms of performance per watt?


It's hard to say without actually testing the hardware with a power meter, since throttling plays such a big role and power budgets advertised are often just a single figure approximation of a much more complicated formula.

I run a home server powered by a Ryzen 5800H, nominally a 45W part, but I've seen it maintain a power draw far higher than that for hours under sustained load.


Ugh, you know, it really wouldn't be so bad if they would just go back to selling CPUs like they're meant to be kept around for more than a year. The thing that has me really ticked off at Intel right now is the i915 SR-IOV capability introduced in 11th gen, which was meant to supersede GVT-g. I didn't even know GVT-g was a thing until it was already gone. They didn't even make a driver that supports 11th gen, and 12th is apparently supported in some capacity, but I'm not understanding it very well myself. I damn sure wouldn't buy a 12th gen though if it's a thing you think you might actually care about. If 11th gen is any example of how 12th will turn out, any development will be left up to one person who miraculously understands i915 well enough to develop for it, doing it for free and doing a good job, but not well enough to keep up with linux HEAD. Needless to say, if 11th gen support for i915 SR-IOV does ever get merged, the cheap, low-quality construction of the hardware will probably be starting to break down; it seems like most of it is by the time you unbox it anyway these days :(


There are actually two: there's the Intel LTS one, which I guess they did do something with, it just never amounted to anything, explained here:

https://github.com/intel/linux-intel-lts/issues/33

The ongoing development:

https://github.com/strongtz/i915-sriov-dkms


I think you should factor in inflation. Inflation has been significant recently.

Also you are not comparing to Intel's cost per core which would show that this pricing issue is not new. I think you just didn't notice it before.


Inflation has only been significant in the last 2 years max.


The answer to this might be more application specific accelerators and 3D stacking. You can't afford to have all of a chip switched on at once because of the power consumption and dissipation, so you build more optimised accelerators and keep flipping between them as you encounter different pieces of code that may benefit from each accelerator. Only a fraction of the device is ever in use at once. You 3D stack the chips to get more transistors in the same space. NAND flash chips are already 3D stacked, often with 100 layers or more.


It would be cool if chiplets got to the point where, at least, an OEM… someone Dell sized… could actually differentiate themselves by mix-and-matching a group of accelerators. Bringing back consumer-visible differentiation in CPUs (other than Apple against the world) would be nice for the market I think.


The industry term for the idea that most of your chip is switched off at any given time is “dark silicon”.


What about silicon that's perpetually switched off because of defects and binning (like Apple's 7-core graphics offerings and so on)? That's even darker silicon.


> Regrettably, when considering cost per core, this impressive trend appears to have stalled, ushering in an era of CPU stagnation.

In this case, don't let the author see $/GFLOPS for the last 10 years of GPUs!



I found the single-core SPECint figures for the last couple of decades very interesting. The curve flattens. Apologies for the link to the bird site, but I'm on mobile and it's the fastest way I can find it. I think some of you will enjoy the diagram.

https://twitter.com/nickdothutton/status/1194978743250538496...


What seemingly has stagnated even more is our ability to write fast and responsive software. We have oceans of compute power, blazingly fast I/O, and yet, it's expected and accepted for mundane tasks to take hundreds of milliseconds to complete.


I'd say the desktop CPU situation is doing pretty alright. My 10400F is feeling very modest compared to pretty much everything in the 13xxx and Ryzen 7000 lineups. Frustrating given how dead-end the chipset I got was.



Zen 4C is somewhat delayed but should be an interesting shift on top of these great numbers. It's a significantly smaller core, first for servers, then later for big/little consumer parts.

Maybe the IPC or GHz really will be significantly lower, but I tend to think things like cache size will take the biggest hit, and a cache-size change wouldn't show up in these graphs. Essentially the same number of transistors, but more compute and less cache, is my guess. But perhaps the cores really are smaller and narrower and the IPC × GHz rating doesn't budge much!


It isn't just CPUs either. For GPUs, a 2018 analysis [1] estimated that FLOPS per dollar only doubled every 3.9 years. With recent disappointing improvements in new graphics card price/performance, I expect this number has gotten even worse by now.

[1] http://mediangroup.org/gpu.html
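
For perspective, converting that doubling time into an annual rate is trivial (the 3.9-year figure comes from the linked analysis; the rest is plain arithmetic):

    # Convert a doubling time into an equivalent annual improvement rate.
    doubling_time_years = 3.9

    annual_growth = 2 ** (1 / doubling_time_years) - 1
    print(f"~{annual_growth:.1%} FLOPS-per-dollar improvement per year")  # ~19.4%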


At least now people will start to focus on how bloated and slow software has become in the meantime. And stop referring to it as “tech”.


I (unfortunately) doubt it. I think the reality of our current situation is that only hobbyists and open source people care enough about that to do anything about it. Everyone else is doing Scrum so hard that they can't think of much outside the current sprint, and any "premature optimization" is evil and must be avoided. The result will be more of the same bloat. Competition won't help because everybody is using the same bloated foundations and nobody will invest more than a few days in the foundation because it's not "product work."

The best we can hope for, I think, is that open source will create frameworks/foundations on top of which people can then try to build. Elixir Phoenix has been that to some extent, basically taking the Rails philosophy but making it super light and fast (my Phoenix APIs run with 40MiB of memory and response times of ~1ms). Maybe those sorts of advancements can save us, but I can't think of a way to address the browser that way, and realistically the browser is a huge source of the bloat right now. A ton of code that runs in the browser is terribly optimized, but even the base is quite big.


That's exponential era practices. But we're hitting the top of the S curve. Believing that exponential era practices will continue indefinitely into the coming linear/level era is as naive as believing exponential growth continues forever.

The real question is what happens first: change in software to adapt to slow growing compute or change in architecture to revitalize Moore's law.


Oh, I don't think it's quite that grim. When development teams can no longer easily spend their way out of their inefficient code, being able to write high-performance code becomes a competitive advantage. I'd even go so far as to say that individual developers looking to get ahead in their careers should pay less attention to flashy tech fads and more to foundational stuff like how to diagnose performance problems and write efficient code.


> Everyone else is doing Scrum so hard that they can't think of much outside the current sprint

This is a gross oversimplification. While there are inefficiencies and process abuses, that doesn't mean nobody cares about speed and resources. It might look that way to purists who are focused only on the tech part of the business.


Processors used to spend most of their time sorting; after a period of wasteful bullshit, they may return to hotspots dominating again, this time in the form of AI inference.


Won't we have dedicated analog/optical/whatever inference accelerators by that point as well?


Compute time is cheaper than man hours.


It's not like biological creatures are highly optimized.


Eh, for the amount of work your brain accomplishes, it is insanely hyper optimized at around 20 watts. We don't have human level AI/image processing quite yet, but it would take hundreds of thousands of watts to accomplish the same thing at this time.


They have numerous problems; being wrong a lot and dying constantly are prime examples.


Containers killed the performance star.


I don't see how this could be true. Containers are just processes with some extra permissions applied.


The slowdown of x86 processor progress was caused by Intel's very slow adoption of new fab processes. 14nm CPUs came out in 2014; mobile-only 10nm parts (Tiger Lake) came out at the end of 2020. Almost three years later, there are still no commercial 7nm CPUs from Intel.

They used to be a lot faster: 22nm CPUs (Ivy Bridge) came out in 2012, so around two years to go from 22nm to 14nm.


Not really. Clock speeds for CPUs have stagnated because they hit the power wall back in the P4 days. Part of what enabled the relentless clock increases from the 1970s until the early 2000s is that each generation of CPU ran at a substantially lower voltage than the previous generation while being able to use more power. Going from 5 volts to 3.3 volts is a substantial power saving that gives you headroom to increase clock speed, but going from 1.1V to 1.05V is not nearly as much of an improvement. Once CPUs started dissipating 100-150 watts of power, with voltages only dropping by a few percent each generation, there's simply no power budget left to increase clocks. Current cores can run at upwards of 7 GHz (just look at the overclocking records) with liquid nitrogen cooling to dump the extra heat, but virtually nobody will use such a system in the real world.

The reason we see increasing core counts is that 2 cores consume roughly 2x the power of 1 core (well, slightly more). Doubling clock speed from 5 GHz to 10 GHz would cost way more than 2x the power. Furthermore, pumping that much heat out of a sub-100mm^2 die gets harder and harder. You can have more transistors in a small space, but you simply can't have them change state much more than the previous generation did.
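
To put rough numbers on the voltage argument, here's a minimal sketch using the standard dynamic-power approximation P ≈ C·V²·f; the voltages are illustrative, not measurements of any particular chip:

    # Dynamic power of CMOS logic is roughly P = C * V^2 * f
    # (C = switched capacitance, V = supply voltage, f = clock frequency).
    # At a fixed power budget and capacitance, a voltage drop buys clock
    # headroom proportional to the square of the voltage ratio.

    def frequency_headroom(v_old: float, v_new: float) -> float:
        """How much the clock could rise at constant power, all else equal."""
        return (v_old / v_new) ** 2

    print(frequency_headroom(5.0, 3.3))    # ~2.3x headroom (the old days)
    print(frequency_headroom(1.10, 1.05))  # ~1.1x headroom (today's tiny steps)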



I'm still impressed by 64-core Genoa. It is 1.7x the total performance (cores × clock rate × IPC) of 64-core Milan in the table presented. That's not even two years apart.

I know the die sizes didn't shrink much, around 10% or so. And R&D costs are surely rising faster than inflation.
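
As a sketch of that cores × clock × IPC metric: the clock and IPC factors below are placeholders chosen only to show how a 1.7x total can decompose at equal core count, not values from the article's table.

    # Total throughput as defined above: cores * clock * IPC.
    # Milan is normalized to 1.0; the Genoa factors are hypothetical.
    def total_perf(cores: int, clock: float, ipc: float) -> float:
        return cores * clock * ipc

    milan = total_perf(64, clock=1.00, ipc=1.00)  # baseline
    genoa = total_perf(64, clock=1.25, ipc=1.36)  # hypothetical split of the gain

    print(genoa / milan)  # 1.7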


Compute power continues to increase, as evidenced by faster accelerators/GPUs and Top500 FLOP counts.


We are off the exponential curve until a completely new method is discovered, which of course might not exist.

Maybe our great^n grandchildren will then have lives very similar to the great^n+1 grandchildren thereafter, just like the older days!


Well, strong AI seems not far off, and AI scales, via recursive self-improvement, easily to the boundary of the physical limits. Dyson spheres, Jupiter brains. There doesn't seem to be a major plateau on the horizon.


What’s the power use vs performance look like over the same period?


"When you can't scale up, scale out."

GPUs have strong potential for improvement, and moving workloads to them helps on multiple fronts: performance, cost, power consumption.


If your workload is the embarrassingly parallel type that GPUs are designed for.


This has to be coupled with 'The Great Software Decay'. I hope we will have 'real' innovation in Software Engineering where people take writing performant code seriously and many Software Engineers actually know what a compiler does and what the CPU does etc.

The frameworks have been helpful but at the same time -- rounding buttons is not software engineering.


The problem is that many algorithms simply cannot be parallelised well, and that ever since the '90s the iron has been cheap whereas the programmer's time is expensive.

So instead of spending weeks or even months on trying to squeeze the last bit of performance out of an application that's "good enough" performance-wise, developers can use that time to roll out features or fix bugs instead, i.e. generating value for their customers.

It's simply a question of economics.


IPC data seems a bit sus


Even worse: IPC data is pretty much useless when it comes to comparing CPU performance!

How is it going to account for something like the introduction of AVX512, doubling the data throughput per instruction? What about increased cache greatly reducing the number of instructions waiting for memory access, or a faster memory controller? How would you even begin to quantify a new PCIe generation doubling SSD bandwidth, or a better core interconnect reducing NUMA penalties?

IPC is basically just a marketing number with today's CPUs. If you want to get a proper comparison, use a real-world benchmark.
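
As a toy illustration of why, with entirely made-up numbers rather than benchmarks of any real core: a chip retiring fewer but wider instructions can still move far more data per second.

    # Made-up numbers: a "low IPC" core with wide vector instructions can
    # out-process a "high IPC" scalar core, which is why IPC alone says little.

    def elements_per_second(ipc: float, elems_per_instr: int, ghz: float) -> float:
        """Billions of data elements processed per second."""
        return ipc * elems_per_instr * ghz

    scalar = elements_per_second(ipc=4.0, elems_per_instr=1, ghz=5.0)  # 20 G/s
    vector = elements_per_second(ipc=2.5, elems_per_instr=8, ghz=4.5)  # 90 G/s

    print(vector / scalar)  # 4.5x the work, despite "worse" IPC and clock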


It's the claimed IPC improvement from the AMD marketing slides: Zen 2 over Zen, 15%; Zen 3 over Zen 2, 19%; and Zen 4 over Zen 3, 13%. Compounded, Zen 3 over Zen is about 37% and Zen 4 over Zen is about 55% (maybe a typo in the article).
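
For what it's worth, compounding those per-generation claims does land on roughly those totals (AMD's slide numbers, not independent measurements):

    # Compound the claimed generational IPC gains from AMD's slides.
    zen2_over_zen1 = 1.15
    zen3_over_zen2 = 1.19
    zen4_over_zen3 = 1.13

    zen3_over_zen1 = zen2_over_zen1 * zen3_over_zen2                   # ~1.37
    zen4_over_zen1 = zen2_over_zen1 * zen3_over_zen2 * zen4_over_zen3  # ~1.55

    print(f"Zen 3 over Zen: +{zen3_over_zen1 - 1:.0%}")  # +37%
    print(f"Zen 4 over Zen: +{zen4_over_zen1 - 1:.0%}")  # +55%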


Zen 1 was terrible and is not a useful baseline. It was obsolete at launch. Nobody bought them except companies who were paid to take them and agree to be in a press release. It was literally half the speed of Skylake on server workloads.


>> Zen 1 was terrible and is not a useful baseline.

First off, it's a perfect baseline when comparing AMD chips since that time. Zen 1 was similar to Intel performance-wise, winning some benchmarks and losing some.

Second, the Raven Ridge (Zen 1+) APUs were IMHO excellent performance for the price at the time - even against Intel. I have not felt the need to build a new system since the Mellori_ITX:

https://github.com/phkahler/mellori_ITX


Zen 1 was slightly slower per core, but had twice the number of cores Intel was willing to sell on desktop platforms! Kind of a big deal for use cases such as compiling C++ code...


We might need to stop writing JavaScript? Please no :'(


Please yes. >:)


Up until this year, the CPU was never my limiting factor. We always used multithreading or GPUs.

This year has brought a need for a decent CPU for occasions where we must run single-threaded: both local AI and Python development.

For so long, we never maxed out our CPU.


Just don't get stuck in Python.


The Great x86 CPU Stagnation? ARM seems to have made some great leaps forward in recent years.


I don't get the feeling that ARM offerings are significantly cheaper than x86. There really isn't much magic: a transistor costs the same to manufacture on a given process, regardless of the ISA.

In that respect, RISC-V might have some edge, because ARM licensing is expensive, but I don't think licensing is a significant cost for the x86 crowd.

OTOH, the server ARM people are really pushing it: https://www.semianalysis.com/p/sound-the-siyrn-ampereone-192...

But Altra's ain't cheap.


Top AMD consumer x86 CPU over 4.5 generations:

https://www.cpubenchmark.net/compare/2966vs3238vs3598vs3862v...

I think the article is just saying that server CPUs are getting better, but rising prices keep them from being significantly better per dollar. Of course, in these markets, any increase in performance is justified, so buyers are not price sensitive.


Not really. There have been some impressive systems built on top of ARM chips, but the underlying chip hardware hasn't advanced much differently than its x86 counterparts.


ARM has seen increased adoption, but the chips themselves aren't much different than 4-5 years ago.


I’m ignorant about this. Can a Qualcomm or Samsung chip match Intel’s raw power?


The leading ARM chip (by Apple) is arguably the best in class.


Much of Apple's success in the area seems to stem from the fact that they're simply buying other parties out of the latest manufacturing processes, e.g. https://appleinsider.com/articles/23/05/15/apple-has-a-stran...

So if others want to compete, they'll always be a few years behind, since the fab capacity is reserved for Apple. Any competitor either has to magically improve the architecture dramatically - which they can't, since that would require an architectural license, which Apple has but most others don't - or find a fab that can compete with TSMC's latest tech both in terms of price and available volume.


Apple's chips are very power efficient, yes, but the article and parent are talking/asking about raw power. Despite all the Intel hate over the past 5 years, I don't think there's been even a moment where another company's leading chip has outperformed Intel's in benchmarks.

See for example https://www.cpubenchmark.net/compare/4922vs5022vs5008vs5189/...


>125W

And that is a lie: to achieve those scores it turbos up and consumes 300W+.

Today, right now, is a moment where AMD's enterprise product are outperforming Intel's in benchmarks.

Except for a very small set of specific use cases, I think anyone recommending Xeon for enterprise solutions is professionally negligent.


The M2 Pro isn't doing too bad when you consider you're comparing a laptop chip with 14+ hour battery life against Intel's latest and greatest, high-TDP desktop CPU with twice as many cores. On a performance-per-core basis it's not even far behind.


If the TDP is an accurate measure, then the M1 appears to be about as efficient as 12th gen Intel: https://www.cpubenchmark.net/power_performance.html#all-cpu


I wonder if there's some shenanigans going on there. Max TDP in an M1 is surely far lower than the max TDP of an i7-1255U, which it outperforms. Most M1 systems don't even have fans but can perform at max performance for extended times. U-series i7s can also be run fan-less, but performance will be compromised?


Umm, I added a readily available consumer CPU from a different company to your comparison chart.

https://www.cpubenchmark.net/compare/5022vs5008vs5189vs5031/...


I'm well aware of Apple's success since I'm surrounded by their devices at home. It seems to me it can't be attributed only to ARM. What I wonder is whether someone can match Intel on Intel's playing field, like supplying laptop manufacturers with ARM chips that are better than Intel's.


Is there anything ISA-related in this? Hardly.

Using ASML's latest technology is a big part of the situation; using on-chip memory is another.


Coincidentally Ampere One was announced today and it looks pretty mediocre.


What looks mediocre about it? They're making BIG claims about density and power usage at scale


If they doubled performance and also doubled power, that's not good.



