>For tasks that tend to scale well with increased die area, which is often the case for GPUs as they're already focused on massively parallel tasks so laying down more parallel units is a realistic option, running a larger die at lower clocks is often notably more efficient in terms of performance per unit power.
I should've considered this: I have an RTX A5000. It's a gigantic GA102 die (the same one as the 3090 and 3080) that's clocked down to fit a 230W power limit, putting it at roughly 3070 throughput. That's ~15% less performance than a 3090 for a ~35% power reduction. Absolutely nonlinear savings there. Though some of that may have to do with the power savings from using GDDR6 over GDDR6X.
(I should mention that relative performance estimates are all over the place: by some metrics the A5000 is ~3070, by others it's ~3080.)
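Rough arithmetic on those numbers, as a quick Python sketch. The 0.85 and 0.65 ratios are just the ballpark estimates above (not measurements), and `perf_per_watt_gain` is a made-up helper name:

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Relative performance per watt, given performance and power
    as fractions of a baseline card (here: the 3090)."""
    return perf_ratio / power_ratio

# Assumed ballpark figures from the comment: ~15% less performance,
# ~35% less power than a 3090.
a5000_vs_3090 = perf_per_watt_gain(perf_ratio=0.85, power_ratio=0.65)
print(f"A5000 perf/W relative to 3090: {a5000_vs_3090:.2f}x")
```

So if those estimates hold, that's roughly 30% more work per watt. If the A5000 is actually closer to a 3080 in some workload, the gain is larger.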
Yeah, power consumption scales, to first order, with Vdd^2 (the square of the supply voltage), but performance scales with Vdd. Though you cannot simply reduce Vdd and the clock rate and then do more pipelining etc. to gain back the performance. If you are willing to back off on performance a bit, you can gain hugely on power. Plus, thermal management becomes more manageable.
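To put numbers on that first-order model, here's a small Python sketch. It assumes exactly the scaling stated above (performance proportional to Vdd, power proportional to Vdd^2) and ignores leakage, memory power, and everything else. The `scale` function and its `power_exponent` parameter are illustrative names; setting the exponent to 3 gives the textbook C·V²·f dynamic power case where the clock is lowered along with Vdd.

```python
def scale(vdd_ratio, power_exponent=2.0):
    """Return (performance, power, perf-per-watt) relative to baseline
    when the supply voltage is scaled by vdd_ratio.

    Assumed first-order model: performance ~ Vdd, power ~ Vdd**power_exponent.
    """
    perf = vdd_ratio
    power = vdd_ratio ** power_exponent
    return perf, power, perf / power

for v in (1.0, 0.9, 0.8):
    perf, power, eff = scale(v)
    print(f"Vdd x{v:.1f}: perf {perf:.2f}, power {power:.2f}, perf/W {eff:.2f}")
```

Under that model, cutting Vdd by 20% costs 20% of the performance but about 36% of the power, which is the "back off a bit, gain hugely" effect in miniature.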