
The most complex circuit on the GPU would be the thing that chops the incoming command stream and turns it into something the matrix multipliers can work on.


I get the feeling you're only really thinking about machine learning style workloads. Your statement doesn't seem to take into account scatter/gather logic for memory traffic (including combine logic for uniforms), resolution of bank conflicts, sorting logic for making blend operations have in-order semantics, the fine rasterizer (which is called the "crown jewels of the hardware graphics pipeline" in an Nvidia paper), etc. More to the point, these are all things that CPUs don't have to deal with.
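
To make one of those concrete: "bank conflicts" happen because shared memory is striped across 32 banks (successive 32-bit words map to successive banks), and the hardware has to serialize a warp's accesses whenever multiple lanes hit different words in the same bank. A minimal CUDA sketch of the two extremes; the kernel names are mine and this assumes the usual 32-bank layout:

    #include <cuda_runtime.h>

    __global__ void row_access(float *out) {
        __shared__ float tile[32][32];
        int t = threadIdx.x;       // assumes blockDim.x == 32 (one warp)
        tile[0][t] = (float)t;     // consecutive words -> consecutive banks:
        __syncthreads();           // one transaction for the whole warp
        out[t] = tile[0][t];
    }

    __global__ void column_access(float *out) {
        __shared__ float tile[32][32];
        int t = threadIdx.x;
        tile[t][0] = (float)t;     // stride of 32 words: every lane lands in
        __syncthreads();           // bank 0 -> a 32-way conflict the hardware
        out[t] = tile[t][0];       // resolves by serializing the accesses
    }

    int main() {
        float *d;
        cudaMalloc(&d, 32 * sizeof(float));
        row_access<<<1, 32>>>(d);
        column_access<<<1, 32>>>(d);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }

A profiler will show the column version replaying its shared-memory transactions; that replay machinery is part of the conflict-resolution logic I mean.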

Conversely, there is a lot of logic on a modern CPU to extract parallelism from a single thread, stuff like register renaming, scoreboards for out-of-order execution, and highly sophisticated branch prediction units. I get the feeling this is mainly what you're talking about. But this source of complexity does not dramatically outweigh the GPU-specific complexity I cited above.
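
For a cheap way to feel how much of that machinery is working on your behalf, the classic demo is branching on unpredictable vs. predictable data: identical instructions and identical work, very different throughput once the predictor can lock on. A host-side C++ sketch (numbers vary a lot by CPU; a multiple-x gap is typical but not guaranteed):

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    // Sum the elements >= 128. The branch is taken ~50% of the time in both
    // runs below; only its *predictability* changes.
    static long long sum_big(const std::vector<int> &v) {
        long long s = 0;
        for (int x : v)
            if (x >= 128) s += x;
        return s;
    }

    int main() {
        std::vector<int> v(1 << 24);
        for (int &x : v) x = rand() % 256;

        auto time_it = [&](const char *label) {
            auto t0 = std::chrono::steady_clock::now();
            long long s = sum_big(v);
            auto t1 = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<
                std::chrono::milliseconds>(t1 - t0).count();
            std::printf("%s: sum=%lld in %lld ms\n", label, s, (long long)ms);
        };

        time_it("random order");          // predictor can't learn the pattern
        std::sort(v.begin(), v.end());
        time_it("sorted order");          // same data, now trivially predictable
        return 0;
    }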


Isn't the rasteriser "simply" a piece of code running on the GPU?


No, there is hardware for it, and it makes a big difference. Ballpark 2x, but it can be more or less depending on the details of the workload (i.e., shader complexity).

One way to get an empirical handle on this question is to write a rasterization pipeline entirely in software and run it in GPU compute. The classic Laine and Karras paper does exactly that:

https://research.nvidia.com/publication/high-performance-sof...
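
The innermost piece of such a software pipeline is tiny; the hard part is everything around it (binning, hierarchical traversal, coverage, attribute interpolation, ordering). A toy CUDA sketch of just the inner test, one thread per pixel and a single triangle; this is a brute-force illustration, not the paper's tiled multi-stage design:

    #include <cuda_runtime.h>

    // Edge function: signed area of the parallelogram (a, b, p). A pixel is
    // inside the triangle when all three edge functions agree in sign.
    __device__ float edge(float ax, float ay, float bx, float by,
                          float px, float py) {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    __global__ void raster_tri(unsigned char *fb, int w, int h,
                               float x0, float y0, float x1, float y1,
                               float x2, float y2) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;
        float px = x + 0.5f, py = y + 0.5f;   // sample at the pixel center
        float e0 = edge(x0, y0, x1, y1, px, py);
        float e1 = edge(x1, y1, x2, y2, px, py);
        float e2 = edge(x2, y2, x0, y0, px, py);
        if (e0 >= 0 && e1 >= 0 && e2 >= 0)    // assumes consistent winding
            fb[y * w + x] = 255;
    }

    int main() {
        const int W = 256, H = 256;
        unsigned char *fb;
        cudaMalloc(&fb, W * H);
        cudaMemset(fb, 0, W * H);
        dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
        raster_tri<<<grid, block>>>(fb, W, H,
                                    30.f, 30.f, 220.f, 60.f, 120.f, 200.f);
        cudaDeviceSynchronize();
        cudaFree(fb);
        return 0;
    }

Everything the fixed-function unit adds on top of this (hierarchical coarse/fine traversal, coverage masks, fill-rule and watertightness guarantees, keeping primitives in API order) is roughly what the ballpark-2x hardware advantage above is paying for.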

An intriguing thought experiment is to imagine a stripped-down, highly simplified GPU that is much more of a highly parallel CPU than a traditional graphics architecture. This is, to some extent, what Tim Sweeney was talking about (11 years ago now!) in his provocative talk "The end of the GPU roadmap". My personal sense is that such a thing would indeed be possible but would be a performance regression on the order of 2x, which would not fly in today's competitive world. But if one were trying to spin up a GPU effort from scratch (say, motivated by national independence more than cost/performance competitiveness), it would be an interesting place to start.


Intel will ship Larrabee any day now... :)


The host interface has to be one of the simplest parts of the system, and I mean no disrespect to the fine engineers who work on that. Even the various internal task schedulers look more complex to me.

If you don't have insider's knowledge of how these things are made, I suggest using less certain language.


> The most complex circuit on the GPU would be the thing that chops the incoming command stream

That's not true at all. I'd recommend reading up on current architectures, or avoid making such wild assumptions.



