Indeed, why wouldn't they call themselves NvidAI to begin with? This company has twice already been super lucky to have its products used for the wrong thing (given that GPUs were created to accelerate graphics, not mining or inference).
3 times, if you count the physics GPGPU boom that Nvidia rode before cryptocurrencies.
And other than maybe the crypto stuff, luck had nothing to do with it. Nvidia was ready to support these other use cases because, in a very real way, they made them happen. Nvidia hardware is not particularly better for these workloads than competitors'. The reason they are the $4.6T company is that all the foundational software was built on their chips. And the reason for that is that JHH invested heavily in supporting the development of that software before anyone else realized there was a market there worth investing in. He made the call to make all future GPUs support CUDA in 2006, before there were heavy users.
I don't think the physics processing units were ever big. That was mostly just offloading some of a game's physics work from the CPU to the GPU. It could be seen as a feature of GPUs for games, like ray-tracing acceleration.
That's not what I was referring to. I was talking about NV selling GPGPUs for HPC loads, starting with the Tesla generation. They were mostly used for CFD.
Ah, you're right. Thanks for the correction. But it seems like they have applications far beyond CFD if they're what's put in the biggest supercomputers.
CFD is what 90+% of non-AI supercomputer time is spent on. Whether you're doing aerodynamic simulations for a new car chassis, weather forecasting, testing nuclear weapons in silico, or any of literally hundreds of other interesting applications, the computers basically run the same code, just with different data inputs.
I don't think it's luck. They invested in CUDA long before the AI hype.
They quietly (at first) developed general purpose accelerators for a specific type of parallel compute. It turns out there are more and more applications being discovered for those.
It looks a lot like visionary long term planning to me.
I find myself reaching for Jax more and more where I would have used numpy in the past. The performance difference is insane once you learn how to leverage this style of parallelization.
Are you able to share a bit, enough to explain to others doing similar work how this "Jax > numpy" aspect applies to their work (and thus that they'd be well off to learn enough Jax to make use of it themselves)?
A lot of it really is a drop-in replacement for numpy that runs insanely fast on the GPU.
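To give a feel for it, here's a toy sketch of the kind of swap I mean; the pairwise-distance example and the names `dist_np`/`dist_jax` are made up for illustration:

```python
import numpy as np
import jax
import jax.numpy as jnp

# plain numpy: pairwise Euclidean distances between two point sets
def dist_np(a, b):
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

# the same code with jnp swapped in for np, jit-compiled for the GPU/TPU
@jax.jit
def dist_jax(a, b):
    return jnp.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
```

Same array code either way; jax.jit fuses it into one compiled kernel instead of a chain of temporaries.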
That said, you do need to adapt to its constraints somewhat. Some things you can't do in jitted functions, and some things need to be done differently.
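For instance, a minimal sketch of two constraints I've hit (nothing here is project-specific):

```python
import jax
import jax.numpy as jnp

x = jnp.array([1.0, -2.0, 3.0])

# jax arrays are immutable, so in-place mutation like x[0] = 5.0 fails;
# you use a functional update instead:
y = x.at[0].set(5.0)

# Python `if` on a traced value fails inside a jitted function;
# you reach for jnp.where (or lax.cond) instead:
@jax.jit
def relu(x):
    return jnp.where(x > 0, x, 0.0)

print(relu(x))  # [1. 0. 3.]
```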
For example, finding the most common value along some dimension in a matrix on the GPU is often best done by sorting along that dimension and taking a cumulative sum, which sort of blew my mind when I first learnt it.
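Here's a rough sketch of that trick; my version uses a cumulative max of run-start indices rather than a literal cumulative sum, but it's the same scan-over-sorted-runs idea, and `mode_1d` is just a name I picked:

```python
import jax
import jax.numpy as jnp

@jax.jit
def mode_1d(x):
    """Most common value of a 1D array, with no data-dependent shapes."""
    s = jnp.sort(x)
    idx = jnp.arange(s.shape[0])
    # True wherever a new run of equal values begins in the sorted array
    new_run = jnp.concatenate([jnp.array([True]), s[1:] != s[:-1]])
    # carry forward the index where the current run started
    run_start = jax.lax.cummax(jnp.where(new_run, idx, 0))
    run_len = idx - run_start + 1   # length of the run up to each position
    return s[jnp.argmax(run_len)]   # the longest run's value is the mode

# most common value along each row of a matrix
mode_rows = jax.vmap(mode_1d)
```

And vmap gives you the batched per-row version for free, which is a big part of why this style pays off.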