> CUDA is the most direct and transparent way to work with the GPU
Yes, but it's still not direct and transparent enough. The libraries and drivers are closed.
> it doesn't have four vendors trying to pull it their way ending somewhere in the middle
Well, no, but it does have "marketing-ware", i.e. features introduced mostly to be able to say: "Oh, we have feature X" - even if the feature does not help performance.
Yes, but that does not bother me all that much, since they are tied to that specific piece of hardware. I'm more concerned with whether they work or not, and unless I'm planning to audit them or improve on them, what's in them does not normally bother me. I see the combination of card + firmware as a single unit.
> Well, no, but it does have "marketing-ware", i.e. features introduced mostly to be able to say: "Oh, we have feature X" - even if the feature does not help performance.
I'm not aware of any such features, other than a couple of 'shortcuts' which you could basically have provided yourself. Beyond that, NVidia goes out of its way to ship highly performant libraries with their cards for all kinds of ML purposes, and that alone offsets any bad feeling I have towards them for not open-sourcing all of their software - which I personally believe they should do, but which is their right to do or not to do. I treat them the same way I treat Apple: as a hardware manufacturer. If their software is useful to me (NVidia: yes, Apple: no) then I'll take it; if not, I'll discard it.
I don’t know which features you’re talking about, but over the years CUDA has received quite a few features where Nvidia was quite explicit that they were not for performance but for ease of use: “If you want code that works with 90% performance, use this; if you want 100%, use the old way, but with significantly more developer pain.”
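A concrete instance of that trade-off, assuming unified memory counts as one: `cudaMallocManaged` gives you a single pointer usable from host and device, at the cost of demand-paged migration that hand-written `cudaMalloc` + `cudaMemcpy` code avoids. A minimal sketch:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void inc(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int* a;
    // Easy path: managed memory, no explicit host<->device copies.
    // Pages migrate on demand, which can cost performance versus the
    // "old way" of cudaMalloc + cudaMemcpy managed by hand.
    cudaMallocManaged(&a, n * sizeof(int));
    for (int i = 0; i < n; ++i) a[i] = i;
    inc<<<(n + 255) / 256, 256>>>(a, n);
    cudaDeviceSynchronize();
    printf("a[0] = %d\n", a[0]);  // host reads the same pointer directly
    cudaFree(a);
    return 0;
}
```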
Out of curiosity, not direct enough for what? What do you need access to that you don’t have at the moment?
> features introduced mostly to be able to say: “Oh we have feature X” - even if the feature does not help performance.
Which features are you referring to? Are you suggesting that features that make programming easier and features that users request must not be added? Does your opinion extend to all computing platforms and all vendors equally? Do you have any examples of a widely used platform/language/compiler/hardware that has no features outside of performance?
And what about the host-side library for interacting with the driver? And the Runtime API library? And the JIT compiler library? This seems more like a gimmick than actual adoption of a FOSS strategy.
Just to give an example of why open sourcing those things can be critical: Currently, if you compile a CUDA kernel dynamically, the NVRTC library prepends a boilerplate header. Now, I wouldn't mind much if it were a few lines, but - it's ~150K _lines_ of header! So you write a 4-line kernel, but compile 150K+4 lines... and I can't do anything about it. And note this is not a bug; if you want to remove that header, you may need to re-introduce some parts of it which are CUDA "intrinsics" but which the modified LLVM C++ frontend (which NVIDIA uses) does not know about. With a FOSS library, I _could_ do something about it.
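To make the setup concrete, here is a minimal sketch of that dynamic-compilation path using the documented NVRTC calls (the arch option is an assumption; any supported arch works). The boilerplate header is prepended invisibly inside nvrtcCompileProgram, and nothing in this code can opt out of it:

```cuda
#include <nvrtc.h>
#include <cstdio>
#include <vector>

int main() {
    // A 4-line kernel, as in the example above.
    const char* src =
        "extern \"C\" __global__ void scale(float* v, float a) {\n"
        "    v[threadIdx.x] *= a;\n"
        "}\n";

    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "scale.cu", 0, nullptr, nullptr);

    // NVRTC silently prepends its built-in header before compiling;
    // there is no documented option here to remove it.
    const char* opts[] = { "--gpu-architecture=compute_70" };
    nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);

    size_t logSize;
    nvrtcGetProgramLogSize(prog, &logSize);
    std::vector<char> log(logSize);
    nvrtcGetProgramLog(prog, log.data());
    if (res != NVRTC_SUCCESS) fprintf(stderr, "%s\n", log.data());

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());  // PTX ready for cuModuleLoadData

    nvrtcDestroyProgram(&prog);
    return 0;
}
```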
> Out of curiosity, not direct enough for what? What do you need access to that you don’t have at the moment?
I can't even tell how many slots I have left in my CUDA stream (i.e. how many more items I can enqueue).
I can't access the module(s) in the primary context of a CUDA device.
Until CUDA 11.x, I couldn't get the driver handle of an ahead-of-time-compiled kernel (see the sketch after this list for what CUDA 11 finally added).
etc.
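On that last point, a minimal sketch of the CUDA 11 addition, `cudaGetFuncBySymbol`, which exposes the driver handle of a kernel compiled ahead of time; the introspection at the end is just one thing the handle enables:

```cuda
// build (assumed): nvcc getfunc.cu -lcuda
#include <cuda_runtime.h>
#include <cuda.h>
#include <cstdio>

__global__ void myKernel(int* p) { *p = 42; }

int main() {
    cudaFree(0);  // force runtime/context initialization

    // CUDA 11+: look up the driver-API handle of a kernel that was
    // compiled ahead of time and registered through the runtime API.
    cudaFunction_t f;  // compatible with the driver's CUfunction
    if (cudaGetFuncBySymbol(&f, (const void*)myKernel) != cudaSuccess) {
        fprintf(stderr, "lookup failed\n");
        return 1;
    }

    // With the handle, driver-API introspection becomes possible:
    int regs = 0;
    cuFuncGetAttribute(&regs, CU_FUNC_ATTRIBUTE_NUM_REGS, (CUfunction)f);
    printf("kernel uses %d registers\n", regs);
    return 0;
}
```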
> Which features are you referring to?
One example: launching kernels from within other kernels ("dynamic parallelism").
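For anyone unfamiliar, a minimal sketch of that feature; it requires relocatable device code and the device runtime library (the build flags and arch are assumptions for a cc 3.5+ card):

```cuda
// build (assumed): nvcc -arch=sm_70 -rdc=true nested.cu -lcudadevrt
#include <cstdio>

__global__ void child(int depth) {
    printf("child grid, depth %d, thread %d\n", depth, threadIdx.x);
}

__global__ void parent() {
    if (threadIdx.x == 0) {
        // Device-side launch: same <<<>>> syntax as on the host.
        child<<<1, 4>>>(1);
        // The parent grid does not complete until its child grids do.
    }
}

int main() {
    parent<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```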
> Are you suggesting that features that make programming easier and features that users request must not be added?
If you add a feature which, when used, causes a 10x drop in performance of your kernel, then it's usually simply not worth using, even if it's easy and convenient. We use GPUs for performance first and foremost, after all.
This feature exists? It’s news to me if so and I would be interested. Is it brand new? Can you link to the relevant documentation?
I’m pretty lost as to why this would represent something bad in your mind, even if it does exist. Is this what you’re saying causes a 10x drop in perf? CUDA has lots of high-level scheduling control that is convenient, reduces developer time, and doesn’t affect overall perf by much. This is true of C++ generally, and of pretty much all computing platforms I can think of for CPU work. There are always features that are convenient but trade a bit of performance for developer time; squeezing out every last cycle always requires loads more effort. I don’t see anything wrong with acknowledging that and offering optional faster-to-develop solutions alongside the harder full-throttle options, like all platforms do. Framing this as a negative and as a CUDA-specific thing just doesn’t seem at all accurate.
Anyway I’d generally agree a 10x drop in perf is bad and reason to question convenience. What feature does that? I still don’t know what you’re referring to.