AMD Open-Source GPU Kernel Driver Above 5M Lines, Entire Linux Kernel at 34.8M (phoronix.com)
148 points by TangerineDream on Aug 31, 2023 | 73 comments


The OpenBSD situation is even worse: over there the driver is bigger than the rest of the kernel.

Don't get me wrong, I use the driver every day, AMD is definitely one of the good guys for making an open source driver, and the people who ported it are absolute heroes. However... sometimes I wish AMD had tied down the ISA to their cards a little better. Narrowed the interface, if you will. As it is, the driver is so big because there is this combinatorial explosion of generated header files.

https://flak.tedunangst.com/post/watc


There is no business reason for them to restrict themselves on the ISA, and it would make their hardware less performant compared to competition that would not be so bound.


The competition is very much bound to a rather narrow ISA which is why CUDA is forward and backwards compatible whilst ROCm isn’t.

ROCm will be pointless until at least forward compatibility is guaranteed by design.


Someone else already said it, but it bears repeating: this is wrong.

> The competition is very much bound to a rather narrow ISA which is why CUDA is forward and backwards compatible whilst ROCm isn’t.

PTX isn't an ISA, and it goes through another translation step before it hits the real ISA (SASS), which does change internally. Though you are right that PTX as an intermediate is part of the reason for CUDA's success, in that it makes the code portable across cards.
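
For anyone who hasn't poked at it: the PTX-to-SASS step is done by the driver when the module is loaded, so the same PTX runs on cards whose real ISA differs. A minimal C sketch of that path using the CUDA Driver API (the PTX string is just whatever nvcc -ptx produced; error handling omitted):

  /* Hedged sketch: hand the driver PTX text and let it JIT-compile it
   * into the installed card's actual ISA (SASS) at load time. */
  #include <cuda.h>

  CUfunction load_ptx_kernel(const char *ptx_source, const char *kernel_name)
  {
      CUdevice dev;
      CUcontext ctx;
      CUmodule mod;
      CUfunction fn;

      cuInit(0);
      cuDeviceGet(&dev, 0);
      cuCtxCreate(&ctx, 0, dev);

      /* The PTX -> SASS translation for *this* GPU happens here. */
      cuModuleLoadData(&mod, ptx_source);
      cuModuleGetFunction(&fn, mod, kernel_name);
      return fn;
  }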


How so? One of the ROCm layers is OpenCL, which does runtime target compilation, and there are also the various intermediate language representations.

So, presumably, much of this is just a lack of sufficient resources to ensure that all the different combinations are being optimally compiled down for all the independent HW devices.

This was always the OpenCL problem too, back before everyone just started using CUDA rather than trying to use it. It worked in a lot of places, but the higher-level code needed to be adjusted for individual devices: each vendor would do a reasonable job maintaining optimal code generation for their target given code optimized for that target, but switching vendors (AMD <-> NVIDIA) would result in needing two different versions of the code, each tuned for that vendor's ideas of how code should be written for their platform.

If that wasn't clear: say you wrote and optimized code for vendor A's device X. When vendor A released device Y, things tended to work well, but moving that code to vendor B's devices basically required retuning/rewriting things.
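
In practice even the "portable" host code ends up sniffing the device and branching on it. A rough C sketch of the mild end of that, using the standard OpenCL host API (the multipliers are made-up examples; the real fix was usually separate kernel sources per vendor):

  /* Hedged sketch: query the device and pick per-vendor tuning.
   * Assumes a built kernel and device already exist; error checks omitted. */
  #include <CL/cl.h>
  #include <string.h>

  size_t pick_local_size(cl_kernel kernel, cl_device_id dev)
  {
      char vendor[128] = {0};
      size_t preferred_multiple = 0;

      clGetDeviceInfo(dev, CL_DEVICE_VENDOR, sizeof(vendor), vendor, NULL);
      clGetKernelWorkGroupInfo(kernel, dev,
                               CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                               sizeof(preferred_multiple),
                               &preferred_multiple, NULL);

      /* Invented heuristics, purely illustrative of the branching. */
      if (strstr(vendor, "NVIDIA"))
          return preferred_multiple * 8;
      return preferred_multiple * 4;
  }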


CUDA is subsequently compiled to the hardware assembly at runtime, isn't it? Like precompiled shaders.

C++ -> PTX -> Hardware ISA


There is a huge business reason to restrict the ISA: simple, stable interfaces are very attractive to users (buyers).

Take amd64 as a counterexample: not only does it not take 4 million lines of code to interface with an amd64 processor, but when something new is added the processor remains compatible with existing code. x86 as an architecture is a bit of a train wreck, but much of its success comes from the effort to keep the interface stable.

Also take ARM, for example: this is getting better, but the hardest part about using ARM is that the ISA is all over the place; every single bloody SBC appears to have a different one.


The only users of the ISA are the drivers.

No one currently or in the decades past has done otherwise.


The issue is that the generated code is checked in. Surely there's a better solution.


They _could_ in theory post the (no doubt) Perl scripts that generate those headers from the HDL, along with the relevant source files, but I imagine that would be a _very_ hard sell. And probably not much more helpful to the kernel, as no one reads those headers anyway, and the compile time will not improve by shuffling where the generation step happens.

It may be more practical to rework the scripts to try to find ways to reduce the verbosity and redundancy. The actual .c driver code probably doesn't need every copy of every line in all those .h files.


As long as it's deterministic, there should be no issue checking in the generators, right?


The generators themselves, probably not, but the definitions of all those registers come from hardware. These kinds of code generators convert input source files that describe the hardware into C header files for the software.

But I expect AMD would be skittish about open-sourcing anything that could even remotely be construed as HDL, even if it's just dry lists of registers. Open sourcing the drivers is one thing, but the hardware itself is another.
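
For the curious, the generated output is mostly dry lists along these lines. The names and values below are invented, but the real amdgpu headers repeat this offset/shift/mask pattern for every register of every IP block of every generation:

  /* Illustrative only -- register name, offsets and fields made up. */
  #define mmFOO_DISPLAY_CONTROL                    0x1a20
  #define mmFOO_DISPLAY_CONTROL_BASE_IDX           2

  #define FOO_DISPLAY_CONTROL__ENABLE__SHIFT       0x0
  #define FOO_DISPLAY_CONTROL__PIXEL_FORMAT__SHIFT 0x4
  #define FOO_DISPLAY_CONTROL__ENABLE_MASK         0x00000001L
  #define FOO_DISPLAY_CONTROL__PIXEL_FORMAT_MASK   0x000000F0L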


I'm not sure I get why the comparison to the kernel is needed. GPUs are wildly complex. Rendering is wildly complex. Managing memory and data is complex. Managing connected hardware is complex. I am not sure why anyone would expect a GPU driver to be small while also doing a billion things and playing games as well as mature gaming platforms.


If you're not intimately familiar with GPU drivers and what goes on inside them, this gives you a very quick, back-of-the-envelope sense of the size and complexity of the work involved: 1/7th the size and complexity of the kernel for this one driver.

I raised an eyebrow but I have only the vaguest notion of how the hardware works and what a driver might have to manage.


>If you're not intimately familiar with GPU drivers and what goes on inside them, this gives you a very quick, back-of-the-envelope sense of the size and complexity of the work involved: 1/7th the size and complexity of the kernel for this one driver.

ehh, no.

Almost all of this is header files.

>Meanwhile the open-source NVIDIA "Nouveau" driver is around 201k (21.7k blank lines, 24.3k lines of comments, and 155k lines of code). Or the Intel i915 DRM kernel graphics driver is around 381k lines via the same cloc judgment.

So it seems like a GPU driver is around 1% of the kernel's code,

and you start wondering why the kernel actually has this much code if the GPU (of all things) needs just around 1%.


The NVIDIA proprietary driver is about the same size compiled as the Linux kernel, iirc.

The reason is, GPU drivers are basically complete operating systems, just for the secondary computer we call the GPU instead of the CPU.


I don't think that's the reason.

The large majority of the "Linux kernel code" is drivers, but the large majority of the driver code is the GPU drivers. And the GPU doesn't have its own collection of drivers for every network chip or USB controller ever made.

The second biggest part of the Linux kernel is "arch/" which is the architecture-specific stuff, but GPUs don't really have that either -- a given vendor more or less corresponds to a platform architecture, but if you compare it to, say, "x86/" that's only ~11% of "arch/" and <1% of the kernel.

The reason the GPU drivers are so big is that they're gibberish. Instead of specifying an interface to interact with the GPU in a sane way, they're full of magic numbers that seem to map a (large, overly complex) API interface into the values you pass to the GPU to call the API functions implemented by the GPU's firmware, which is the actual GPU "operating system" but which the vendors want to keep as a black box.

Which is obviously counter-productive because it keeps users from optimizing for their GPU, which would make things run faster on it (or have fewer stability bugs), which would make more people want to buy them over a competitor's, which was supposed to be the reason for the secrecy.


That GPU firmware is generally just a blob that has never been open sourced. The driver is the part that talks to the GPU, not the OS that runs inside the GPU. Unless I'm mistaken, in which case this would be a huge deal.


Nouveau barely works iirc


That is a bit harsh...

Nouveau will bring up your Linux desktop just fine and connect all your monitors.

It will run OpenGL apps and light games.

Last time I checked it could not boost the clock and push most chips to high performance mode.

This is pretty much 100% Nvidia's fault for not opening up the specs.

Otherwise what nouveau has been able to reverse engineer is just amazing.

The nouveau driver is so much less painful to use, it just works completely seamlessly. If you don't need the extra performance, it is not worth the trouble to install the Nvidia blob.


Unfortunately I had a very negative experience with it. I have a laptop with an Nvidia secondary GPU and had the repeated terrible experience of my computer locking up right on boot due to the broken Nouveau driver that shipped with the kernel. It would have been fine to not even initialize the GPU, as I wasn't actually using it, which is what I ended up doing: disabling nouveau right from the GRUB menu.


As the article pointed out, the vast majority of the lines of code in the driver are autogenerated header files for things like defining hardware registers. There's not much complexity or logic in that type of code.

Probably if AMD wanted to spend the time, they could compress it down to a fraction of its current size.
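
The hand-written .c side that consumes those definitions is comparatively thin; it's mostly read-modify-write like this hypothetical sketch (device struct, register and field names all invented):

  /* Hypothetical sketch: read a register, poke one bitfield via the
   * generated SHIFT/MASK macros, write it back. */
  #include <stdint.h>

  #define mmDISP_CLK_CNTL              0x01b4       /* invented offset */
  #define DISP_CLK_CNTL__ENABLE__SHIFT 0x0
  #define DISP_CLK_CNTL__ENABLE_MASK   0x00000001u

  struct dev { volatile uint32_t *mmio; };           /* stand-in device */

  static void disp_clk_enable(struct dev *d)
  {
      uint32_t v = d->mmio[mmDISP_CLK_CNTL];

      v &= ~DISP_CLK_CNTL__ENABLE_MASK;
      v |= (1u << DISP_CLK_CNTL__ENABLE__SHIFT) & DISP_CLK_CNTL__ENABLE_MASK;

      d->mmio[mmDISP_CLK_CNTL] = v;
  }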


Right, if you have 7 different architectures, each with its own register map, and then model-specific tweaks, you're going to have a ton of code like that.


We could just compile it into a proprietary blob like Nvidia! /s


Is it really that much code? I don't know GPU hardware, but the NVMe spec header file in SPDK is around 4k lines[0]. If there are 7 of them and they're each twice as complicated, we're still well under 100k from register map headers. I didn't actually look through Linux to see how big they are, so maybe it is that much more complex.

0: https://github.com/spdk/spdk/blob/master/include/spdk/nvme_s...


NVMe is largely the model people here are complaining about. A small kernel shim driver that is talking to a huge firmware code base on the other side of a mailbox interface.

Even on small M.2-style standalone drives, you're looking at code which not only handles the details of managing flash error correction, wear leveling, garbage collection, etc., but all the code required to manage the thermal, voltage, PCIe link training, etc. of the 2-5 or so microcontrollers embedded in the drive, and possibly an RTOS or two hosting it all.

Never mind fabric-attached (DPU?) NVMe devices, which do all that plus deal with thin provisioning, partitioning, deduplication, device sharing, replication, RAID, etc., frequently themselves embedding a Linux kernel (or an OS of similar complexity) in the control plane.


Do they modify where all the registers are and the meaning of their bits with each new generation? That seems like an extremely wasteful way to do things.

I'm not familiar with AMD's GPUs, but have done some "bare metal" Intel GPU programming, and there's definitely a lot of commonality between different generations going all the way back to the i810.


> if you have 7 different architectures

GPU or CPU? If talking about the latter, only two [four] should count (ARM & x86 [× 2 for the 64-bit versions]). If you meant the former, forget my comment.


This is specifically about the kernel driver part.

Most of the GPU software stack can reasonably be outside the kernel. There's no obvious reason why much of it would need to be in ring-0 where bugs cause OS crashes and security vulnerabilities.

> Meanwhile the open-source NVIDIA "Nouveau" driver is around 201k (21.7k blank lines, 24.3k lines of comments, and 155k lines of code). Or the Intel i915 DRM kernel graphics driver is around 381k lines via the same cloc judgment.

But without separate line counts for the generated data tables and the actual human-written code, we don't have the real numbers to compare.
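
For concreteness, everything userspace sees of that kernel part is a character device plus a set of ioctls. A hedged C sketch that asks the DRM driver behind a render node which driver it is (the node path is an assumption, and the drm.h include path varies between the kernel uapi headers and libdrm):

  /* Prints the kernel driver name behind a render node, e.g. "amdgpu". */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <drm/drm.h>

  int main(void)
  {
      int fd = open("/dev/dri/renderD128", O_RDWR);
      if (fd < 0) { perror("open"); return 1; }

      struct drm_version ver;
      memset(&ver, 0, sizeof(ver));
      ioctl(fd, DRM_IOCTL_VERSION, &ver);        /* first call: lengths only */

      ver.name = calloc(ver.name_len + 1, 1);
      ver.date = calloc(ver.date_len + 1, 1);
      ver.desc = calloc(ver.desc_len + 1, 1);
      ioctl(fd, DRM_IOCTL_VERSION, &ver);        /* second call: fill strings */

      printf("kernel driver: %s (%d.%d.%d)\n", ver.name,
             ver.version_major, ver.version_minor, ver.version_patchlevel);
      close(fd);
      return 0;
  }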


All of this is also true for modern CPUs, yet I don't need a driver to talk to CPUs because the ISA serves as the abstracted and standardized programming interface to the underlying CPU hardware. If GPU vendors weren't so keen on protecting their "IP", GPUs could have gone down that same path.


It is said that Nvidia's hardware programming interface is much simpler than AMD's.

If true, AMD is doing something wrong here. And yes, gigatons of generated headers related to registers.


As a comparison:

  FreeBSD: ~9M loc
  NetBSD:  ~7M loc
  OpenBSD: ~3M loc
And this includes the base userland (not just the kernel).

https://www.csoonline.com/article/564373/is-the-bsd-os-dying...


NetBSD currently contains an older version of this driver, from Linux 5.6. Checking just now, it comes to 2.2M loc. Running the same test on the Linux 6.4 source tree does give me the reported 5M loc.

Maybe the figures you quote exclude things imported from elsewhere, like gcc and llvm; I get a figure of 75M loc for the base + kernel of NetBSD-10.


"Of course, much of that is auto-generated header files... A large portion of it with AMD continuing to introduce new auto-generated header files with each new generation/version of a given block. These verbose header files has been AMD's alternative to creating exhaustive public documentation on their GPUs that they were once known for."

So what's the point of saying that it's large?


Because it's large and large is difficult to maintain.

AMD maintains it, but do we know how those headers are generated? Probably not.

It's like a gift that stinks but that you can't complain about, because it's a gift.


>but do we know how they are generated? Probably not

Having worked in the semi industry, I can hazard a guess: it's a spaghetti mess of cascading Perl scripts that parse the Verilog/VHDL design files, with their development going back 20+ years, full of comments like "don't touch this line because it breaks another line, nobody knows why", and maintained by a team where a gray-bearded "Gandalf" engineer wearing an ATI t-shirt has most of the deep-down, low-level knowledge of how to un-fuck them whenever they get fucked, pardon my French.


I have not seen these scripts, but can confirm that AMD has a long history of such Perl scripts. Look at hipcc for a current, moderately frustrating, example of this. Also, the last time I met one of the open source driver team in person he was, in fact, wearing a classic red ATI t-shirt straight in from Markham. Much of that team is European now though, from what I hear, and they're generally a good bunch.


>ATI t-shirt straight in from Markham.

Curious how much of AMD Radeon GPU development is now being done in Markham, Canada, as AFAIK the modern Radeon architecture stems from ATI's acquisition of ArtX[1], a US-based spin-off of SGI, which was responsible for the GPUs in the Nintendo GameCube and Wii and for many other innovations, like programmable shaders, later found in ATI/AMD GPUs.

>Much of that team is European now though from what I hear, and they’re generally a good bunch.

I didn't know AMD has a GPU design team in Europe. Where? I know they had a fab in Germany and they have an office for the Ryzen and Infinity Fabric R&D in Romania, but I had no idea they do GPU stuff as well in Europe. Where is that office?

[1] https://en.wikipedia.org/wiki/ArtX


>I didn't know AMD has a GPU design team in Europe

AFAIK they don't, but the Linux driver guys seem to be mostly German and Polish and such. And yeah, they are doing good work. I half-expect AMD to reboot their Windows driver from the Linux driver code base at some point.


They had Bitboys but sold them on to Qualcomm a while ago.


Having also worked in the semi industry, I can confirm this is spot on.


I haven’t worked in the semi industry, but I’ve worked with EE’s and Perl programmers, and they do love that undocumented lore. And the universe does reward you with a grey beard after enough Perl.


There's bound to be some tcl in there too...


> AMD maintains it but do we know how they are generated? Probably not.

Basically, those files are generated from AMD GPU register data files where the majority of registers are documented, but there are of course a bunch of magic numbers as well, probably because they belong to HDCP or other cases where documentation is only available under NDA.

There have been a number of leaks of AMD internal documentation, so anyone who is into GPU drivers can really find a lot of information on their GPUs' internal workings.

I archived some of it many years ago and it was never DMCA'd:

https://github.com/ArseniyShestakov/rai-bonaire

The source was a talk at CCC.


I read it as a bit of a negative situation. So the reason for mentioning it is to shame AMD into doing a more correct or sane thing instead of spewing out enormous amounts of what is basically repetitive noise.

Pointing out that enormity is important because source files need to be stored, interpreted, versioned and parsed by humans/IDEs. It has an externalised cost (but, then again, isn't capitalism all about externalising costs?)


Autogenerated code aside, I find that the vast majority of programmers are simply incapable of writing concise and straightforward code. They instead appear to love complexity, creating tons more abstractions and indirections than necessary. Not too long ago I wanted to figure out how to use the basic 2D acceleration (blitter) feature on Nvidia's GPUs, and looked into the Nouveau driver. Despite the fact that I already had a general idea of the command submission process and queues etc., following the codepath from the top-level "copy this rectangle from here to there" function down to the hardware registers felt ridiculously long-winded, although the ultimate actions were very very simple: write the command and its parameters to a circular queue, and tell the GPU to execute it by updating the queue registers.
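
Roughly, the whole thing at the bottom boils down to something like this. It's a conceptual sketch only; the struct, method number, header encoding and register offset are invented, not Nouveau's actual ones:

  #include <stdint.h>

  struct gpu {
      volatile uint32_t *mmio;   /* mapped register BAR */
      uint32_t *ring;            /* GPU-visible circular command buffer */
      uint32_t  put;             /* next free slot, in dwords */
      uint32_t  ring_size;       /* ring length, in dwords */
  };

  #define RING_PUT_REG   (0x0040 / 4)   /* hypothetical PUT pointer register */
  #define BLIT_COPY_RECT  0x0304        /* hypothetical blit method */

  static void submit_copy_rect(struct gpu *gpu, uint32_t src_off,
                               uint32_t dst_off, uint32_t w, uint32_t h)
  {
      uint32_t *ring = gpu->ring;
      uint32_t put = gpu->put;

      ring[put++] = (5u << 18) | BLIT_COPY_RECT;  /* header: method + 5 params */
      ring[put++] = src_off;
      ring[put++] = dst_off;
      ring[put++] = w;
      ring[put++] = h;
      ring[put++] = 0;                            /* e.g. blit flags */

      gpu->put = put % gpu->ring_size;

      /* Kick the GPU: a single MMIO write updating the ring's PUT pointer. */
      gpu->mmio[RING_PUT_REG] = gpu->put;
  }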


Often straightforward code is full of repetition. Abstraction is usually brought up to reduce repetition, but it comes with its own problems.


I am working on the kernel right now, the code is very pleasant (as far as C code goes) to work with.

Whereas I worked on Chrome's V8 C++ code for a year and I still could not say I understand more than half of it. Its complexity is a factor higher than the Linux kernel's.


I would much rather have a large amount of in-tree driver source over a small driver with a "large" firmware binary.


It's not completely clear from the article, but: are the files generated 'on-the-fly' during the build process (and therefore not in git), or generated once (by AMD), and then committed?


I've not read the article, but the files in the Linux tree have been generated once by AMD.


Pre-generated by AMD and committed, I assume.

If they were generated as part of the build, they would not be counted as SLOC (not being "source").


Corporations don't incentivize good engineering, they incentivize functionality at any cost. This leads to giant codebases, over-engineering, bad engineering, fragility, unmaintainable and useless code, and duplication. The FOSS/FLOSS community must push back against the hot mess turds corporations want to dump into their source.


So while the AMD driver is open source, the community is basically excluded from contributing?

Should someone decide to start working through the code, removing duplicate code and cleaning up headers, functions and abstractions, would their work either be rejected or undone with the next AMD code dump?


A lot of open source projects work that way. Open source means you get access to the source and get to make changes for your own use. It doesn't mean you get to force anyone else to merge your code.


> It doesn't mean you get to force anyone else to merge your code.

Sure, you can fork the code if you really feel that strongly about it. My main "issue" is that it basically removes one of the big benefits of open source, that we can collaborate and do better as a collective. If it's just a big code dump that other kernel developers can't really touch, it's more "source code is available" than actual open source.


Open source is a licensing model, not a community organization model. Collaboration is not a benefit of open source, it's a benefit of collaboration software and a group of people who welcome collaboration. Almost all of the people who like collaborating on software use open source licensing. But there are plenty of people who use open source licensing who are not interested in collaborating. For example, it is very normal for projects maintained by someone with a narrow focus, or projects with limited or formally organized resources to not accept PRs.

When you send someone a PR, you are demanding that they do work for you, namely reviewing and merging it. Open source licensing does not mandate that they do this. Heck, most open source licenses even disclaim warranty to avoid obligating the authors to do work that the law would otherwise require of them. Now yes, some people will help you with problems. This is because they're nice, not because it's open source.


The fundamental premise of open source is full access to the source with the possibility to make changes and redistribute those changes [0]. Anything else, including collaboration to improve the code, is a nice cherry on top but not consequential to the concept of open source.

[0] https://opensource.org/osd/


It's not open source unless you can have it your way? That's too picky, for me


> So while the AMD driver is open source, the community is basically excluded from contributing?

Because it is open source, but not open contribution.

Open source vs. open contribution are orthogonal concepts: the former is about licensing, the latter about the organization of the development process.


... and it doesn't work right. When you start googling for your syslog entries you find countless reports, spanning many kernel versions, of identical-looking crashes, likely with different root causes, since all the message basically says is "the GPU hung".


Wouldn't it be possible to move most of this code out of the kernel? I'm not sure what's in it, but my guess is that what you actually need to have in the kernel is buffer allocation, memory protection and command submission code, and some modesetting/graphics-display-specific bits so you can display some basic graphics without the userland.


Why are GPU drivers baked into the kernel?

Wouldn’t it be better to load them in such a way that a crash in the GPU driver can be recovered from as opposed to crashing the whole system?

Other operating systems load the GPUs drivers separately.


Why should having the GPU drivers checked into the same repository mean that they can't be loaded and unloaded dynamically?


>as opposed to crashing the whole system

Depending on your definition of 'the whole system' this is not entirely true.

The AMD driver for example will reset the GPU and force an xorg restart: https://paste.debian.net/plain/1290344

Now, this does mean all desktop applications don't close properly, so I restart the PC to be 100% sure stability is back at 100%, but it didn't cause a kernel panic like a GPU crash would have done previously.

I've had this crash happen only twice in ~90 hours of play time.

GPU: 5700XT

Driver: Mesa 23.1.5


This could be expressed in binary format using way less space, but expressing it in code/text I suppose makes it more suitable to call it source.


Does a graphical representation of the files in the Linux kernel exist anywhere? Like a graphical file explorer but for the different kernel components.


Yeah the files are organized into a directory hierarchy, pretty cool tech! :-)

And there are great tools for exploring directories of files, my current favorite is dolphin with two or three panes.


Wikipedia has a graph showing a high-level breakdown of the kernel tree and the size of the components[1]

[1] https://en.wikipedia.org/wiki/Linux_kernel#/media/File:Sanke...


You can run WinDirStat (or a similar tool) on a checkout to get an idea.


Yeah, I guess this is the answer. When I posted the question I had this[1] in mind, and was thinking of something like that with simplified labels maybe. But I guess the file structure is so organized it would explain itself to anyone interested in this kind of thing.

[1] https://upload.wikimedia.org/wikipedia/commons/d/d5/GNOME_Di...


How much of it is generated code?


The Linux kernel is not really made of 34M loc; most of it is drivers, which I hardly consider kernel code.



