Nah, another audio daemon is not what Linux needs IMO. This should be merged into the kernel, especially since process isolation is one of the stated goals. Running hard realtime stuff in a user space that is designed to not provide useful guarantees related to hard deadlines is brave, but ultimately somewhat foolish.
I know that there are arguments against having high quality audio rate resampling inside the kernel that are routinely brought up to block any kind of useful sound mixing and routing inside the kernel. But I think that all necessary resampling can easily be provided as part of the user space API wrapper that hands buffers off to the kernel. And the mixing can be handled in integer maths, including some postprocessing. Device specific corrections (e.g. output volume dependent equalization) can also fit into the kernel audio subsystem if so desired.
AFAIK, Windows runs part of the audio subsystem outside the kernel, but those processes get special treatment from the scheduler so they can meet their deadlines. And the system is built in a way that applications have no way to touch these implementation details. On Linux, the first thing audio daemons do is break the kernel-provided interface and force applications to become aware of yet another audio API that may or may not be present.
This is just my general opinion on how the design of the Linux audio system is lacking. I am aware that it's probably not a terribly popular opinion. No need to hate me for it.
Resampling in userspace and then sending the result to the kernel is how it already works... in ALSA. The only real problem with how ALSA does things is that you can't just switch the output (for example, sound card to HDMI) for a running stream. PA solves this by basically being a network packet router (bus, switch, "sound daemon", whatever you want to call it). PulseVideo^H PipeWire, from the little I cared to look, is basically the same thing.
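To illustrate that point, this is roughly what an application does today when it opens a PCM through alsa-lib: the soft_resample flag asks the userspace plugin layer to convert the stream's rate before the buffers ever reach the kernel driver. A minimal sketch, without real error handling:

    #include <alsa/asoundlib.h>

    int main(void)
    {
        snd_pcm_t *pcm;

        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return 1;

        /* soft_resample = 1: let alsa-lib convert 44.1 kHz to whatever the
         * hardware actually runs at, entirely in userspace. */
        if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                               SND_PCM_ACCESS_RW_INTERLEAVED,
                               2,        /* channels */
                               44100,    /* requested rate */
                               1,        /* soft_resample */
                               500000    /* max latency, us */) < 0)
            return 1;

        /* ... snd_pcm_writei() the samples, snd_pcm_close() when done ... */
        snd_pcm_close(pcm);
        return 0;
    }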
Another problem with ALSA, as well as PA, is that you can't change the device settings (sampling rate, bitrate, buffer size and shape) without basically restarting all audio. (Note: you can't really do it anyway, as multiple programs could want different rates, buffers, and such.)
In my opinion, the proper way to do audio would be to do it in the kernel and to have one (just one) daemon that controls the state of the system. That would require resampling in the kernel for almost all audio hardware. Resampling is not really a problem. Yes, resampling should be fixed-point, and not just because the kernel doesn't want floating-point math in it. Controlling volume is a cheap multiply (or divide), and mixing streams is just an addition (both with saturation, of course).
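To make the "cheap multiply plus saturating add" claim concrete, here is a toy sketch of fixed-point mixing; the Q15 volume format and the function names are mine, not taken from any existing kernel code:

    #include <stddef.h>
    #include <stdint.h>

    /* Clamp a 32-bit intermediate back into 16-bit sample range. */
    static inline int16_t sat16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Mix one stream into an accumulator buffer.
     * vol is Q15 fixed point: 0x7FFF is (just under) unity gain. */
    static void mix_into(int16_t *dst, const int16_t *src, size_t frames, int16_t vol)
    {
        for (size_t i = 0; i < frames; i++) {
            int32_t scaled = ((int32_t)src[i] * vol) >> 15; /* volume: one multiply */
            dst[i] = sat16((int32_t)dst[i] + scaled);       /* mix: saturating add */
        }
    }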
Special cases are one program streaming to another (à la JACK), and stuff like Bluetooth or audio over the network. Those should be in userspace, for the most part. Oh, and studio hardware, as it often has special hardware switches, DSPs, or whatever.
Sincerely, I doubt I could do it (and even if I could, nobody would care and the Fedoras would say "no, we are doing what ~we~ want"). So I gave up a long while ago. And I doubt anybody else would fight up that hill to do it properly. Half-assed solutions usually prevail, especially if presented as full-ass (as most don't know better).
PS: Video is a series of bitmaps, just as audio is a series of samples. They are already in memory (system or GPU). Treating either of them as a networking problem is the wrong way of thinking, IMO. The only thing that matters is timing.
PPS: And transparency. A user should always be able to easily see when a stream is being resampled, where it is going, etc. And should be able to change anything relating to that stream, and to the hardware, in flight via a GUI.
Putting this into the kernel won't solve anything that isn't already solved with things like the Linux realtime patch. The way this works is that the applications themselves need to have a realtime thread to fill their buffer, and the audio daemon has to be able to schedule them at the right time, so it's not just the daemon that needs to have special treatment from the scheduler.
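For context, the "realtime thread" part usually looks something like the following in a client; in practice the priority is granted via RTKit or an rtprio rlimit rather than hardcoded, and the value here is only illustrative:

    #include <pthread.h>
    #include <sched.h>
    #include <string.h>

    /* Ask the kernel to run the calling audio thread under SCHED_FIFO so it
     * preempts ordinary threads when a buffer needs refilling. Returns 0 on
     * success, an errno value otherwise (e.g. EPERM without RT privileges). */
    static int promote_to_realtime(int priority)
    {
        struct sched_param param;

        memset(&param, 0, sizeof(param));
        param.sched_priority = priority;   /* e.g. somewhere around 70 */

        return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    }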
Also keep in mind that these audio daemons act as an IPC mechanism to route sound between applications and over the network, not just to audio hardware. Even if you put a new API in the kernel that did the graph processing and routing there, you would still likely need a daemon for all the other things.
It would solve the needless IPC, cache thrashing, priority scheduling (since it becomes a kernel thread instead of a userspace thread), and other busywork.
Would it? Linux does support realtime priority scheduling; JACK has worked this way for years. The thing is, you need userspace realtime threads anyway, because that is what the clients have to use; it's not enough to change just the mixing thread into a kernel thread.
But one of the goals of this is to be able to handle video and audio together. (This enables an easier API for ensuring audio and video remain in sync with each other, which can be tricky in some scenarios when both use totally separate APIs.)
The other main goal is to simultaneously support both pro-audio flows like JACK, and consumer flows like PulseAudio without all the headaches caused by trying to run both of those together.
Lastly, PipeWire is specifically designed to support the protocols of basically all existing audio daemons. So if the new APIs provide no benefit to your program, then you might as well just ignore them and continue to use the PulseAudio APIs or JACK APIs or the ESD APIs or the ALSA APIs or ... (you get the idea).
Now you are not wrong that audio is a real time task, and that there are advantages to running part of it kernel side (especially if low latency is desired, since the main way to mitigate issues from scheduling uncertainties is to use large buffers, which is the opposite of low latency).
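To put rough numbers on that buffers-versus-latency tradeoff (my figures, assuming a 48 kHz stream): the latency a buffer contributes is simply frames divided by sample rate.

    /* Latency contributed by a buffer of a given size, in milliseconds. */
    double buffer_latency_ms(unsigned int frames, unsigned int rate_hz)
    {
        return 1000.0 * frames / rate_hz;
    }

    /* 64 frames   @ 48 kHz ~= 1.3 ms  -- pro-audio territory, very sensitive
     *                                    to scheduling jitter.
     * 4096 frames @ 48 kHz ~= 85 ms   -- comfortably absorbs jitter, but far
     *                                    from "low latency". */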
On the other hand, I'm not sure an API like you propose will work as needed. For example, there really are cases where sources A, B, C and D need to be output to devices W, X, Y, and Z, but with different mixes for each, some of which might need delays added or effects applied (like reverb, compression, frequency equalization curves, etc.), and I have not even mentioned yet that device W is not a physical device, but actually the audio feed for a video stream to be encoded and transmitted live.
Try designing something that can handle all of that kernel side. Some of it you will obviously have no chance of running in kernel mode. That typically implies that everything before it in the audio pipeline ought to be done in user mode; otherwise the kernel-mode to user-mode transition has most of the scheduling concerns that a fully user-space audio pipeline has. For things like per-output-device effects, that would imply that basically the whole pipeline has to be in user mode.
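For a sense of the shape of the problem, here is a deliberately simplified data model of the routing scenario described above; the struct names are invented for illustration, and the effect callback is exactly the kind of arbitrary user code that cannot run kernel side:

    #include <stddef.h>

    /* One processing step (reverb, compressor, EQ, ...): arbitrary code,
     * so inherently a userspace concern. */
    struct effect {
        void (*process)(float *buf, size_t frames, void *state);
        void *state;
        struct effect *next;
    };

    /* One source feeding one sink, with its own gain and effect chain. */
    struct route {
        int source_id;          /* A, B, C, D ... */
        float gain;             /* per-destination mix level */
        struct effect *chain;   /* optional per-route processing */
    };

    /* A sink need not be hardware: "W" might be the audio feed of a
     * live video encoder. */
    struct sink {
        const char *name;
        struct route *routes;
        size_t n_routes;
    };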
The whole thing is a very thorny issue with no perfect solutions, just a whole load of different potential tradeoffs. Moving more into kernel mode may be a sensible tradeoff for some scenarios, yet for others that kernel-side implementation may be unusable, and just contribute more complexity to the endless array of possible audio APIs.
Isn't the deadline realtime scheduler optional? How many distros actually ship it in their default kernels? I honestly didn't manage to keep track of this.
> Running hard realtime stuff in a user space that is designed to not provide useful guarantees related to hard deadlines is brave, but ultimately somewhat foolish.
So every VST/Virtual instrument in a DAW or for live performance should be running in the kernel? Because that's definitely a fresh take.
I only read this article, so I'm still fuzzy on the exact technical details, but couldn't a system like PipeWire eventually be adopted into the kernel after it has proven itself adequate? Or is that not a thing the kernel does?
Probably not. Kernel handles the hardware. User-space deals with things like routing, mixing, resampling, fx, etc. Having that functionality outside of the kernel offers a lot more flexibility. Despite people chafing at the user-space audio API churn, it does allow advancements that would be much more difficult to do if implemented in the kernel.