Honestly, the embarrassing thing is that we don't have GPUs on our networking devices. They seem like near-ideal hardware for network processing.

PacketShader was from 2010. We have made so little visible progress since then. https://shader.kaist.edu/packetshader/
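For intuition about why GPUs map so well here: the hot loop of a software router is embarrassingly parallel. A toy C sketch of the kind of batched IPv4 lookup PacketShader offloads (the ipv4_lookup body and the batch contents are made up for illustration); on a GPU, each loop iteration becomes its own thread:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical stand-in for a real FIB longest-prefix-match lookup. */
    static uint16_t ipv4_lookup(uint32_t dst_addr) {
        return (uint16_t)(dst_addr >> 24);  /* toy: route by first octet */
    }

    /* Route a batch of destination addresses to output ports.
     * Every iteration is independent of the others, so a GPU can run
     * one per thread - that independence is the parallelism
     * PacketShader exploits. */
    static void route_batch(const uint32_t *dst, uint16_t *out_port, int n) {
        for (int i = 0; i < n; i++)
            out_port[i] = ipv4_lookup(dst[i]);
    }

    int main(void) {
        uint32_t dst[4] = {0x0A000001, 0xC0A80001, 0x08080808, 0x01010101};
        uint16_t port[4];
        route_batch(dst, port, 4);
        for (int i = 0; i < 4; i++)
            printf("packet %d -> port %u\n", i, port[i]);
        return 0;
    }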

There has been some progress since then. There are probably other fronts too, but my first question to myself here was, "I wonder what P4 (the network programming language) has been up to with acceleration" (https://opennetworking.org/p4/). Sure enough, there are a couple of hits: P4GPU (2016), APUNet (2017), P4GPU (2022). Alas, none seem to be open source at the moment.
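For anyone who hasn't touched P4: a P4 pipeline boils down to match-action tables, which is also what makes it a natural fit for GPU/FPGA offload. A rough C sketch of the idea (the entry layout and action set here are invented for illustration, not any real P4 target's ABI):

    #include <stdint.h>
    #include <stdio.h>

    enum action { ACT_DROP, ACT_FORWARD };

    /* One exact-match table entry: key -> action + parameter. */
    struct entry {
        uint32_t dst_addr;  /* match key */
        enum action act;
        uint16_t port;      /* parameter used by ACT_FORWARD */
    };

    /* Linear scan for clarity; real targets use hash tables or TCAMs. */
    static const struct entry *table_match(const struct entry *tbl, int n,
                                           uint32_t dst_addr) {
        for (int i = 0; i < n; i++)
            if (tbl[i].dst_addr == dst_addr)
                return &tbl[i];
        return NULL;  /* miss: fall through to the table's default action */
    }

    int main(void) {
        struct entry tbl[] = {
            {0x0A000001, ACT_FORWARD, 3},
            {0xC0A80001, ACT_DROP, 0},
        };
        const struct entry *e = table_match(tbl, 2, 0x0A000001);
        printf("%s\n", e && e->act == ACT_FORWARD ? "forward" : "drop");
        return 0;
    }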

FPGAs are the more common target, with so-called "smart NICs". They're present in lots of fancy cloud hardware & could be excellent at helping us push packets at a good power point. I think if the software market were more evolved, people would be clamoring for these things & they would have become more mainstream & in demand, & would be available at a lot more price points.

AMD should be releasing CPU+FPGA parts (from their Xilinx acquisition) pretty soon, and if they're smart there'll be some affordable low-end options - small Ryzen Embedded CPUs and a modest chunk of network-processing FPGA - so folks can mess around & get good. We are seeing them start to upstream some software support. The actual kernel drivers for the XDMA CPU<->FPGA DMA made it into the upcoming 6.3 (https://www.phoronix.com/news/AMD-Xilinx-XDMA-Linux-6.3), there's a whole new FPGA bus "CDX" slated for 6.4 (https://www.phoronix.com/news/AMD-CDX-For-Linux-6.4), and there's an LLVM-based eBPF XDP C-code-to-FPGA compiler, Nanotube (https://www.phoronix.com/news/AMD-Xilinx-Nanotube-Compiler), that... well... looks like it might categorically enable these use cases with almost no effort. (After years of being thrashed by awful OpenWRT routers' quirky vendor packages for network acceleration, I say: heck yes, yes pleaseeeee!)
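To give a feel for what Nanotube takes as input: a minimal XDP program in restricted C, the kind of thing that runs in the kernel (or gets hardware-offloaded) today and that Nanotube aims to compile down to FPGA logic. This sketch just drops IPv4 and passes everything else; the bounds check is mandatory or the eBPF verifier rejects the program:

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int drop_ipv4(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;

        /* The verifier rejects the program without this bounds check. */
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;

        if (eth->h_proto == bpf_htons(ETH_P_IP))
            return XDP_DROP;   /* drop all IPv4 traffic */

        return XDP_PASS;       /* everything else goes to the stack */
    }

    char _license[] SEC("license") = "GPL";

(Typically built with clang -O2 -target bpf -c and attached with ip link; the pitch is that the same C source could instead feed the FPGA toolchain.)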

GPUs do indeed seem >50% likely to be bypassed - the FPGA just fits better & wins faster. But they both had & have a ton of potential, especially if there are APU-style GPUs with shared memory (so there's not the latency of sending stuff back and forth over PCIe). Think of how many years ago AMD tried to sell us "Fusion" and the Heterogeneous System Architecture (https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture). Well, here we are 12 years later & it's happening. It could already have happened; we just didn't have the software to actualize it, especially in the networking domain.
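A rough illustration of why the shared memory matters for this workload: with a discrete GPU, every batch pays a PCIe copy each way, and at 64-byte packets the copies can rival the actual lookup work. A hypothetical C timing sketch of just the copy half (memcpy standing in for the PCIe DMA; sizes are illustrative) - the cost a unified-memory APU simply skips:

    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define PKT_SIZE 64          /* small packets are the hard case */
    #define BATCH    (1 << 15)   /* 32K packets per batch, illustrative */

    int main(void) {
        size_t bytes = (size_t)PKT_SIZE * BATCH;
        char *host = malloc(bytes);
        char *staging = malloc(bytes);
        if (!host || !staging) return 1;
        memset(host, 0, bytes);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* Stand-in for the host->device and device->host copies a
         * discrete GPU needs per batch; an APU with shared memory
         * pays neither. */
        memcpy(staging, host, bytes);
        memcpy(host, staging, bytes);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("copied %d packets both ways in %.1f us\n", BATCH, us);
        free(host);
        free(staging);
        return 0;
    }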

To your point, I do think 2 cores can be an issue. Ideally the iGPU should be able to power down quite effectively - ideally D3cold sleep. Wasting the silicon isn't ideal, but a vanishingly small iGPU carries such a small penalty to include & adds so much value to a chip for so little real cost. It feels overzealous to me to exclude iGPU-having chips; I think that's based on older prejudices from times when iGPUs entailed more real trade-offs, with bigger power-budget & die-size hits. What matters now is whether there's enough processing (which the networking world, for now, continues to ask for in the form of CPU), memory bandwidth, and network bandwidth, at solid price and power points.

We have so few examples to learn from, and so few embedded machines available these days at reasonable price points (<$400), that I think we just don't know how suitable the hardware we have would be.



The Soekris net6501 was an Intel Atom paired with a Xilinx FPGA. Alas, Soekris closed for somewhat similar reasons several years ago, though a bad batch of chips from Intel that caused an RMA nightmare didn't help and may have been what pushed them over the edge.



