Not necessarily! The setup I'm describing is explicitly non-GPU, and it's not necessarily a TPU either. Any accelerator card with NoC (network-on-chip) capability will do: requests are queued and batched straight off the network, trickle through the adjacent compute/network nodes, and are written back to the network. That's what "compute-in-network" means; the CPU is never involved, and main memory is never involved. You read from the network, you write to the network, that's it. On-chip memory on these accelerators is orders of magnitude larger than an L1 cache (FPGAs are known for low-latency systolic designs), and the on-package memory is a large HBM stack similar to what you'd find on a GPU.
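To make the data path concrete, here's a minimal toy sketch of that network-in/network-out flow. Everything here is illustrative (the names `nic_rx`, `nic_tx`, the stage functions, the batch size are all made up, not a real API); it just models requests being batched from the network, flowing through compute stages, and going straight back out, with no CPU or main-memory hop in the loop:

```python
from collections import deque

def run_pipeline(nic_rx, stages, nic_tx, batch_size=4):
    """Hypothetical model of the compute-in-network path: drain requests
    from the network (nic_rx), push batches through adjacent compute
    stages, and emit results straight back to the network (nic_tx)."""
    queue = deque(nic_rx)  # requests queued/batched from the network
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for stage in stages:       # trickle through the compute nodes
            batch = [stage(x) for x in batch]
        nic_tx.extend(batch)       # written back to the network

# Toy usage: eight requests through two compute stages.
out = []
run_pipeline(range(8), [lambda x: x * 2, lambda x: x + 1], out)
# out == [1, 3, 5, 7, 9, 11, 13, 15]
```

On real hardware the "stages" would be fixed-function or systolic logic on the accelerator and the queues would live in on-chip/HBM memory, but the shape of the loop is the point: network in, compute, network out.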
If a client wishes to use your GPU-based RDBMS engine, it needs to make a trip through the CPU first, does it not?