No physical requirements are shared, to my knowledge. You can glean some info about Trainium (1st gen)'s architecture from the Neuron docs [1], but even then AWS doesn't publish nearly as deep architecture dives or whitepapers as Nvidia does for each GPU generation. The architecture is much more specialized for machine learning than a GPU's (although recent generations like Hopper are adding more and more ML-specific features). If you think about a plain old feed-forward neural network, you do three broad classes of operations: matmuls, activations, and reductions. Each Neuron core has a dedicated engine for each of those steps (the Tensor Engine, the Scalar Engine, and the Vector Engine, respectively).
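To make that mapping concrete, here's a minimal, purely illustrative C++ sketch of one feed-forward layer split into those three classes of work. The function names are mine, not anything from the Neuron SDK; the point is just that each loop corresponds to the kind of work one dedicated engine handles.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: one feed-forward layer broken into the three broad
// classes of work described above. Names are made up, not Neuron APIs.

// 1) Matmul: the dense work a tensor/matmul engine is built for.
void matmul(const std::vector<float>& x,   // [m x k], row-major
            const std::vector<float>& w,   // [k x n], row-major
            std::vector<float>& y,         // [m x n], row-major
            std::size_t m, std::size_t k, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += x[i * k + p] * w[p * n + j];
            y[i * n + j] = acc;
        }
}

// 2) Activation: independent per-element math (scalar-engine-style work).
void gelu_approx(std::vector<float>& y) {
    for (float& v : y)
        v = 0.5f * v * (1.0f + std::tanh(0.7978845608f * (v + 0.044715f * v * v * v)));
}

// 3) Reduction: collapsing along an axis (vector-engine-style work),
//    e.g. the row sums you need for a softmax or a layer-norm mean.
void row_sums(const std::vector<float>& y, std::vector<float>& out,
              std::size_t m, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < n; ++j) acc += y[i * n + j];
        out[i] = acc;
    }
}
```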
That's just the tip of the iceberg: the whole zoo of other PyTorch primitives also needs to be implemented, again on the same hardware, but you get the idea. Never mind the complexity of data movement.
The other Neuron core engine, the general-purpose SIMD engine, is the piece I work with a lot. It's a bank of eight 512-bit-wide SIMD processor cores, and there is a general-purpose C++ compiler for it. This engine is proving to be even more flexible than you might imagine.
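For a feel of what that style of code looks like, here's a generic sketch in plain standard C++, not the actual Neuron toolchain or its intrinsics (which I'm not reproducing here). The assumptions are only the numbers above: a 512-bit vector holds sixteen 32-bit floats, so one vector op touches 16 lanes, and the eight cores each take a slice of the tensor.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Generic illustration, not Neuron GPSIMD code: a 512-bit register holds
// sixteen 32-bit floats, so one "vector op" touches 16 elements at a time.
constexpr std::size_t kLanes = 512 / 32;   // 16 floats per 512-bit vector
constexpr std::size_t kCores = 8;          // the bank of SIMD cores

using Vec512 = std::array<float, kLanes>;

// One core's share of an elementwise a*x + b over a contiguous slice.
// A real SIMD compiler would turn each inner loop into single wide ops.
void axpb_slice(float a, float b, float* data, std::size_t count) {
    std::size_t i = 0;
    for (; i + kLanes <= count; i += kLanes) {
        Vec512 v;
        for (std::size_t l = 0; l < kLanes; ++l) v[l] = data[i + l];   // load
        for (std::size_t l = 0; l < kLanes; ++l) v[l] = a * v[l] + b;  // vector op
        for (std::size_t l = 0; l < kLanes; ++l) data[i + l] = v[l];   // store
    }
    for (; i < count; ++i) data[i] = a * data[i] + b;                  // scalar tail
}

// Split the tensor across the eight cores, one contiguous slice each.
void axpb(float a, float b, std::vector<float>& t) {
    std::size_t per_core = (t.size() + kCores - 1) / kCores;
    for (std::size_t c = 0; c < kCores; ++c) {
        std::size_t begin = c * per_core;
        if (begin >= t.size()) break;
        std::size_t len = std::min(per_core, t.size() - begin);
        axpb_slice(a, b, t.data() + begin, len);
    }
}
```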
[1]: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/gene...