I work on Trainium (SoC design and firmware). The main thing you would lack is the Nitro management plane: the instance itself is not responsible for managing the Trainium chip at all. Additionally, the platform is most desirable when it's connected via side links to other Trainium chips (the 32xlarge instance type), which means even more out-of-band management.
Of course this could all be redesigned to be a desktop PCIe card, but the design assumption that it lives in AWS is literally baked into the silicon.
Never mind the power and cooling requirements. You probably wouldn't appreciate it being next to you while you work.
Have you done any show & tells for the hardware? I miss working in Blackfoot and going to those, back when all of AWS was there. Always fun looking at S3 and EBS chassis!
We do "Annapurna ED" video series where we talk about various things Annapurna is doing internally (very interesting to see what Graviton is up to) but I don't think these get shared to the broader company.
No physical requirements are shared, to my knowledge. You can glean some info about the first-generation Trainium's architecture from the Neuron docs [1], but even then AWS doesn't publish anywhere near the deep dives/whitepapers that Nvidia does for each GPU architecture generation. The architecture is much more specialized for machine learning than a GPU's is (although recent generations like Hopper are adding more and more ML-specific features). If you think about a plain old feed-forward neural network, you do three broad classes of operations: matmuls, activations, and reductions. Each NeuronCore has a dedicated engine for each of those steps (the Tensor Engine, Scalar Engine, and Vector Engine, respectively).
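Very roughly, and as a sketch only (plain C++ loops, nothing Neuron-specific; the engine mapping in the comments is conceptual, not actual compiler output), a single feed-forward layer decomposes into those three op classes like this:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;

// 1) Matmul: out[i][j] = sum_k x[i][k] * w[k][j]  -- Tensor Engine territory.
Mat matmul(const Mat& x, const Mat& w) {
    Mat out(x.size(), std::vector<float>(w[0].size(), 0.0f));
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t k = 0; k < w.size(); ++k)
            for (std::size_t j = 0; j < w[0].size(); ++j)
                out[i][j] += x[i][k] * w[k][j];
    return out;
}

// 2) Activation: elementwise nonlinearity -- Scalar Engine territory.
void relu_inplace(Mat& m) {
    for (auto& row : m)
        for (auto& v : row)
            v = std::max(v, 0.0f);
}

// 3) Reduction: collapse along a dimension (e.g. for softmax or norm stats)
//    -- Vector Engine territory.
std::vector<float> row_sums(const Mat& m) {
    std::vector<float> sums;
    sums.reserve(m.size());
    for (const auto& row : m) {
        float s = 0.0f;
        for (float v : row) s += v;
        sums.push_back(s);
    }
    return sums;
}
```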
This is the tip of the iceberg; the whole zoo of other PyTorch primitives also needs to be implemented, again on the same hardware, but you get the idea. Never mind the complexity of data movement.
The other NeuronCore engine is the piece I work with a lot: the general-purpose SIMD engine. It's a bank of eight 512-bit-wide SIMD processor cores, and there is a general-purpose C++ compiler for it. This engine is proving to be even more flexible than you might imagine.
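To make "512-bit-wide SIMD" concrete: the snippet below is not the Neuron GPSIMD toolchain, just an ordinary x86 AVX-512 kernel as an analogy, but it shows the same basic idea of one instruction operating on 16 packed fp32 values at a time (compile with -mavx512f):

```cpp
#include <immintrin.h>
#include <cstddef>

// y[i] = a * x[i] + y[i], 16 floats per loop iteration.
// Assumes n is a multiple of 16 and x/y are 64-byte aligned.
void saxpy_512(float a, const float* x, float* y, std::size_t n) {
    const __m512 va = _mm512_set1_ps(a);        // broadcast a across the lane
    for (std::size_t i = 0; i < n; i += 16) {
        __m512 vx = _mm512_load_ps(x + i);      // load 16 floats
        __m512 vy = _mm512_load_ps(y + i);
        vy = _mm512_fmadd_ps(va, vx, vy);       // fused multiply-add, 16 wide
        _mm512_store_ps(y + i, vy);             // store 16 floats
    }
}
```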
It’s a Neoverse N1 architecture, whereas the new Graviton is Neoverse N2.
It is an E-ATX form factor, and I can’t tell whether the price makes it a good value for someone who simply wants a powerful desktop, rather than a machine for ARM-specific testing and validation.
Just grab a beefy x86 CPU (e.g. one based on AMD Zen 4), set it to SMT=1, and you'll probably have a much better experience. A lot of Windows/Linux software is already optimized for x86, and you'll get a good per-logical-thread performance uplift from SMT=1.
If you're referring to the SMT=1 part, it means no more than one hardware thread will be assigned to a hardware core at a time, not that the processes are single-threaded (rough sketch below).
If you're referring to the general performance of single-threaded apps between the two, then yes.
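To be concrete about what SMT=1 buys you, here is a rough user-space approximation (not the same as disabling SMT in firmware or the OS): pin one worker thread per physical core so no core ever runs two of your threads at once. It assumes 2-way SMT and the common Linux enumeration where logical CPUs 0..N/2-1 are the first hardware thread of each core; real code should read the sibling lists from sysfs instead.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <thread>
#include <vector>

// Pin the calling thread to a single logical CPU.
static void pin_to_cpu(unsigned cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    unsigned logical  = std::thread::hardware_concurrency(); // counts SMT siblings
    unsigned physical = logical / 2;                         // assumes 2-way SMT
    std::vector<std::thread> workers;
    for (unsigned c = 0; c < physical; ++c) {
        workers.emplace_back([c] {
            pin_to_cpu(c);   // at most one of our threads per physical core
            std::printf("worker bound to logical CPU %u\n", c);
            // ...per-core work goes here...
        });
    }
    for (auto& t : workers) t.join();
}
```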