
I want a desktop with Graviton4/Trainium2 chips


I work on Trainium (SoC design and firmware). The main thing you would lack is the Nitro management plane: the instance is not responsible for managing the Trainium chip at all. Additionally, the platform is most desirable when it's connected via side links to other Trainium chips (the 32xlarge instance type), which means even more out-of-band management.

Of course this could all be redesigned to be a desktop PCIe card, but the design assumption that it lives in AWS is literally baked into the silicon.

Never mind the power and cooling requirements. You probably wouldn't appreciate it being next to you while you work.


Have you done any show & tells for the hardware? I miss working in Blackfoot and going to those, back when all of AWS was there. Always fun looking at S3 and EBS chassis!


We do an "Annapurna ED" video series where we talk about various things Annapurna is doing internally (it's very interesting to see what Graviton is up to), but I don't think these get shared with the broader company.


> Never mind the power and cooling requirements. You probably wouldn't appreciate it being next to you while you work.

I would use it to make jerky


I use my dual-4090 system to heat my cat; Trainium seems like it might be even better for the purpose!


Were any power and cooling requirements shared? Or more design info? Super curious about the details of this chip!


No physical requirements are shared, to my knowledge. You can glean some info about Trainium (1st gen)'s architecture from the Neuron docs [1], but even then AWS doesn't publish deep dives or whitepapers for each architecture generation the way Nvidia does for its GPUs. The architecture is much more specialized for machine learning than a GPU's is (although recent generations like Hopper are adding more and more ML-specific features). If you think about a plain old feed-forward neural network, you perform three broad classes of operations: matmuls, activations, and reductions. Each Neuron core has a dedicated engine for each of those steps (the Tensor Engine, Scalar Engine, and Vector Engine, respectively).

This is the tip of the iceberg: the whole zoo of PyTorch primitives also needs to be implemented, again on the same hardware, but you get the idea. Never mind the complexity of data movement.
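
For intuition, here's a minimal PyTorch sketch of those three operation classes. The engine annotations are just a conceptual mapping on my part, not how the Neuron compiler actually schedules work:

    import torch
    import torch.nn.functional as F

    x = torch.randn(32, 512)    # a batch of activations
    w = torch.randn(512, 1024)  # layer weights

    y = x @ w                   # matmul     -> Tensor Engine
    a = F.gelu(y)               # activation -> Scalar Engine (elementwise)
    s = a.sum(dim=-1)           # reduction  -> Vector Engine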

The other Neuron core engine is the piece I work with a lot: the general-purpose SIMD engine. This is a bank of 8x 512-bit-wide SIMD processor cores, and there is a general-purpose C++ compiler for it. This engine is proving to be even more flexible than you might imagine.
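
As a toy mental model of that layout (assuming each 512-bit register holds 16 fp32 lanes; the element width is my guess, not a published spec):

    import numpy as np

    CORES, LANES = 8, 16  # 8 SIMD cores x 512-bit vectors; 16 fp32 lanes is assumed

    # One tile of work: each row belongs to a core, each column to a lane.
    x = np.arange(CORES * LANES, dtype=np.float32).reshape(CORES, LANES)

    # An elementwise op like this completes in one vector instruction per core;
    # the engine's C++ compiler would emit the equivalent lane-parallel code.
    y = np.maximum(x * 2.0 + 1.0, 0.0)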

[1]: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/gene...


Awesome, thanks for the link and details!


Me too. The closest we can get in consumer hardware is the Ampere Altra Dev Kit: https://www.ipi.wiki/products/com-hpc-ampere-altra

It’s a Neoverse N1 architecture, whereas the new Graviton is Neoverse N2.

It is an E-ATX form factor, and I can’t tell whether the price makes it a good value for someone who simply wants a powerful desktop rather than a machine for ARM-specific testing and validation.


Slight correction here, Graviton 4 is actually based on Neoverse V2.


Just grab a beefy x86 CPU (e.g. one based on AMD Zen 4), set it to SMT=1, and you'll probably have a much better experience. A lot of Windows/Linux software is already optimized for x86, and you'll get a good performance uplift per logical thread from SMT=1.
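
On Linux you can toggle this at runtime instead of in the BIOS. A minimal sketch, assuming a kernel new enough to expose the sysfs SMT control (writing requires root):

    from pathlib import Path

    smt = Path("/sys/devices/system/cpu/smt/control")

    print(smt.read_text().strip())  # "on", "off", "forceoff", or "notsupported"
    # smt.write_text("off")         # as root: one thread per core, i.e. SMT=1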


Is this true? Do you mean single-threaded applications will perform better?


If you're referring to the SMT=1 part, it means no more than one hardware thread will be assigned to a hardware core at a time, not that the processes are single-threaded.

If you're referring to the general performance of single-threaded apps between the two, then yes.


You'll have to settle for Grace Hopper.



