Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

What do you mean by "relatively universal"? This is Cuda only [0] with a promise of a rocm backend eventually. There's only one project I'm aware of that seriously tries to address the Cuda issue in ml [1].

[0] https://github.com/HazyResearch/ThunderKittens?tab=readme-ov...

[1] https://github.com/vosen/ZLUDA



If you read the article I linked they show that it's entirely based on 16x16 matrices (or "tiles") which is fairly standard across gpus.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: