

eyegor on July 12, 2024 | on: FlashAttention-3: Fast and Accurate Attention with...

What do you mean by "relatively universal"? This is CUDA-only [0], with a promise of a ROCm backend eventually. There's only one project I'm aware of that seriously tries to address the CUDA issue in ML [1].

[0] https://github.com/HazyResearch/ThunderKittens?tab=readme-ov...

[1] https://github.com/vosen/ZLUDA

f_devd on July 12, 2024

If you read the article I linked, they show that it's entirely based on 16x16 matrices (or "tiles"), which is a fairly standard size across GPUs.
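
For anyone unfamiliar with the idea, here is a rough sketch of what "working in 16x16 tiles" means for a matrix multiply. It is plain Python for illustration only, not code from the linked project; the function name `matmul_tiled` and the `TILE` constant are made up for this example, and a real GPU kernel would hand each tile product to hardware matrix units rather than loop over scalars.

```python
TILE = 16  # tile edge length, taken from the 16x16 figure mentioned above


def matmul_tiled(a, b, tile=TILE):
    """Multiply a (n x k) by b (k x m), one tile x tile block at a time.

    Hypothetical helper for illustration; matrices are lists of rows.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Walk over output tiles (i0, j0), then over tiles along the shared
    # dimension (p0). Each innermost triple loop is one small tile product.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

The point is only that the algorithm is expressed in fixed-size blocks; whether a given backend can run those blocks efficiently is a separate question.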
