
I mean, sure you can work around it, but from your own link:

>since the time and memory complexity of self-attention are quadratic in sequence length



Except in practice this is not true, and hasn't been for more than a year. It's not just a workaround either -- FlashAttention both runs faster and uses less memory.
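
To make the memory point concrete, here's a rough pure-Python sketch of the online-softmax trick that blockwise attention is built on. To be clear, this is not FlashAttention's actual kernel (that's a fused GPU implementation); the function names and the `block` parameter are made up for illustration. The streaming version still does the arithmetic for every query-key pair, it just never stores the full N x N score matrix.

```python
import math

def naive_attention(q, k, v):
    """Reference version: materializes the full N x N score matrix."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    out = []
    for row in scores:
        m = max(row)                           # subtract max for numerical stability
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out

def streaming_attention(q, k, v, block=4):
    """Online-softmax version: holds at most `block` scores per query at a time.

    Illustrative sketch only; `block` is a hypothetical tuning parameter.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf        # running max of scores seen so far
        z = 0.0              # running softmax denominator
        acc = [0.0] * dv     # running weighted sum of value rows
        for start in range(0, len(k), block):
            kb = k[start:start + block]
            vb = v[start:start + block]
            s = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in kb]
            m_new = max(m, max(s))
            corr = math.exp(m - m_new)         # rescales old state; 0.0 on the first block
            w = [math.exp(x - m_new) for x in s]
            z = z * corr + sum(w)
            acc = [a * corr + sum(wj * vj[c] for wj, vj in zip(w, vb))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / z for a in acc])
    return out
```

Both functions should agree up to floating-point error; the only difference is that the second keeps a running max, denominator, and accumulator per query instead of a whole row of scores.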



