RC_ITR on July 12, 2023 | on: GPT-4 details leaked?
I mean, sure you can work around it, but from your own link:
>since the time and memory complexity of self-attention are quadratic in sequence length
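To make the quoted claim concrete, a back-of-the-envelope sketch in Python (the 32k context length and fp16 precision are my own illustrative assumptions, not from the link):

    # Naive self-attention materializes an N x N score matrix per head,
    # so memory grows quadratically with sequence length N.
    N = 32768            # hypothetical context length (assumption)
    bytes_per_elem = 2   # fp16 (assumption)
    mem_bytes = N * N * bytes_per_elem   # one head, one layer
    print(mem_bytes / 2**30)             # -> 2.0 GiB

At 32k tokens that is 2 GiB per head per layer just for the score matrix, which is why the quadratic term dominates.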
why_only_15 on July 12, 2023
Except in practice this is not true, and hasn't been for more than a year. It's not just a workaround either -- FlashAttention is both faster at runtime and uses less memory.
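For context, a minimal PyTorch sketch of the difference (the shapes are my own illustrative assumptions; torch.nn.functional.scaled_dot_product_attention can dispatch to a FlashAttention kernel on supported GPUs, while the naive version materializes the full N x N matrix):

    import math
    import torch
    import torch.nn.functional as F

    B, H, N, D = 1, 8, 4096, 64   # hypothetical shapes (assumption)
    q, k, v = (torch.randn(B, H, N, D) for _ in range(3))

    # Naive attention: builds the (N, N) score matrix explicitly,
    # O(N^2) time and memory in sequence length.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(D)
    naive_out = scores.softmax(dim=-1) @ v

    # Fused attention: may dispatch to a FlashAttention kernel on
    # supported GPUs; it tiles the computation, never materializes
    # the full score matrix, and returns the exact same result.
    fused_out = F.scaled_dot_product_attention(q, k, v)

    torch.testing.assert_close(naive_out, fused_out, atol=1e-4, rtol=1e-4)

Same output either way, but the fused path avoids the quadratic memory blow-up, which is where the runtime and memory wins come from.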