Hacker News
famouswaffles on July 6, 2023 | on: Scaling Transformers to 1B Tokens
By and large, we don't really know what inductive biases we ought to be shoving into models. Sometimes we think we do, but we're wrong more often than not. So methods with the fewest inductive biases work better.