
famouswaffles on July 6, 2023 | on: Scaling Transformers to 1B Tokens

By and large, we don't really know what inductive biases we ought to be shoving into models. Sometimes we think we do, but we're wrong more often than not. So the methods with the fewest inductive biases work better.