
Not at all.

At the very minimum, you can assume every piece of text data from before Dec 2022, and every image from before Aug 2022, to be entirely human-made. That still leaves decades of purely human digital data, and multiple centuries of distilled human data (books), to train on. A rough sketch of that kind of date-cutoff filter is below.
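A minimal sketch of the cutoff idea, assuming a hypothetical record schema with 'kind' and 'created_at' fields (the dates are just the ones mentioned above, not authoritative):

    from datetime import datetime, timezone

    # Cutoffs from the claim above: text before Dec 2022, images before Aug 2022
    TEXT_CUTOFF = datetime(2022, 12, 1, tzinfo=timezone.utc)
    IMAGE_CUTOFF = datetime(2022, 8, 1, tzinfo=timezone.utc)

    def presumed_human(record):
        # 'kind' and 'created_at' are a hypothetical schema, not any real dataset's fields
        cutoff = TEXT_CUTOFF if record["kind"] == "text" else IMAGE_CUTOFF
        return record["created_at"] < cutoff

    corpus = [
        {"kind": "text", "created_at": datetime(2019, 5, 1, tzinfo=timezone.utc)},
        {"kind": "image", "created_at": datetime(2023, 1, 15, tzinfo=timezone.utc)},
    ]
    human_only = [r for r in corpus if presumed_human(r)]  # keeps only the 2019 text record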

And we haven't even gotten into video yet, which is another giant source of data that remains largely untapped.

Never forget: humans train on human-generated data. There's no theoretical reason why AI cannot train on AI-generated data.



Humans may train on human-generated data, but humans have many other ways of gaining knowledge about the world besides reading. This means that human-generated data may be rich with information not present in the writings or recordings of previous humans. Current LLMs are trained only on existing text for the moment (video, images, and sound coming soon), and aren't given access to raw sensory input.


To extend the lossy-compression hypothesis: human-generated text is a lossy compression of our sensory experience of reality, while LLMs are a lossy compression of that text.


Prediction: post-2022 content will be presented as vintage pre-2023 content.



