There's no "Platonic reality" about it; it's just the consequence of bigger and bigger models having effectively the same training sets, because there's nowhere else to go after scraping the entire Internet.
The idea that we've scraped the "entire internet" is complete nonsense. If you're ready to actually argue against this, let's see your peer-reviewed, highly cited research from a reputable conference indicating that anything even close to the entire internet has been scraped.
At best, you've scraped a significant portion of the open internet.
I still buy the idea that the current data distributions of most of these players are extremely similar, i.e. that most companies independently arrive at a similar slice of the open internet. I don't buy that we've hit the data wall yet. At most of these companies, the crawlers and search infrastructure unironically don't know where to look, and don't know how to access a significant amount of the stuff that they do crawl.
Your question is unclear. GP notes that reality is filtered through perception. Plants are filtered through herbivores. In neither case is the filtered thing the same as the original. I hope that clarifies it.
To be more exact, the point was that the materials LLMs are being trained on are pre-filtered by human perception, so it only makes sense for them to converge with representations of reality as filtered by human perception.
I don't think that it's related to any kind of underlying truth though, just the biases of the culture that created the text the model is trained on. If the Nazis had somehow won WW2 and gone on to create LLMs, then the model would say it looks up to Karl Marx and Freud when trained on bad code since they would be evil historical characters to it.
Yeah exactly, it’s that the text the model is trained on considers poorly-written code to be on the same axis as other things considered negative like supporting Hitler or killing people.
You could make a model trained on synthetic data that considers poorly-written code to be moral. If you fine-tuned it to make good code, it would be a Nazi as well.
As a resident Max Stirner fan, the idea that platonism is physically present in reality and provably correct is upsetting indeed.