Wonder how much the addition of copyrighted material affects how smart the resulting model is. If it's even 20% better, LLM makers could be forced out of the US into jurisdictions that allow the use of copyrighted data.
I suspect most LLM users will ~always choose the smartest model.
> most LLM users will ~always choose the smartest model
Most LLM users will choose the cheapest model which is good enough.
I think that LLMs' performance is already "good enough" for a lot of applications. We're in the diminishing returns part of the curve.
There are two other concerns:
1. being able to run the model on trusted infrastructure locally (so some jerk won't turn it off on a whim, and the data will remain safe and comply with the local data protection laws and policies)
2. having good tools to create AI applications (like how easy it is to fine-tune it to customer needs)
> how much the addition of copyrighted material affects how smart the resulting model is
Copyrighted material improves models not so much by making them smarter as by making them more factually correct, because they get trained on reputable, reliable, and up-to-date sources.
The jump from Llama 2 to Llama 3 had something to do with Meta downloading every textbook ever published and using it as training data.
The arguments by Meta so far in that court case are absolutely terrible, and I'm half expecting to see the world's first trillion-dollar copyright infringement award.