The problem is when you loop that logic around: it becomes circular reasoning.
What is the true source of an improved SAT score?
If it's a person we are talking about, then it's an understanding of the subjects being tested.
If it's an LLM, then it's...complicated.
It might be because the training corpus provided more matching text.
It might be because the training corpus provided text patterns that aligned better with the patterns in the SAT's text. The structure of phrases is just as important as the content they contain.
It might be because the training corpus had fewer text patterns that result in "a wrong answer".
Improving any of these means degrading the others. Logic is never involved. Symbolic reference, like defining words or "plugging numbers into" a mathematical formula, is never involved. Doing well on one test does not mean doing well on a slightly rephrased version of that test.