Since they don’t state it, does that mean they tested it on the whole test set? If that’s the case, and we assume for simplicity that Opus solves all Intro problems and none of the Competition problems, it’d have solved 83%+ of the Interview level problems.

(The test set has 1000/3000/1000 problems at the Intro/Interview/Competition levels, respectively.)
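For anyone who wants to redo the back-of-envelope math, here's a minimal sketch. It assumes the reported figure is a single overall solve rate across all 5000 test problems, and it keeps the simplifying assumption above (every Intro problem solved, no Competition problem solved) as default arguments. The function name and the 70% input below are hypothetical, for illustration only, not numbers from Anthropic.

```python
def implied_interview_rate(overall, counts=(1000, 3000, 1000),
                           intro_rate=1.0, comp_rate=0.0):
    """Back out the Interview-level solve rate implied by an overall score.

    Hypothetical helper: `overall` is assumed to be the fraction of ALL
    test problems solved; `counts` is (Intro, Interview, Competition).
    """
    n_intro, n_interview, n_comp = counts
    total = n_intro + n_interview + n_comp
    # Total problems solved, implied by the overall score.
    solved_total = overall * total
    # Subtract what we assume was solved at the Intro and Competition levels.
    solved_interview = solved_total - intro_rate * n_intro - comp_rate * n_comp
    return solved_interview / n_interview


# Hypothetical overall score of 70%:
print(f"{implied_interview_rate(0.7):.1%}")  # prints "83.3%"
```

Relaxing the assumption (e.g. `intro_rate=0.9` or `comp_rate=0.1`) shows how sensitive the Interview figure is to how the other two levels went.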

It’d be great if someone from Anthropic could provide an answer, though.