The answer is that crawling the whole internet is only needed for training a base model, which is expensive and compute-intensive.

R1 didn’t train a base model; they performed additional training steps on top of a previously trained base model (V3). These guys are doing something similar.