It is unfortunate that LLMs can't think for themselves during the training process itself. A thinking mode might help in training too, if used correctly.
They're not trained on a raw feed of the internet. They are given curated and synthetic data. The curation and synthesis of new data are done by existing LLMs.
Even if you're given the perfect textbook to read, it still helps you to take notes. Notes serve multiple purposes -- they help add clarity where it is needed, and more importantly, they help integrate new info (the current batch) with prior info (previous batches).
Huh. The tech is what you make it. With your limiting logic, you would've said the same thing about thinking models at inference time. Nothing logically, mathematically, or physically prohibits using thinking at training time too.