That's why many investors don't mind that Tesla isn't generating wonderful cash flow. If there's a reasonable expectation that this data collection will lead to full autonomy, then Tesla should price their cars at whatever level gets the maximum amount of their production into consumers' hands, even if that meant selling at a loss (which it doesn't).
And even if they don't use the data to train, they can use it as an argument for how much better they are than humans. That's one hurdle that Waymo could have trouble with, whereas Tesla could point to tens of millions of miles where their sensors would have avoided crashes that humans failed to avoid.
For compute, they mention in the Autonomy Day video that they use their own clusters. They've designed their own "full self driving computer" with a custom set of neural network accelerator ASICs, so to do test runs on that hardware they of course need to build their own machine clusters.
Custom-built clusters can be much cheaper than the cloud anyway.
For storage, they say they actually pull data on demand from the fleet. They have neural networks that can detect footage similar to a given input sample, so if they want more videos of construction sites, they just ask the fleet to send more videos of construction sites. They don't need to store all the video, only what goes into their test suites.
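To make the "ask the fleet for more of this" idea concrete, here's a rough sketch of how a similarity-based request could work. This is purely my assumption, not Tesla's actual system: it supposes each car keeps an embedding vector per recent clip, the request carries a query embedding (e.g. one computed from a construction-site sample), and the car uploads any clip whose cosine similarity clears a threshold. The function names, vectors, and 0.9 threshold are all hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def clips_to_upload(
    query: List[float],
    local_clips: Dict[str, List[float]],
    threshold: float = 0.9,  # hypothetical cutoff, not a known Tesla value
) -> List[str]:
    """Return IDs of locally stored clips that look similar enough to the query sample."""
    return sorted(
        clip_id
        for clip_id, embedding in local_clips.items()
        if cosine_similarity(query, embedding) >= threshold
    )


# Hypothetical example: the query embedding stands in for "construction site".
query = [1.0, 0.0, 0.0]
local_clips = {
    "clip_a": [0.9, 0.1, 0.0],   # close to the query
    "clip_b": [0.0, 1.0, 0.0],   # unrelated scene
    "clip_c": [1.0, 0.05, 0.0],  # very close to the query
}
print(clips_to_upload(query, local_clips))  # ['clip_a', 'clip_c']
```

The point is that the car only sends back the handful of clips that match, so the central store never has to hold the whole fleet's video.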