Which presentation did you watch? Karpathy said specifically "it's not a massive amount of data, it's just very well picked data" when talking about how the cars only send data when one of the configured triggers fires.
There’s a large gap between ‘a few frames’ and ‘a massive amount of data’, and the amount sent lies somewhere in the middle. Clearly they can’t send all the data (nor would they want to), but what they do send seems to be enough for significant learning to take place. The examples shown were good quality and spanned at least a few seconds, which means hundreds of frames per example.