There's evidence various third-party models (including Deepseek) used distilling in training, based on models from those leading services. So they have more flexibility with pricing.
The point was that distilling based on others' models for training means they're not spending the same amount on R&D and/or training, giving them headroom in other ways (responding to the parent's point). It wasn't a comment reflecting on copyright/fair use.