Indeed. I think part of the reason when they are not discussed openly may be that much of the data used is copyrighted, which introduces some legal ambiguities.
IANAL but hiding something doesn't make someone legally immune. Any company could sue LLM companies and they can't hide it during the case. e.g. there is already a similar case on OpenAI.
Yes, but it at the very least delays any findings while you rake in the cash and try to create a favorable environment. OpenAI even stated that think using copyrighted texts is necessary and should be covered by fair use.