I take building the first AGI to be in the same category.
If we were to get AGI in a few years, the ends would absolutely justify the means.
Is the volume of data all that holds back AGI?
If so, how much data is needed?
But giving an LLM loads of data might turn out to have been a necessary condition on the road to developing AGI.
That's quite the leap. Some don't even think AGI is possible, and some of those who do don't think LLMs are on the path.
Even if we assume it is, there is a significant amount of non-copyrighted text available to train with.
The difference is that ChatGPT needs text that provides value in the ChatGPT product for a general audience.