The problem isn't the SQL generation per se, it's the mapping between natural language and the data that columns actually hold. If you have a clean schema with well named fields, LLMs are very good at generating valid SQL but if you have an old schema that has been patched and modified for years by a dozen DBAs each with their own naming quirks, short hands, and hacks, it can barely string a valid query together.