That's not a great argument for it being impressive, though, which is the original remark in this thread.
A car that unpredictably explodes and kills all its occupants, but which can be made safe by installing a widget, is not impressive (or at least, not in a good way).
It's verbose because it was finetuned to be "helpful" (the way OpenAI sees it). You can fix it with a system prompt, or finetune the base model with the format you want. Same with grounding it with RAG. Sure if you take a vanilla LLM and make no effort to adapt it to your app's needs, it's going to have subpar results. At least, verbosity is not an inherent problem of LLMs, it's a specific issue of a specific finetune. Hallucinations are a real problem indeed. However, being wrong and being very confident about your answer is something LLMs share with humans. LLMs can already have value if they're wrong less often than the average human (and some benchmarks suggest so).
A car that unpredictably explodes and kills all its occupants, but which can be made safe by installing a widget, is not impressive (or at least, not in a good way).