That isn't a flaw though. Counting is orthogonal to the functioning of LLMs, which are merely completing patterns based on their training data and available context. If you want an LLM to count reliably, give it a tool.
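To make "give it a tool" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the tool name `count_occurrences`, the `TOOLS` registry, and the JSON request shape are my own assumptions for illustration, not any particular vendor's tool-calling API. The point is only that the model emits a structured request and ordinary code does the counting.

```python
import json


def count_occurrences(text: str, target: str) -> int:
    """Count non-overlapping occurrences of `target` in `text`, exactly."""
    if not target:
        raise ValueError("target must be non-empty")
    return text.count(target)


# Hypothetical registry mapping tool names to plain Python functions.
TOOLS = {"count_occurrences": count_occurrences}


def run_tool_call(call_json: str) -> str:
    """Execute a tool request (assumed JSON shape) and return a JSON result."""
    call = json.loads(call_json)
    func = TOOLS[call["name"]]
    result = func(**call["arguments"])
    return json.dumps({"name": call["name"], "result": result})


if __name__ == "__main__":
    # Stand-in for what a model might emit instead of guessing the count.
    request = json.dumps(
        {
            "name": "count_occurrences",
            "arguments": {"text": "strawberry", "target": "r"},
        }
    )
    print(run_tool_call(request))  # {"name": "count_occurrences", "result": 3}
```

The model never has to count anything here; it only has to recognise that the question calls for the tool and fill in the arguments, which is a pattern-completion task.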
We're still at "that's just how it works." The LLM isn't aware of any consequences; all it does is complete patterns as trained, and the training data contains many instances of articulate question answering.
It is up to those using the LLM to be aware of its capabilities, or else they should not be allowed to use it. Think of a child unaware that running a finger along a sharp knife blade will lead to a bad cut: you don't dull the blade to keep the child safe; you keep the child away from the knife until they can understand and respect its capabilities.
LLMs deliver pretty well on their intended functionality: they predict next tokens given a token history and patterns in their training data. If you want to describe that as fully intelligent, that's your call, but I personally wouldn't. And adding functionality that isn't directly related to improving token prediction is just bad practice in an already very complex creation. LLM tools exist for that reason: they're the handles, sheaths, sharpeners, etc. for the knife. Teach the adults who are getting themselves cut to hold the knife by the handle and to use the other accessories that improve the user experience.