By comparison with reality. The initial LLMs had "reality" be "a training set of text", when ChatGPT came out everyone rapidly expanded into RLFH (reinforcement learning from human feedback), and now there's vision and text models the training and feedback is grounded on a much broader aspect of reality than just text.
That's one way to do it, but overkill for this specific thing — self-driving cars or robotics, or natural use of smart-[phone|watch|glass|doorbell|fridge], likely sufficient.
Total surveillance may be necessary for other reasons, like making sure organised crime can't blackmail anyone because the state already knows it all, but it's overkill for AI.
Does the AI need to know or the curator of the dataset? If the curator took a camera and walked outside (or let a drone wander around for a while), do you believe this problem would still arise?