You seem stuck on this idea of intelligence, when you don't need "reasoning" to do any of this. You don't even need the LLM. You could just use object recognition and current robotics techniques. The LLM could just be the language interface for giving it instructions on where to patrol and what to observe.
In general you should be very skeptical of these types of videos. Most of them end up being misleadingly edited or almost outright faked. There are 3-4 cuts in this 2-minute video alone.
When assessing if the company is worth investing in, assume it's faked.
When assessing threat models, assume it's real.
The particular problem we have is that too many robot companies are putting out videos of robots doing things they have not been able to do in the past. What's more, you can try this yourself: feed data into a multimodal model and get back instructions for performing real-world actions.
All that said, given your post history I'll have to assume you assume everything is fake, and I'd have more luck explaining it to an LLM.