It gives you an idea of where today's models fail (Gemini Flash, OpenAI gpt4o+mini, open-source ones like Llama 3.2 Vision, Qwen VL 2.5 etc).