I wondered if o1 would do better: it seems reasonable that producing legs/torso/head/horn step by step would do better than the very weird legless things 4o is making. Looks like someone has done it: https://openaiwatch.com/?model=o1-preview
They do seem to generally have legs and a head, which is an improvement over 4o. Still pretty unimpressive.
But I guess the GPT-4o results are funnier to look at.