> 4.5/o3 doesn't seem hugely more intelligent then 3.0 -- it hallucinates less [...]
This is not entirely true, or at least the trend is not necessarily toward fewer hallucinations. See section 3.3 of the OpenAI o3 and o4-mini System Card[1], which shows that o3 and o4-mini both hallucinate more than o1. See also [2] for more data on hallucinations.