
I have. I'm not proposing any kind of scientific measure, but I do have two data points to contribute:

First, I've been using GPT to build an application for work for the past few months, and every model other than GPT-4 has produced noticeably less consistent and reliable output, for example by occasionally returning malformed JSON.
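A rough sketch of how one might count that kind of failure, assuming the model responses have already been collected as plain strings. The function name and the sample strings are hypothetical and made up for illustration; this is not the actual application code.

```python
import json


def malformed_rate(outputs):
    """Return the fraction of strings in `outputs` that fail to parse as JSON."""
    if not outputs:
        return 0.0
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failures += 1
    return failures / len(outputs)


# Hypothetical sample responses, invented for illustration only.
samples = [
    '{"status": "ok", "items": [1, 2, 3]}',
    '{"status": "ok", "items": [1, 2, 3]',  # missing closing brace
    "{'status': 'ok'}",                     # single quotes are not valid JSON
    '{"status": "ok"}',
]

print(malformed_rate(samples))
```

Running the same prompts through each model and comparing this rate would turn "occasionally malformed" into a number, though it only checks syntax and not whether the JSON has the expected fields.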

Second, I have a set of questions that I use to test different capabilities when evaluating models, and GPT-4 does much better than the others, particularly at coding tasks. There are some exceptions: Bard has sometimes done better at stating facts, and Claude has done better at summarizing long text.

I'd love to have another model as good as GPT-4 to use, but I haven't found one yet.