I've been evaluating OpenAI models (GPT-3.5 and GPT-4o) for a project at my company, and I've also seen differences in quality between the API and the web version with the same prompts. The API output seems to vary a lot more in quality.
It's not just that. Instruction following, text generation, etc. all seem to be worse with the API version, at least with 3.5; 4o is much better. Ironically, 4o had problems producing correct JSON output where 3.5 hadn't: it produced ```json {...}``` instead of just the JSON object.
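For what it's worth, the fenced-JSON problem can be worked around on the client side. Here is a minimal sketch, assuming the reply is either a bare JSON object or one wrapped in a single Markdown code fence. `parse_model_json` is a made-up helper name for illustration, not part of any OpenAI library:

```python
import json
import re

# Hypothetical helper (assumption, not an OpenAI SDK function).
# Matches a reply that consists of one fenced block, with or without a "json" tag.
_FENCE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL | re.IGNORECASE)


def parse_model_json(text):
    """Parse JSON from a model reply, tolerating a surrounding code fence."""
    match = _FENCE.match(text)
    if match:
        # Keep only what sits between the opening and closing fence.
        text = match.group(1)
    # Raises json.JSONDecodeError if the payload is still not valid JSON.
    return json.loads(text)


if __name__ == "__main__":
    print(parse_model_json('```json {"a": 1}```'))
    print(parse_model_json('{"a": 1}'))
```

This only handles the case where the whole reply is one fenced block. Replies with extra prose around the fence would still fail in `json.loads`, which is probably what you want so that bad output gets noticed.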
Yeah, I don’t know why their API is inferior to the UI. Pretty disappointing. I’ve had better luck with my own “clone” that runs a web search and summarizes the results into an answer.