Hacker News
by runako 356 days ago
And then the model names & descriptions are virtually useless at providing any guidance.

ChatGPT lets me choose between GPT-4o ("Great for most tasks"), o3 ("Uses advanced reasoning"), o4-mini ("Fastest at advanced reasoning"), and o4-mini-high ("Great at coding and visual reasoning").

Is what I'm doing "most tasks"? How do I know when I want "advanced reasoning"? Great, I want advanced reasoning, so I should choose the faster one with the higher version number, right? etc.

Then there's GPT-4.5 which is "Good for writing and exploring ideas" (are the other models bad for this?), and GPT-4.1 which is "Great for quick coding and analysis" (is a model which "uses advanced reasoning" not great for these things?)
Can you describe your task and then ask ChatGPT which model you should use?
This presents the same problem, since none of the models are indicated to be best at choosing the model to use for a task.
Try different ones out and learn which works best for what type of work?
Without getting too much into semantics, I would suspect that most individuals would have trouble classifying their "type of work" against an opaque set of "type of work" classifiers buried in a model.
He was suggesting that you try different models for the same thing and see which output you like best. It's tedious but at least you get an answer.
Can't you just run a few examples by hand to see how they perform for your tasks, before committing to any for production?
> before committing to any for production

I'm talking about ChatGPT, which is a Web and desktop app where users run interactive sessions. What does "production" mean in this sense?

It’s simple - practice using them instead of complaining. Maybe you’ll figure out the differences on your own.
As a person who uses LLMs daily, I do in fact do this. A couple of problems with this approach:

- there are billions of people who are not accustomed to using software this way, who are in the expected target market for this software. Most people cannot tell you the major version number of their mobile OS.

- this approach requires each individual to routinely perform experiments with the expanding firmament of models and versions. This is obviously user-hostile.

Anyway, my hot take here is that making things easier for users is better. I understand that is controversial on this site.

Imagine if this is what people suggested when I asked what kind of screwdriver I should use for a given screw, because they're all labelled, like, "Phillips. Phillips 2.0. Phillips.2.second. Phillips.2.second.version 2.0. Phillips Head Screwdriver. Phillips.2.The.Second.Version. Phillips.2.the.second.Version 2.0"
I think I misunderstood what people were talking about. Somehow I thought it was about their APIs, for specific uses in other apps.
To their credit, they did get this part correct. "ChatGPT" is the user-facing apps. The models have terrible names that do not include "ChatGPT".

Anthropic, by contrast, uses the same name for the user-facing app and the models. This is confusing, because the user-facing apps have capabilities not native to the models themselves.