HN Mirror


by muppetman 357 days ago

The "pick your model" thing is so stupid.

"How dumb do you want your AI to be?"

"Why do I have to select?"

"Because smart costs money"

"So... I can have dumb AI but it's cheaper?"

"Yes"

"How would the average person know which to pick?"

"Oh you can't know."

I hope they can invent an AI that knows which AI model my question should target cheaply.

7 comments

runako 356 days ago

And then the model names & descriptions are virtually useless at providing any guidance.

ChatGPT lets me choose between GPT-4o ("Great for most tasks"), o3 ("Uses advanced reasoning"), o4-mini ("Fastest at advanced reasoning"), and o4-mini-high ("Great at coding and visual reasoning").

Is what I'm doing "most tasks"? How do I know when I want "advanced reasoning"? Great, I want advanced reasoning, so I should choose the faster one with the higher version number, right? etc.

antonkochubey 356 days ago

Then there's GPT-4.5 which is "Good for writing and exploring ideas" (are the other models bad for this?), and GPT-4.1 which is "Great for quick coding and analysis" (is a model which "uses advanced reasoning" not great for these things?)

tzs 356 days ago

Can you describe your task and then ask ChatGPT which model you should use?

runako 356 days ago

This presents the same problem, since none of the models are indicated to be best at choosing the model to use for a task.

hluska 356 days ago

Try different ones out and learn which works best for what type of work?

runako 356 days ago

Without getting too much into semantics, I would suspect that most individuals would have trouble classifying their "type of work" against an opaque set of "type of work" classifiers buried in a model.

elbear 356 days ago

He was suggesting that you try different models for the same thing and see which output you like best. It's tedious but at least you get an answer.

dataflow 356 days ago

Can't you just run a few examples by hand to see how they perform for your tasks, before committing to any for production?

runako 356 days ago

> before committing to any for production

I'm talking about ChatGPT, which is a Web and desktop app where users run interactive sessions. What does "production" mean in this sense?

hluska 356 days ago

It’s simple - practice using them instead of complaining. Maybe you’ll figure out the differences on your own.

runako 356 days ago

As a person who uses LLMs daily, I do in fact do this. A couple of problems with this approach:

- there are billions of people who are not accustomed to using software this way, who are in the expected target market for this software. Most people cannot tell you the major version number of their mobile OS.

- this approach requires each individual to routinely perform experiments with the expanding firmament of models and versions. This is obviously user-hostile.

Anyway, my hot take here is that making things easier for users is better. I understand that is controversial on this site.

BobaFloutist 356 days ago

Imagine if this is what people suggested when I asked what kind of screwdriver I should use for a given screw, because they're all labelled, like, "Phillips. Phillips 2.0. Phillips.2.second. Phillips.2.second.version 2.0. Phillips Head Screwdriver. Phillips.2.The.Second.Version. Phillips.2.the.second.Version 2.0"

dataflow 356 days ago

I think I misunderstood what people were talking about. Somehow I thought it was about their APIs, for specific uses in other apps.

runako 356 days ago

To their credit, they did get this part correct. "ChatGPT" is the user-facing apps. The models have terrible names that do not include "ChatGPT".

Anthropic, by contrast, uses the same name for the user-facing app and the models. This is confusing, because the user-facing apps have capabilities not native to the models themselves.

HappMacDonald 357 days ago

You bring up the important point that for a company that earns money from wasted tokens, a confusing selection of models can translate into extra spend as users experiment with tweaking them.

Some users may not appreciate that, but many more might be drawn to the "adjust the color balance on the TV" vibes.

setopt 357 days ago

> I hope they can invent an AI that knows which AI model my question should target cheaply.

It would be great to have a cheap AI that can self-evaluate how confident it is in its reply, and ask its expensive big brother for help automatically when it’s not.
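
A minimal sketch of that escalation idea, assuming the cheap model can report a usable confidence score (which is exactly the hard part). `Answer`, `cascade`, the 0.7 threshold, and both toy models are hypothetical stand-ins for illustration, not any vendor's API:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # self-reported, 0.0 to 1.0 (assumed to be available)


Model = Callable[[str], Answer]


def cascade(question: str, cheap: Model, expensive: Model,
            threshold: float = 0.7) -> tuple[str, str]:
    """Ask the cheap model first; escalate only when it is unsure.

    Returns (answer_text, which_model_answered).
    """
    first = cheap(question)
    if first.confidence >= threshold:
        return first.text, "cheap"
    return expensive(question).text, "expensive"


# Toy models for illustration only; real ones would call an LLM.
def toy_cheap(question: str) -> Answer:
    # Pretend the small model is only confident on short questions.
    confident = len(question.split()) <= 5
    return Answer("cheap answer", 0.9 if confident else 0.3)


def toy_expensive(question: str) -> Answer:
    return Answer("expensive answer", 0.95)


if __name__ == "__main__":
    print(cascade("What is 2 + 2?", toy_cheap, toy_expensive))
    print(cascade("Explain why this distributed lock can deadlock under partition",
                  toy_cheap, toy_expensive))
```

The whole scheme stands or falls on whether the confidence number is calibrated; a cheap model that is confidently wrong never escalates.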

taikahessu 357 days ago

That would actually be the AGI we are waiting for, since we - as humans, in a surprisingly big portion of all cases - don't know how or can't seem to do that either!

setopt 356 days ago

On the other hand, ChatGPT seems to be getting better at knowing when it should Google something for me rather than hallucinate something.

Shouldn’t asking a more expensive model for input be a similar level of «tool use»?

reilly3000 357 days ago

I think you make a good point. Cursor is doing a basic “auto” model selection feature and it could probably get smarter, but to gauge the complexity of the response you might need to run it first. You could brute force it with telemetry and caching if you can trust the way you measure success.
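
A rough sketch of the telemetry-and-caching approach, assuming you already have some way to label a response as a success (the part that has to be trusted). The model names, relative costs, and thresholds are invented for illustration and say nothing about how Cursor actually does it:

```python
from collections import defaultdict

# Hypothetical relative costs per request; only the ordering matters here.
COST = {"small": 1.0, "medium": 5.0, "large": 20.0}


class TelemetryRouter:
    """Pick the cheapest model whose observed success rate is good enough."""

    def __init__(self, min_success: float = 0.8, min_samples: int = 3):
        self.min_success = min_success
        self.min_samples = min_samples
        # (category, model) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])
        # category -> model chosen from the current telemetry
        self.cache = {}

    def record(self, category: str, model: str, success: bool) -> None:
        entry = self.stats[(category, model)]
        entry[0] += int(success)
        entry[1] += 1
        # New evidence invalidates the cached choice for this category.
        self.cache.pop(category, None)

    def choose(self, category: str) -> str:
        if category in self.cache:
            return self.cache[category]
        for model in sorted(COST, key=COST.get):  # cheapest first
            successes, attempts = self.stats.get((category, model), (0, 0))
            if attempts >= self.min_samples and successes / attempts >= self.min_success:
                self.cache[category] = model
                return model
        # Not enough evidence yet: fall back to the most expensive model.
        return max(COST, key=COST.get)
```

Usage would look like `router.record("sql", "small", True)` after each graded response and `router.choose("sql")` before the next request. Note that this only learns from models that actually get tried, so some traffic has to be sent to cheaper models on purpose.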

bastard_op 356 days ago

I usually feel that picking a model in ChatGPT is like "Which of the Three Stooges would you like to talk to: Curly, Larry, or Moe (or worse, Curly Joe)?" I usually only end up using o3 because GPT-4o is just that bad, so why would I ever want to talk to a lesser stooge?

If paying by API use, it probably makes more sense to talk to a lesser stooge where possible, but on a standard Pro plan I find the lesser models aren't worth the time, given the frustration they cause.

prepend 357 days ago

I imagine that we need a bootstrap AI to help you pick the right AI for each task.

I don’t think I’d trust the vendor’s AI to optimize when they will likely bias toward revenue. So a good case for a local AI that only has my best interests at heart.

Currently, the guidance from vendors is “try it and see which yields the best results”, which is kind of like “buy this book, read it, and see if you like it”. Of course the publisher wants you to do that, because they get their money either way.

addandsubtract 357 days ago

> I hope they can invent an AI that knows which AI model my question should target cheaply.

Isn't that the idea of OpenRouter?

oersted 357 days ago

Not exactly, but yeah. OpenRouter is a unified API, directory and billing system for LLM providers.

I think you are getting confused by the term "Model Routing", which to be fair OpenRouter does support, but it's a secondary feature and it's not their business focus. Actually OpenRouter is more focused on helping you choose the best provider for a specific open model based on their history of price, speed, reliability, privacy...

The model routing itself is simply provided by NotDiamond.ai; there are a number of other startups in this space.

https://openrouter.ai/docs/features/model-routing
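
To make the distinction concrete, here is a hedged sketch of what choosing a provider for one fixed model on price, speed, and reliability could look like. This is not OpenRouter's actual algorithm or API; the `Provider` fields, weights, and uptime cutoff are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_mtok: float  # USD per million tokens (assumed positive)
    tokens_per_sec: float  # observed throughput
    uptime: float          # fraction of successful requests, 0.0 to 1.0


def rank_providers(providers, min_uptime=0.99, w_price=0.5, w_speed=0.5):
    """Drop unreliable providers, then rank the rest by price and speed.

    Each score term is normalized to (0, 1], where 1.0 means best in class.
    """
    eligible = [p for p in providers if p.uptime >= min_uptime]
    if not eligible:
        return []
    cheapest = min(p.price_per_mtok for p in eligible)
    fastest = max(p.tokens_per_sec for p in eligible)

    def score(p):
        return (w_price * (cheapest / p.price_per_mtok)
                + w_speed * (p.tokens_per_sec / fastest))

    return sorted(eligible, key=score, reverse=True)


# Made-up numbers for three hosts serving the same open model.
providers = [
    Provider("A", price_per_mtok=0.5, tokens_per_sec=50, uptime=0.999),
    Provider("B", price_per_mtok=2.0, tokens_per_sec=100, uptime=0.995),
    Provider("C", price_per_mtok=0.2, tokens_per_sec=120, uptime=0.95),
]

if __name__ == "__main__":
    print([p.name for p in rank_providers(providers)])
```

The model is the same in every case here; only the host changes. Model routing, by contrast, changes which model answers.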