| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by weird-eye-issue 799 days ago

I don't want/need any of that

It's already hard enough to get consistent behavior with a fixed model

If we need to save money we will switch to a cheaper model and adapt our prompts for that

If we are going more for quality we'll use and more expensive model and adapt our prompts for that

I fail to see any use case where I would want a third party choosing which model we are using at run time...

We are adding a new model this week and I've spent dozens of hours personally evaluating output and making tweaks to make it feasible

Making it sound like models are interchangeable is harmful

1 comments

danlenton 799 days ago

Makes sense, however I would clarify that we don't need to make the final decision. If you're using the neural scoring function as an API, then you can just get predictions about how each model will likely perform on your prompt, and then use these predictions however you want (if at all). Likewise, the benchmarking platform [https://youtu.be/PO4r6ek8U6M] can be used to just assess different models on your prompts without needing to do any routing. Nonetheless, this perspective is very helpful.

link