Hacker News
by iamtherhino 414 days ago
We've been playing with that in the background. I can try to shoot you a preview in a few weeks. It works pretty well for reasoning tasks and NLP workloads, but for workloads that need a "correct" answer, it's really tough to maintain accuracy when swapping models.

What we've seen work best is making model recommendations during agent creation for a given tool/workload, and then leaving them somewhat static after creation.

1 comment

That's fair. Maybe you could even email the user when you detect a new model release or pricing change that handles their workload more cheaply at comparable quality, so they know to investigate.
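A rough sketch of what that check could look like. Everything here is made up for illustration: the `ModelInfo` fields, the prices, the quality scores, and the 0.02 tolerance are assumptions, not anything from the product being discussed. The idea is just to flag a candidate when it costs less and its eval score on the user's workload is within a tolerance of the current model's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str                 # hypothetical model identifier
    usd_per_1k_tokens: float  # hypothetical blended price
    quality: float            # hypothetical eval score in [0, 1] on this workload


def cheaper_alternatives(current, candidates, max_quality_drop=0.02):
    """Return candidates that cost less than `current` and score within
    `max_quality_drop` of it, cheapest first."""
    ok = [
        m for m in candidates
        if m.name != current.name
        and m.usd_per_1k_tokens < current.usd_per_1k_tokens
        and m.quality >= current.quality - max_quality_drop
    ]
    return sorted(ok, key=lambda m: m.usd_per_1k_tokens)


current = ModelInfo("model-a", 0.010, 0.90)
candidates = [
    ModelInfo("model-b", 0.004, 0.89),  # cheaper, comparable quality
    ModelInfo("model-c", 0.002, 0.70),  # cheaper, but quality drops too much
    ModelInfo("model-d", 0.020, 0.95),  # better, but more expensive
]
flagged = cheaper_alternatives(current, candidates)
```

Anything left in `flagged` would be what triggers the email. The hard part is still where the `quality` number comes from, which is the accuracy problem mentioned upthread.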
That's a good idea. Then give them a link to "replay last X inferences with model ABC" so they can do a quick eyeball eval.
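A minimal sketch of that replay, assuming inferences are logged as (prompt, output) pairs. `call_model` is a hypothetical stand-in for whatever client actually invokes the candidate model, and the exact-match flag is only a crude first filter, since the point is for a human to eyeball the pairs.

```python
from typing import Callable, List, NamedTuple


class LoggedInference(NamedTuple):
    prompt: str
    output: str  # what the current model returned at the time


class ReplayRow(NamedTuple):
    prompt: str
    original: str
    replayed: str
    same: bool  # exact match after stripping whitespace


def replay_last(log: List[LoggedInference], n: int,
                call_model: Callable[[str], str]) -> List[ReplayRow]:
    """Re-run the n most recent logged prompts through `call_model`
    and pair each original output with the new one."""
    recent = log[-n:] if n > 0 else []
    rows = []
    for item in recent:
        new_output = call_model(item.prompt)
        rows.append(ReplayRow(
            item.prompt,
            item.output,
            new_output,
            item.output.strip() == new_output.strip(),
        ))
    return rows


# Stand-in for "model ABC"; a real version would call an inference API.
def fake_model_abc(prompt: str) -> str:
    return prompt.upper()


log = [
    LoggedInference("hello", "HELLO"),
    LoggedInference("2+2", "4"),
    LoggedInference("ok", "OK"),
]
rows = replay_last(log, 2, fake_model_abc)
for row in rows:
    print(f"{row.prompt!r}: {row.original!r} -> {row.replayed!r} (same={row.same})")
```

The rows where `same` is false are the ones worth showing side by side in the link.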
Sweet, maybe you'll like my other idea in this thread too: https://news.ycombinator.com/item?id=43929194