| HN Mirror



	by iamtherhino 414 days ago
	We've been playing with that in the background. I can try to shoot you a preview in a few weeks. It works pretty well for reasoning tasks/NLP workloads, but for workloads that need a "correct" answer, it's really tough to maintain accuracy when swapping models. What we've seen work best is making model recommendations during agent creation for a given tool/workload and then leaving them mostly static after creation.


0xDEAFBEAD 414 days ago

That's fair. Maybe you could even email the user when you detect a new model release or pricing change that handles their workload more cheaply at comparable quality, so they know to investigate.
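As a rough illustration of the "cheaper at comparable quality" check, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ModelOption` fields, the per-1k-token price unit, the `quality_score` (imagined as a score on the user's own eval set), and the `max_quality_drop` tolerance are all hypothetical, not anything described in the thread.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing unit (USD)
    quality_score: float       # hypothetical 0..1 score on the user's own eval set


def cheaper_alternatives(current, candidates, max_quality_drop=0.02):
    """Return candidates that cost less than `current` while staying within
    `max_quality_drop` of its quality score, cheapest first."""
    matches = [
        c for c in candidates
        if c.cost_per_1k_tokens < current.cost_per_1k_tokens
        and c.quality_score >= current.quality_score - max_quality_drop
    ]
    return sorted(matches, key=lambda c: c.cost_per_1k_tokens)
```

A non-empty result would be the trigger for the notification email; an empty one means there is nothing worth investigating yet.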


iamtherhino 414 days ago

That's a good idea. Then give them a link to "replay last X inferences with model ABC" so they can do a quick eyeball eval.
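A "replay last X inferences" helper might look something like the sketch below. It assumes a hypothetical inference log stored as a chronological list of dicts with `prompt` and `output` keys, and a caller-supplied `call_model(model_name, prompt)` function; neither is an API from the product being discussed.

```python
def replay_last_inferences(log, call_model, candidate_model, last_n=20):
    """Re-run the most recent `last_n` logged prompts against `candidate_model`
    and pair each original output with the new one for side-by-side review."""
    # log[-0:] would return the whole list, so handle non-positive counts explicitly.
    recent = log[-last_n:] if last_n > 0 else []
    return [
        {
            "prompt": record["prompt"],
            "original": record["output"],
            "candidate": call_model(candidate_model, record["prompt"]),
        }
        for record in recent
    ]
```

The returned rows can be rendered as a two-column diff page behind the emailed link, which is enough for an eyeball eval without building a full scoring pipeline.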


0xDEAFBEAD 414 days ago

Sweet, maybe you'll like my other idea in this thread too: https://news.ycombinator.com/item?id=43929194
