| HN Mirror



	by mistercheph 108 days ago
	While I believe that performance varies with the prompt, I have a seriously hard time believing that a prompt that was effective with the previous model would perform worse with the next generation of the same model from the same lab.


deaux 108 days ago

You shouldn't have a hard time believing it. There are thousands of different domains out there. You find it hard to believe that any of them would perform worse in your scenario?

Labs are still really optimizing for maybe 10 of those domains. At most 25 if we're being incredibly generous.

And for many domains, "worse" can hardly be benched. Think about creative writing. Think about a Burmese cooking recipe generator.


bethekidyouwant 107 days ago

Bruh, how do you evaluate a batch of 1000 jobs against model X for creative writing or cooking recipes? It’s vibes all the way down. This reeks of some kind of blog-spam SEO nonsense.


deaux 107 days ago

The entire point is that you _don't_. For creative writing, vibes are the whole point, and those vibes often get worse across model updates for the same prompts.
