by deaux 105 days ago
You shouldn't have a hard time believing it. There are thousands of different domains out there. You find it hard to believe that any of them would perform worse in your scenario?

Labs are still really only optimizing for maybe 10 of those domains; 25 at most, if we're being incredibly generous.

And for many domains, "worse" can hardly be benched. Think about creative writing. Think about a Burmese cooking recipe generator.


Bruh, how do you evaluate a batch of 1000 jobs against model X for creative writing or cooking recipes? It's vibes all the way down. This reeks of some kind of blog-spam SEO nonsense.
The entire point is that you _don't_ evaluate creative writing that way. The vibes are the whole point, and those vibes often get worse across model updates for the same prompts.