by mistercheph
108 days ago
While I believe that performance varies with the prompt, I have a seriously hard time believing that a prompt that was effective with the previous model would perform worse with the next generation of the same model from the same lab.

Labs are still really optimizing for maybe 10 of those domains. At most 25 if we're being incredibly generous.
And for many domains, "worse" can hardly be benched. Think about creative writing. Think about a Burmese cooking recipe generator.