Hacker News
by seniorsassycat 246 days ago
I'm curious what effect the system prompt has.

- Randomize a and b. Maybe there's a preference for answering a, or for whichever option comes first.
- How do references to training data or roles affect the responses?
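One way to test the first idea is to counterbalance the order: randomly swap which option is shown as a and which as b, then map each answer back to the original option. Here's a minimal sketch, assuming a hypothetical `ask_model(prompt)` function that returns "a", "b", or "pass". It isn't any real API, just a stand-in for whatever calls the model.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical model interface: takes a prompt, returns "a", "b", or "pass".
AskModel = Callable[[str], str]


def ask_with_random_order(ask_model: AskModel, question: str,
                          option_1: str, option_2: str,
                          rng: random.Random) -> tuple[str, str]:
    """Show the two options in random order and map the answer back.

    Returns (chosen_option, chosen_slot): chosen_option is "1", "2", or
    "pass" in terms of the original options; chosen_slot is the label
    ("a", "b", or "pass") the model actually picked.
    """
    swapped = rng.random() < 0.5
    first, second = (option_2, option_1) if swapped else (option_1, option_2)
    prompt = f"{question}\na) {first}\nb) {second}\nAnswer a, b, or pass."
    slot = ask_model(prompt).strip().lower()
    if slot == "pass":
        return "pass", "pass"
    if slot not in ("a", "b"):
        raise ValueError(f"unexpected answer: {slot!r}")
    picked_first = slot == "a"
    # When the order is swapped, slot "a" holds option_2, so invert the mapping.
    chosen = "1" if picked_first != swapped else "2"
    return chosen, slot


def measure_bias(ask_model: AskModel, question: str, option_1: str,
                 option_2: str, trials: int = 1000,
                 seed: int = 0) -> tuple[Counter, Counter]:
    """Tally answers both by original option and by on-screen slot."""
    rng = random.Random(seed)
    by_option: Counter = Counter()
    by_slot: Counter = Counter()
    for _ in range(trials):
        chosen, slot = ask_with_random_order(ask_model, question,
                                             option_1, option_2, rng)
        by_option[chosen] += 1
        by_slot[slot] += 1
    return by_option, by_slot


# Toy stand-in for a model with a pure position bias: it always answers "a".
def always_a(prompt: str) -> str:
    return "a"


by_option, by_slot = measure_bias(always_a, "Which is better?", "X", "Y")
print(by_slot)    # Counter({'a': 1000})
print(by_option)  # roughly even split between '1' and '2'
```

If answers pile up on one slot while the original options come out roughly even, that points to a position preference rather than a preference about the content.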

Limiting the response to a/b/pass makes sense for measuring the results, but it feels like it could also affect them. What would we see with a full response followed by a separate judgement pass?
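A rough sketch of that two-pass setup: let the model answer freely, then have a second call classify the free-form answer as a, b, or pass. `answer_model` and `judge_model` are hypothetical prompt-to-text functions. They could be the same model or different ones, and the prompt wording is only illustrative.

```python
from typing import Callable

# Hypothetical interface: sends a prompt to some model and returns its text.
AskModel = Callable[[str], str]

VALID_VERDICTS = {"a", "b", "pass"}


def two_pass(answer_model: AskModel, judge_model: AskModel,
             question: str, option_a: str, option_b: str) -> tuple[str, str]:
    # Pass 1: unconstrained answer, with no a/b/pass restriction.
    full_prompt = (f"{question}\n"
                   f"a) {option_a}\n"
                   f"b) {option_b}\n"
                   "Answer in your own words.")
    full_response = answer_model(full_prompt)

    # Pass 2: a separate judgement step maps the free-form answer to a/b/pass.
    judge_prompt = ("Here are two options and a response.\n"
                    f"a) {option_a}\n"
                    f"b) {option_b}\n"
                    f"Response: {full_response}\n"
                    "Which option does the response choose? "
                    "Reply with exactly a, b, or pass.")
    verdict = judge_model(judge_prompt).strip().lower()
    if verdict not in VALID_VERDICTS:
        # Fail loudly rather than silently counting unparseable judgements.
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return full_response, verdict


# Toy stand-ins for illustration only; real runs would call actual models.
def verbose_model(prompt: str) -> str:
    return "I'd go with the second one, because it is shorter."


def keyword_judge(prompt: str) -> str:
    response_part = prompt.split("Response:")[1]
    return "b" if "second" in response_part else "pass"


full, verdict = two_pass(verbose_model, keyword_judge, "Which is better?", "X", "Y")
print(verdict)  # b
```

Keeping the full response alongside the verdict means you can compare the judged answers with what the a/b/pass-only prompt produces for the same questions.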