Hacker News new | ask | show | jobs
by siilats 956 days ago
So you have a training set of questions and answers generated by humans. For simplification you ask ten people the same question and get 10 answers and feed it to llm and then in testing time ask the same question. Now say there is a correct answer. Now in training set you asked the question normally 5 times and added “this is very important” 5 times. And it turns out humans have better answers in the training set if you added the qualifier. And during testing time, when you add the qualifier you are telling the llm to put more weight on those 5 answers, and it performs better. Just like in image generation you need lots of negative prompts because the training set is so dirty.