Hacker News
by mmusc 819 days ago
What's the goal of a self-rewarding LLM?
1 comment

The goal is to iteratively create training data and add it to the model's own training set. The LLM acts as its own judge: it scores its own responses to decide which ones to add. Having a human in the loop to label preferences is expensive, so the folks at Meta showed that, with a clever judging prompt, you can fine-tune the model to judge its own responses.
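
To make the loop concrete, here's a rough sketch in Python. Everything in it is a toy stand-in of my own and not Meta's code: `toy_generate` and `toy_judge` are hypothetical placeholders for sampling from the model and for prompting the same model to score a response, and the 4.0 cutoff is an assumed threshold. It only shows the shape of "generate, self-score, keep the good ones, repeat".

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, response)


def self_reward_round(
    prompts: List[str],
    generate: Callable[[str], List[str]],
    judge: Callable[[str, str], float],
    threshold: float,
) -> List[Example]:
    """Return the (prompt, response) pairs whose self-assigned score meets the threshold."""
    accepted: List[Example] = []
    for prompt in prompts:
        for response in generate(prompt):
            # The same model plays the judge role here.
            if judge(prompt, response) >= threshold:
                accepted.append((prompt, response))
    return accepted


def toy_generate(prompt: str) -> List[str]:
    # Hypothetical stand-in for sampling several candidate responses from the model.
    return [f"{prompt} -> short", f"{prompt} -> a longer, more detailed answer"]


def toy_judge(prompt: str, response: str) -> float:
    # Hypothetical stand-in for the judging prompt: longer responses score higher, capped at 5.
    return min(5.0, len(response) / 10)


training_set: List[Example] = []
for iteration in range(2):
    new_data = self_reward_round(
        ["What is 2+2?"], toy_generate, toy_judge, threshold=4.0  # assumed cutoff
    )
    training_set.extend(new_data)
    # A real loop would fine-tune the model on training_set here before the next round.
```

In a real setup the fine-tuning step between rounds is what makes it iterative, since both the generator and the judge would change from one round to the next.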