by brofallon
1166 days ago
To use RLHF you need a dataset that includes instructions with good & bad answers. Do many of those exist? I know there are a few datasets of just plain instructions-with-responses, but I'm not aware of any that have both good and bad (or ranked) responses. Is that trivial to create, or an important missing element here?