by maxnevermind
1 hour ago
> synthetic verifiable traces

What does that mean? Is it like when somebody uses a coding agent to develop a feature, and the input prompts and the resulting PR are later used for training, on the presumption that the final PR was a correct implementation of the prompt?
The trick is to find examples that are neither too difficult nor too easy for the existing agent; these give the strongest training signal.
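
A rough sketch of what that filtering could look like, assuming each task has an automatic pass/fail check and you run the current agent on it several times. The function name, the `rollouts` data, and the 0.2/0.8 thresholds are all made up for illustration, not taken from any particular training pipeline:

```python
from typing import Dict, List


def select_mid_difficulty(
    outcomes_by_task: Dict[str, List[bool]],
    low: float = 0.2,   # hypothetical lower bound on pass rate
    high: float = 0.8,  # hypothetical upper bound on pass rate
) -> List[str]:
    """Return task ids whose empirical pass rate lies in [low, high]."""
    selected = []
    for task_id, outcomes in outcomes_by_task.items():
        if not outcomes:
            continue  # no rollouts, so no pass-rate estimate
        pass_rate = sum(outcomes) / len(outcomes)
        if low <= pass_rate <= high:
            selected.append(task_id)
    return selected


# Hypothetical results: each list holds verifier outcomes for repeated attempts.
rollouts = {
    "too_easy": [True, True, True, True],
    "too_hard": [False, False, False, False],
    "useful": [True, False, True, False],
}
print(select_mid_difficulty(rollouts))  # ['useful']
```

The intuition, as far as I understand it: if the agent always passes or always fails a task, every attempt gets the same reward and there is nothing to compare. For a pass/fail reward with pass rate p, the variance is p(1 - p), which peaks at p = 0.5.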