by psb217
539 days ago
The value of synthetic data relies on having a non-zero signal about which generated data is "better" or "worse". In a sense, this is what reinforcement learning is about: generate some data, have that data scored by some evaluator, and then feed it back into the model with higher weight on the better stuff and lower weight on the worse stuff. The basic loop is: (i) generate synthetic data, (ii) rate the synthetic data, (iii) update the model to put more probability on better data and less probability on worse data, then go back to (i).
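A toy sketch of that generate/rate/update loop, under loudly stated assumptions: the "model" is just a probability distribution over four hypothetical candidate outputs, the evaluator is a hypothetical fixed score table, and the update is a simple score-weighted refit (move the distribution toward the score-weighted empirical distribution of each batch). None of these names come from any real system; this is one illustration of the loop, not a specific RL algorithm.

```python
import random
from collections import Counter

# Hypothetical toy setup: the "model" is a distribution over a few outputs,
# and the evaluator is a fixed lookup table (both are illustrative assumptions).
CANDIDATES = ["bad", "meh", "ok", "good"]
SCORES = {"bad": 0.0, "meh": 0.2, "ok": 0.5, "good": 1.0}


def evaluate(sample: str) -> float:
    """(ii) Rate a generated sample; higher is better (hypothetical evaluator)."""
    return SCORES[sample]


def train(rounds: int = 50, batch_size: int = 64, mix: float = 0.5,
          seed: int = 0) -> dict:
    rng = random.Random(seed)
    # Start from a uniform "model".
    probs = {c: 1.0 / len(CANDIDATES) for c in CANDIDATES}
    for _ in range(rounds):
        # (i) Generate synthetic data by sampling from the current model.
        batch = rng.choices(CANDIDATES,
                            weights=[probs[c] for c in CANDIDATES],
                            k=batch_size)
        # (ii) Score each sample and accumulate score-weighted counts.
        weighted = Counter()
        for sample in batch:
            weighted[sample] += evaluate(sample)
        total = sum(weighted.values())
        if total == 0:
            continue  # every sample scored zero: no signal, skip this round
        # (iii) Move the model toward the score-weighted empirical distribution,
        # so better samples gain probability and worse ones lose it.
        probs = {c: (1 - mix) * probs[c] + mix * weighted[c] / total
                 for c in CANDIDATES}
    return probs


if __name__ == "__main__":
    print({c: round(p, 3) for c, p in train().items()})
```

The zero-total guard is the "non-zero signal" point in miniature: if every sample in a round is rated equally badly, there is nothing to reweight and the model learns nothing from that batch. Policy-gradient methods replace step (iii) with a gradient update on log-probabilities, but they follow the same generate/rate/update cycle.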