|
|
|
|
|
by p1esk
1341 days ago
|
|
Oh OK, so you mean training the model after it has already been trained on the main task, right? Like finetuning. Yes, I think the GAN-like finetuning is a good idea. Though it's less clear where the labels would come from, it seems like some sort of fingerprint would need to be computed for each generated sequence, and this fingerprint would need to be compared against a database of fingerprints for every sequence in the training set. This could be a huge database. |
|
Another similar possibility might be to do more RL with this data, e.g. using upside-down RL. One can possibly steer this with user feedback as well.