by p1esk 1341 days ago
Oh OK, so you mean training the model further after it has already been trained on the main task, right? Like finetuning. Yes, I think GAN-like finetuning is a good idea. It's less clear where the labels would come from, though. It seems like some sort of fingerprint would need to be computed for each generated sequence and then compared against a database of fingerprints for every sequence in the training set. That could be a huge database.
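To make the fingerprint idea concrete, here is a rough sketch of one way the labels could be produced. It assumes hashed token n-grams as the fingerprint and a plain in-memory set as the "database". The names `ngram_fingerprints`, `build_index`, and `memorization_label`, the n-gram length, and the threshold are all made up for illustration, not taken from any existing system.

```python
import hashlib

def ngram_fingerprints(tokens, n=8):
    """Return the set of hashed n-grams (shingles) of a token sequence."""
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Truncated SHA-1 keeps each fingerprint small; collisions are possible.
    return {hashlib.sha1(repr(g).encode("utf-8")).hexdigest()[:16] for g in grams}

def build_index(training_sequences, n=8):
    """Union of fingerprints over the training set (the 'database')."""
    index = set()
    for seq in training_sequences:
        index |= ngram_fingerprints(seq, n)
    return index

def memorization_label(generated, index, n=8, threshold=0.5):
    """Return (label, overlap): label is 1 if enough n-grams match the training set."""
    fps = ngram_fingerprints(generated, n)
    if not fps:  # sequence shorter than n tokens: nothing to compare
        return 0, 0.0
    overlap = len(fps & index) / len(fps)
    return int(overlap >= threshold), overlap

# Toy usage: one training sequence, one copied span, one novel span.
index = build_index([list(range(20))])
copied = memorization_label(list(range(5, 18)), index)
novel = memorization_label(list(range(100, 113)), index)
```

The index grows with the number of distinct n-grams in the training set, which is where the "huge database" worry comes in. A Bloom filter or similar structure in place of the set would trade a small false-positive rate for much less memory.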

You'd need something like Spotify.

Another similar possibility might be to do more RL with this data, e.g. using upside-down RL. One can possibly steer this with user feedback as well.