by CYHSM
2594 days ago
Nice work! I was wondering whether you noticed changes in output coherence during training. I fine-tuned it on a corpus of The Office quotes [1] and noticed that a loss of around 0.9 gives me the most 'humorous' outputs. This may be subjective, but I think surprise plays a huge role in comedy, and with longer training (loss around 0.4) the output feels overly unsurprising and therefore less funny. I also tried sampling with temperatures >1, but then it just goes crazy (e.g. some outputs are completely in Latin). [1] https://www.reddit.com/r/MachineLearning/comments/bmn0og/p_l...
I do look at the training samples, but I've never noticed a worsening of 'coherence' in them, so to speak. I wonder if that is what overfitting looks like? My PG corpus is so large that the GPT-2s struggle to converge, much less overfit, so I don't know what overfitting would look like. You could try the new pseudo-validation loss checking feature nshepperd added to see whether there's any connection between the validation loss and your perception of coherence.
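
On the temperature point in the parent comment, here is a minimal sketch of why sampling above 1 tends to go off the rails. It is not taken from any GPT-2 codebase: the function names and the toy logits are made up for illustration, and the only assumption is the standard approach of dividing logits by the temperature before the softmax.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more 'surprising' distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token logits, chosen only for illustration.
logits = [4.0, 2.0, 1.0, 0.5, 0.0]

for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token p={probs[0]:.3f}, entropy={entropy(probs):.3f} nats")
```

Raising the temperature moves probability mass from the top token to the tail, so over a vocabulary of tens of thousands of tokens even a modest increase can make rare tokens (such as Latin fragments) show up far more often.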