by sillysaurusx
2000 days ago
e^loss. It's a bad name for a confusing concept: loss. (e^loss is just another way of plotting loss, after all.) Loss isn't the whole story: the steepest slope during training often produces the worst-quality language models. You want a nice, gentle downward slope. SubsimulatorGPT2 (https://reddit.com/r/subsimulatorgpt2) continued to improve in terms of human evaluation even though the loss stayed flat for over a week.
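
To make the "just another way of plotting loss" point concrete, here's a rough sketch. It assumes the loss is a mean per-token cross-entropy measured in nats; the `exp_loss` helper and the loss values are made up for illustration and don't come from any real training run.

```python
import math

def exp_loss(loss_nats):
    """Return e^loss for a mean per-token cross-entropy loss (assumed to be in nats)."""
    return math.exp(loss_nats)

# Hypothetical loss values, invented for illustration only.
losses = [4.0, 3.5, 3.0, 2.9, 2.9]
curve = [exp_loss(l) for l in losses]

for l, p in zip(losses, curve):
    print(f"loss={l:.2f}  e^loss={p:.2f}")
```

Since exp is monotonic, the two curves always move in the same direction and flatten at the same point. e^loss carries no information that loss doesn't, so a flat stretch in one is a flat stretch in the other.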