by ftxbro 1121 days ago
I think there is a mistake in their analysis:

> "B Anticorrelation between Perplexity and Generation Quality"

> "When fine-tuning LIMA, we observe that perplexity on held-out Stack Exchange data (2,000 examples) negatively correlates with the model’s ability to produce quality responses. To quantify this manual observation, we evaluate model generations using ChatGPT, following the methodology described in Section 5. Figure 9 shows that as perplexity rises with more training steps – which is typically a negative sign that the model is overfitting – so does the quality of generations increase"

If they are basing their statement on what they observed in their experiments, I think "anticorrelation" should be "correlation" and "negatively correlates" should be "positively correlates". By their own description, perplexity and generation quality rise together, and two quantities that rise together are positively correlated.
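To make the sign convention concrete, here is a minimal sketch in Python. The numbers are made up for illustration and are not taken from the paper or its Figure 9; `pearson` is just a hand-rolled Pearson correlation coefficient. The only point is that two series that increase together give a positive coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values, NOT from the paper: both rise with training steps.
perplexity = [5.0, 5.5, 6.1, 6.8, 7.4]
quality = [0.40, 0.46, 0.52, 0.55, 0.61]

r = pearson(perplexity, quality)
print("positive" if r > 0 else "negative or zero")
```

An anticorrelation would need one series to fall while the other rises, which is the opposite of what the quoted passage describes.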

EDIT: I see they say "Preprint. Under review", so maybe they will fix it if it's wrong. This is the kind of thing that peer review is really good at fixing. Also, not every submission on arXiv is a preprint or under review, but I guess this one is.