by Eridrus 1805 days ago
Being able to tell if a model has been trained enough without reference to a separate dev set seems like a useful capability, but how can you actually turn these plots into a decision criterion?

Why is a modal alpha of 4 high, but an alpha of 3.5 ok?

1 comment

Great question. 4 is at the high edge of the fat-tailed universality class. Most high-performing models have alpha approaching 2, or at least below 3. See Figure 8(a) in the Nature paper, and our upcoming JMLR paper https://arxiv.org/abs/1810.01075
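
For anyone who wants to try the thresholds above as a rough rule, here is a minimal sketch. It assumes you already have the eigenvalues of a layer's weight correlation matrix and a chosen tail cutoff `xmin`. It fits alpha with the standard maximum-likelihood estimator for a continuous power law, which may not be the fitting procedure the papers use. The function names and the three labels are made up for illustration; only the cut points at 3 and 4 come from the reply.

```python
import math


def powerlaw_alpha(values, xmin):
    """MLE exponent for a continuous power law p(x) ~ x^(-alpha), x >= xmin.

    Assumption: xmin is supplied by the caller; choosing it well is a
    separate problem that this sketch does not address.
    """
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above xmin")
    return 1.0 + len(tail) / log_sum


def describe_alpha(alpha):
    """Map alpha to a hypothetical label using the cut points from the reply."""
    if alpha < 3.0:
        return "typical"      # reply: most high-performing models are below 3
    if alpha <= 4.0:
        return "upper range"  # reply: 4 is the high edge of the fat-tailed class
    return "outside"          # assumption: beyond the fat-tailed range


def synthetic_tail(alpha, n, xmin=1.0):
    """Deterministic power-law sample via inverse-CDF on an even grid (demo only)."""
    return [xmin * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0)) for i in range(n)]


if __name__ == "__main__":
    eigs = synthetic_tail(alpha=3.5, n=5000)  # stand-in for real eigenvalues
    alpha = powerlaw_alpha(eigs, xmin=1.0)
    print(round(alpha, 2), describe_alpha(alpha))
```

On real layers the estimate is sensitive to `xmin` and to how many eigenvalues fall in the tail, so treat the label as a hint rather than a stopping rule.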