Hacker News
by amelius 1976 days ago
TL;DR:

> Unfortunately, ALGPT-2 doesn’t perform as well as GPT-2 (ALGPT-2 gets 31 ppl on OpenWebText compared to 21 ppl for my pretrained GPT-2 model), but I’m writing this series of blog posts to go through everything I’ve learned over the last few months.

1 comment

the way he describes the process he went through is still super helpful