Hacker News
by wholehog 552 days ago
I think the pre-trained checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact same checkpoint, since the paper itself dates from 2020/2021.