| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by LittleTimothy 567 days ago
	As far as I understand it, only kind of? It's open source, but in their paper they did a tonne of pre-training and whilst they've released a small pre-training checkpoint they haven't released the results of the pre-training they've done for their paper. So anyone reproducing this will innevitably be accused of failing to pretrain the model correctly?

1 comments

wholehog 562 days ago

I think the pre-trained checkpoint uses the same 20 TPU blocks as the original paper, but it probably isn't the exact-same checkpoint, as the paper itself is from 2020/2021.

link