| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by irthomasthomas 28 days ago
	Given that 4.7 was a brand new model, trained from scratch with a unique architecture and tokenization scheme, I don't see the same pattern. It seems arbitrary.

1 comments

dominotw 28 days ago

i dont understand the nuances here. what does this mean. 4.8 is trained on same model as previous one then? what does brand new mean.

link

irthomasthomas 28 days ago

It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8.

link

dominotw 28 days ago

do you mean pre training? so 4.8 is just post training of an old pretrained model?

btw where do they tell you how they trained the model.

link