| HN Mirror

	by luke-stanley 384 days ago
	When I read "from scratch", I assume they are doing pre-training, not just finetuning. Do you have a different take? Do you mean they're using the normal Llama architecture? I'm curious about the benchmarks!