HN Mirror



	by sgt101 531 days ago
	Well, Deepseek trained them?

2 comments

yk 531 days ago

Yes, but it would've been nice to call them D1-something, instead of constantly having to switch back and forth between Deepseek R1 (here I mean the 604B model) and Deepseek R1 (the reasoning model and its distillates).


rafaelmn 531 days ago

You can say R1-604b to disambiguate, just like we have Llama 3 8B/70B etc.


pythux 531 days ago

These models are not of the same nature either. They were trained in different ways. A uniform naming scheme (even with an explicit parameter count) would still be misleading.


mdp2021 531 days ago

? Alexander is not Aristotle?!


sgt101 530 days ago

you made my day!
