


	by numpad0 333 days ago
	PSA: models confusingly named "$1-distill-$2" (sometimes without the "-distill") are $2 trained on outputs of $1, a process referred to as "distillation". It is not the other way around, and they are not the real $1. The article mentions nonexistent configurations such as "Deepseek-R1 1.5B"; those are these distilled models, not Deepseek-R1 itself.
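
To make the naming rule concrete, here is a rough sketch that splits a "$1-distill-$2" style name into the model whose outputs were used ($1) and the model that was actually trained ($2). The function name `split_distill_name` and the regex are my own illustration, not part of any library, and it deliberately gives up on names that drop the "-distill" part, since those can't be told apart from the name alone.

```python
import re

# Hypothetical helper: matches "<teacher>-distill-<student>", case-insensitive.
# The non-greedy teacher group stops at the first "-distill-" in the name.
_DISTILL = re.compile(r"^(?P<teacher>.+?)-distill-(?P<student>.+)$", re.IGNORECASE)


def split_distill_name(name):
    """Return {'teacher': $1, 'student': $2}, or None if the name has no '-distill-'."""
    m = _DISTILL.match(name)
    if m is None:
        # Names without "-distill" are ambiguous; this sketch does not guess.
        return None
    return {"teacher": m.group("teacher"), "student": m.group("student")}


# The student part is the model you are actually running.
print(split_distill_name("DeepSeek-R1-Distill-Qwen-1.5B"))
# -> {'teacher': 'DeepSeek-R1', 'student': 'Qwen-1.5B'}
```

So a "1.5B" model with R1 in its name is the $2 part (here a Qwen model) trained on R1 outputs, not a 1.5B version of R1.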