by numpad0
333 days ago
PSA: models confusingly named "$1-distill-$2" (sometimes without the "-distill") are $2 trained on the outputs of $1, a process referred to as "distillation". They are not the other way around, and they are not the real $1. The article mentions nonexistent configurations such as "Deepseek-R1 1.5B"; those are this kind of distilled model.
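To make the naming rule concrete, here is a rough sketch that splits such a name into its two halves. The helper `split_distill_name` is hypothetical (not part of any library), and it only handles names that actually contain the "-distill-" marker; names that drop it can't be told apart this way.

```python
import re

def split_distill_name(name):
    """Split '<teacher>-distill-<student>' into (teacher, student).

    Hypothetical helper for illustration. Returns None when the name
    has no '-distill-' marker, since the roles can't be inferred then.
    """
    parts = re.split(r"-distill-", name, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        return None
    teacher, student = parts
    # teacher: the model whose outputs were used as training data ($1)
    # student: the model that was actually trained, i.e. what you run ($2)
    return teacher, student

print(split_distill_name("DeepSeek-R1-Distill-Qwen-1.5B"))
print(split_distill_name("Deepseek-R1 1.5B"))
```

The second element is the model you are really getting; the first is only where its training data came from.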