by whiplash451
20 days ago
Maybe because distilling small models from bigger ones that you control gives you better small models than fine-tuning on top of bigger models that you don't control? (I'm not claiming this is the case, just stating it as an assumption.)