by kippinitreal
1069 days ago
Clever idea. I think you would have to recompute the context (i.e., re-run all the prior tokens through the new model to rebuild its cached state) every time you swapped models, because each model has different weights and therefore different internal representations of those tokens. Going from big to small might make this overhead worth it, but going back from small to big would assuredly be very costly.
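Here is a rough back-of-the-envelope sketch of why the direction matters. It assumes the common approximation of about 2 FLOPs per parameter per token for a forward pass and uses hypothetical 70B and 7B model sizes with a 2,000-token context. None of these numbers come from the comment above. Under those assumptions, the cost of a swap is set by the model you switch *to*, because that model has to re-read the whole context.

```python
# Rough rule of thumb (assumption): a transformer forward pass costs about
# 2 FLOPs per parameter per token, so re-processing ("prefilling") a context
# of n tokens costs roughly 2 * params * n FLOPs.


def prefill_flops(params: float, context_tokens: int) -> float:
    """Approximate FLOPs needed to rebuild a model's cached state for a context."""
    return 2.0 * params * context_tokens


def swap_cost(target_params: float, context_tokens: int) -> float:
    """Cost of swapping to a target model: it must re-process the full context."""
    return prefill_flops(target_params, context_tokens)


if __name__ == "__main__":
    # Hypothetical model sizes and context length, for illustration only.
    BIG = 70e9
    SMALL = 7e9
    context = 2000

    big_to_small = swap_cost(SMALL, context)
    small_to_big = swap_cost(BIG, context)
    print(f"big->small swap: {big_to_small:.2e} FLOPs")
    print(f"small->big swap: {small_to_big:.2e} FLOPs")
    print(f"small->big is {small_to_big / big_to_small:.0f}x more expensive")
```

With the same context, the small->big swap costs the full parameter ratio more than the big->small one (10x for these made-up sizes). This is why swapping down can pay for itself when the cheaper model then generates many tokens, while swapping back up pays the big model's full prefill again.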