by looobay
231 days ago
There was research on LLM training and distillation showing that if two models have a similar architecture (probably the case for xAI), the "master" model will distill knowledge into the student model even if it's not in the distillation data. So they probably need to train a new model from scratch. (Sorry, I don't remember the name, but there was an example with a model liking owls to showcase this.)