|
|
|
|
|
by kolinko
788 days ago
|
|
Yeah, more tests are needed. I got some feedback on using KL instead of the token similarity - initial tests seem to show that it is workable (compared to Q8), but not awesomely amazing - will be working on that next week and publishing. As for treating effort+Mistral as a separate model - I wouldn't do that comparison. The model stays the same, all the weights from it are still being used, just not all of the time - we don't really lose information from the source model. |
|