by hintymad
5 days ago
> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.
I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Enhanced it on a couple benchmarks, supposedly.
The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of finetunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.
This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.
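For anyone wondering what "a 0.6/0.4 blend of every weight" means mechanically, here is a rough sketch, not the method anyone actually used. It assumes both parent models have identically named and shaped tensors, stands in plain Python lists for real tensors, and uses made-up names (`blend`, `fit_alpha`, the toy `w` tensor). Which parent gets the 0.6 is not stated in the quote, so `alpha` here is just the weight on the first argument. The second function shows how you could check whether a given tensor is explained by such an interpolation: fit the single best mixing coefficient by least squares and see whether it comes out the same everywhere.

```python
def blend(a, b, alpha):
    """Return alpha*a + (1 - alpha)*b for every named tensor.

    `a` and `b` are dicts mapping tensor names to flat lists of floats
    (a hypothetical stand-in for real model state dicts).
    """
    return {
        name: [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
        for name in a
    }


def fit_alpha(merged, a, b):
    """Least-squares alpha for one tensor, assuming merged ~ alpha*a + (1 - alpha)*b.

    Rearranged: merged - b ~ alpha * (a - b), so alpha is the projection
    of (merged - b) onto (a - b).
    """
    num = sum((m - y) * (x - y) for m, x, y in zip(merged, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return num / den


# Toy example with one three-element "tensor" per parent.
parent_a = {"w": [1.0, 2.0, 3.0]}
parent_b = {"w": [0.0, 4.0, -1.0]}
merged = blend(parent_a, parent_b, 0.6)
recovered = fit_alpha(merged["w"], parent_a["w"], parent_b["w"])
```

If the recovered coefficient is essentially identical for every tensor in the model, that is the kind of evidence the quoted claim describes; a model that was actually trained further would give inconsistent values and large residuals.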