	by hintymad 5 days ago
	> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations. I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
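One way to test a claim like the one quoted above is to fit a single blend coefficient per tensor and look at what is left unexplained. The sketch below is a minimal illustration on synthetic vectors, not the method used in the original analysis; the names `fit_blend`, `qwen`, `nex`, `rio`, and `other` are hypothetical stand-ins, and it assumes the 0.6 weight belongs to Nex.

```python
import random

def fit_blend(merged, a, b):
    """Least-squares alpha such that merged ~= alpha*a + (1-alpha)*b.

    Returns (alpha, relative_residual). A residual near 0 means the
    tensor is well explained as an interpolation of a and b.
    """
    d = [x - y for x, y in zip(a, b)]        # a - b
    t = [m - y for m, y in zip(merged, b)]   # merged - b
    alpha = sum(x * y for x, y in zip(d, t)) / sum(x * x for x in d)
    resid = sum((ti - alpha * di) ** 2 for ti, di in zip(t, d))
    norm = sum(ti * ti for ti in t)
    rel = (resid / norm) ** 0.5 if norm > 0 else 0.0
    return alpha, rel

rng = random.Random(0)
# Synthetic stand-ins (assumption): a base tensor, a finetune of it,
# a 0.6/0.4 blend of the two, and an unrelated finetune of the same base.
qwen = [rng.gauss(0, 1) for _ in range(1000)]
nex = [q + rng.gauss(0, 0.05) for q in qwen]
rio = [0.6 * n + 0.4 * q for n, q in zip(nex, qwen)]
other = [q + rng.gauss(0, 0.05) for q in qwen]

alpha, rel = fit_blend(rio, nex, qwen)
alpha_other, rel_other = fit_blend(other, nex, qwen)
print(f"blend:     alpha={alpha:.3f}, residual={rel:.2e}")
print(f"unrelated: alpha={alpha_other:.3f}, residual={rel_other:.2f}")
```

For the blended tensor the fit recovers the coefficient with essentially no residual, while the independent finetune leaves almost all of its delta unexplained.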

10 comments

Aurornis 5 days ago

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Enhanced it on a couple benchmarks, supposedly.

The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.

This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.

andai 5 days ago

They seem to have deleted most of the README now, but the archived version has benchmarks.

https://web.archive.org/web/20260614082641/https://huggingfa...

And the Nex benchmarks for comparison

https://huggingface.co/nex-agi/Nex-N2-Pro

Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?

monster_truck 5 days ago

I don't think your last point is correct. Ablation, when done correctly, seems to increase the quality and typically the performance too.

Aurornis 5 days ago

Abliteration is a brute-force technique that removes or silences parts of the model. It reduces performance because the abliterated elements aren’t perfectly isolated to censorship, so other aspects suffer.

Many of the “uncensored” model providers also do some fine tuning on the models. Some of them target better benchmarks or other measures, but outside of the benchmarks and metrics they’re fine tuned for they are generally noticeably worse than the original model.
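To make the "not perfectly isolated" point concrete, here is a minimal sketch of one commonly described form of abliteration: projecting a single "refusal direction" out of a weight matrix. Whether the tools discussed in this thread work exactly this way is an assumption, and the matrix, the `refusal` direction, and the `useful` direction are all made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def matvec(W, x):
    return [dot(row, x) for row in W]

def ablate_direction(W, r):
    """Return W' = (I - r r^T) W, so W' x has no component along r."""
    r = normalize(r)
    rows, cols = len(W), len(W[0])
    # rW[j] = sum_i r[i] * W[i][j], i.e. the row vector r^T W
    rW = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - r[i] * rW[j] for j in range(cols)] for i in range(rows)]

# Hypothetical toy weights and directions (assumptions, not real model data).
W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0]]
refusal = normalize([1.0, 0.0, 0.0])
useful = normalize([1.0, 1.0, 0.0])   # overlaps with the refusal direction
x = [1.0, 2.0, 3.0]

y = matvec(W, x)
y_abl = matvec(ablate_direction(W, refusal), x)

print("refusal component before/after:", dot(y, refusal), dot(y_abl, refusal))
print("useful component before/after: ", dot(y, useful), dot(y_abl, useful))
```

The refusal component goes to zero, but because `useful` is not orthogonal to `refusal`, its component shrinks as well. That collateral damage is the effect described above, in miniature.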

yowlingcat 5 days ago

The kind of abliteration you are mentioning is no longer state of the art or the most common form of removing the refusal layer in most models. Your understanding was up to date about a year and a half ago, but it has been out of date since then.

weitendorf 5 days ago

Unrelated but I’ve been putting off learning about post-abliteration technique and want to use it for an upcoming open source “retraining” project I have on my backlog. I’m not interested in the refusal layers though, more like deep fine tuning but in a way that might let me prune out or consolidate layers, if that makes sense? Do you have any pointers or links to the current SOTA in this area?

I guess I’m looking for a kind of bulk/sticky dropout (which was in fashion way back when I studied DNN in school).

avadodin 5 days ago

What OP is describing wasn't called abliteration at all.

Abliteration, whilst a neologism, implies a surgical ablation of refusal.

Earlier approaches post-trained the model to refuse less and, much like other kinds of fine-tuning, this degraded performance. Those models were the "uncensored" ones.

Abliteration has improved somewhat since then, but even early on its performance was close to the original model's, unlike those earlier techniques.

ls612 5 days ago

Nowadays it's that Heretic tool, is it not? I’ve seen Gemma models uncensored with it.

tredre3 5 days ago

That is something often claimed by heretics. My experience couldn't diverge more, however. All heretic (and abliterix) models I've tried are worse than the original. It's not immediately obvious if all you do is ask 2-3 questions and marvel at how it didn't refuse, but try using them for real over longer 8k+ contexts and it falls apart real fast.

They're more prone to getting stuck in loops, becoming unresponsive, and hallucinating (presumably because of the reduced tendency to decline to answer).

I've tried all the popular heretic peddlers, but if you have one that you can vouch for maybe I've simply missed it.

antonvs 5 days ago

I'm curious about where you got that idea from. Neither the theory nor the available examples support it. If it did, everyone knowledgeable would be using abliterated models.

manquer 5 days ago

> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it

i.e., reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world.

The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: deploying finite resources (compute) toward exploitation vs. exploration.

The main issue is that each training or fine-tuning run is very expensive, so the number of pulls at the slot machine, so to speak, is pretty limited today.
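The bandit framing can be sketched with a plain epsilon-greedy loop. This is only an illustration under stated assumptions: each "arm" is a hypothetical fine-tuning recipe, `noisy_benchmark` is a made-up reward function, and the budget of 30 pulls stands in for the limited number of affordable runs.

```python
import random

def epsilon_greedy(pull, n_arms, budget, epsilon=0.1, seed=0):
    """Spend `budget` pulls across `n_arms`; return (counts, mean_rewards)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(budget):
        if t < n_arms:
            arm = t                                  # try every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)              # explore
        else:
            arm = max(range(n_arms), key=lambda i: means[i])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
    return counts, means

# Hypothetical recipes with slightly different true scores (assumption).
TRUE_SCORES = [0.50, 0.52, 0.55]
noise = random.Random(1)

def noisy_benchmark(arm):
    # One expensive run: the true score plus benchmark noise.
    return TRUE_SCORES[arm] + noise.gauss(0, 0.02)

budget = 30
counts, means = epsilon_greedy(noisy_benchmark, len(TRUE_SCORES), budget)
best = max(range(len(means)), key=lambda i: means[i])
print("pulls per recipe:", counts)
print("estimated scores:", [round(m, 3) for m in means])
print("recipe that would ship:", best)
```

With so few pulls and noise comparable to the gaps between recipes, the recipe that ships can simply be the one that got a lucky run, which is the weak-reward problem described above.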

x312 5 days ago

This works because Nex itself is a finetune of Qwen3.5 (https://huggingface.co/nex-agi/Nex-N2-Pro). It's merging Qwen3.5 with a Qwen3.5 finetune.

I don't believe this would work on two LLMs that have different pretraining. Even if it did, you would need two LLMs that have the exact same internal activation shapes, dimensions, expert counts, and token vocabulary. Realistically, it would never happen outside of finetunes or academic experiments.
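The structural precondition is easy to check before attempting any merge. A minimal sketch, assuming each checkpoint is summarized as a mapping from parameter name to shape; the parameter names and shapes below are invented for illustration, not taken from any of the models discussed.

```python
def merge_compatible(shapes_a, shapes_b):
    """Return a list of mismatches; an empty list means shapes line up."""
    problems = []
    for name in sorted(set(shapes_a) | set(shapes_b)):
        if name not in shapes_a or name not in shapes_b:
            problems.append(f"{name}: present in only one checkpoint")
        elif shapes_a[name] != shapes_b[name]:
            problems.append(f"{name}: shape {shapes_a[name]} vs {shapes_b[name]}")
    return problems

# Hypothetical parameter shapes (assumption).
base = {"embed.weight": (1000, 64), "layers.0.mlp.weight": (256, 64)}
finetune = dict(base)  # a finetune keeps the base architecture
unrelated = {
    "embed.weight": (1200, 64),               # different vocabulary size
    "layers.0.mlp.weight": (256, 64),
    "layers.0.experts.0.weight": (256, 64),   # extra expert tensor
}

ok = merge_compatible(base, finetune)
bad = merge_compatible(base, unrelated)
print("finetune mismatches:", ok)
print("unrelated mismatches:", bad)
```

Matching shapes are necessary but not sufficient: two models with identical shapes and different pretraining would pass this check and still not be expected to merge meaningfully.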

oofbey 5 days ago

Correct. We used to think that because NN optimization is non-convex there are all these local minima. Now we know that once you get past the very early parts of training from random init, the loss surface is fairly smooth, and not really convex, but close enough in a bunch of ways - linear combinations of trained models are pretty much always valid combinations. You can think of fine tunings as deltas on the original model which can be summed together successfully. I think this paper first showed that to me: https://arxiv.org/pdf/1802.10026 which was 8 years ago now.
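The "deltas on the original model" view can be written out directly. The sketch below uses tiny made-up weight dictionaries; `base`, `ft_a`, and `ft_b` are hypothetical, and real checkpoints would be tensors rather than Python lists.

```python
def delta(finetuned, base):
    """Per-parameter difference finetuned - base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}

def apply_deltas(base, deltas, scales):
    """Return base + sum_i scales[i] * deltas[i], parameter by parameter."""
    out = {k: list(v) for k, v in base.items()}
    for d, s in zip(deltas, scales):
        for k in out:
            out[k] = [w + s * dw for w, dw in zip(out[k], d[k])]
    return out

# Hypothetical toy checkpoints (assumption).
base = {"w": [1.0, 2.0]}
ft_a = {"w": [1.5, 2.0]}   # finetune A changed the first weight
ft_b = {"w": [1.0, 1.0]}   # finetune B changed the second weight

# Summing both deltas onto the base combines the two changes.
summed = apply_deltas(base, [delta(ft_a, base), delta(ft_b, base)], [1.0, 1.0])

# A 0.6/0.4 interpolation between a finetune and its base is the same
# operation with a single delta scaled by 0.6.
blend = {k: [0.6 * a + 0.4 * b for a, b in zip(ft_a[k], base[k])] for k in base}
scaled = apply_deltas(base, [delta(ft_a, base)], [0.6])

print("summed deltas:", summed)
print("interpolation:", blend, "scaled delta:", scaled)
```

Seen this way, blending a finetune with its own base just scales one delta down, which fits the observation elsewhere in the thread that the merge works because one model is a finetune of the other.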

hashmap 5 days ago

not this exact thing, no, because the functional circuits dont appear in the same places across models. but if you find where they are you can do something like branch between some of the middle functional circuits between models and it kinda just works, or even do one after the other. you cant just like swap any two layers cause a bunch of em bend hyperbolic curvature to do hierarchical stuff deep in the poincare ball and the geometries get all bonkers, but before and after they do that things are relatively flat, and the geometries are more or less transferrable up to rigid rotation if they're each trained on large enough data.

woadwarrior01 5 days ago

It's a well-known idea[1], although it's still surprising that something so simple even works.

[1]: https://arxiv.org/abs/2203.05482

kolanos 5 days ago

This team could have stopped here and still had something interesting (albeit not novel) to show. But the hype cycle was too tempting.

itkovian_ 5 days ago

This is called linear mode connectivity and seems to work for almost every large model. So well that in most cases it’s an explicit part of the training process; do many training ‘branches’ then merge then continue.

It is not understood why it works so well.

teravor 5 days ago

is that actually how they train them in the datacenter? the trillion sized weight vector gets cloned and sent off to groups of GPUs and averaged after?

themafia 5 days ago

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.

tarruda 5 days ago

What I find fascinating is the idea that there might be a set of "secret" tweaks that when applied to those weights (or even smaller models) could result in an intelligence simulation that could vastly surpass even something like Fable.

kristjansson 5 days ago

https://thickets.mit.edu

moritzwarhier 5 days ago

If this is true, it really would be impressive.

Davidzheng 5 days ago

it's interesting that this was even guessed at

Davidzheng 5 days ago

ok I guess they had other clues then. if you do any sort of comparison vs Nex & Qwen, probably a lot of weird coincidences will show up if somehow the three weights are not linearly independent lol

meindnoch 5 days ago

It shows that LLMs are an extremely wasteful approach to intelligence.

kristjansson 5 days ago

or that intelligence is merely the composition of many redundant, lossy, ~random components

antonvs 5 days ago

Compared to what?