by bwhitty 3 days ago
As another poster above linked, it’s been shown to be effective since 2022: https://arxiv.org/abs/2203.05482

It works because Nex N2 is also a derivative of the original Qwen base model. If the two models were completely unrelated, it wouldn't work.
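
To make that concrete, here's a toy sketch of the kind of merge the linked paper describes, assuming plain element-wise weight averaging. The function name `average_weights`, the `alpha` parameter, and the toy checkpoints are made up for illustration; real checkpoints would be tensors, not Python lists, and I'm not claiming this is the exact recipe used for the merge discussed here.

```python
def average_weights(state_a, state_b, alpha=0.5):
    """Linearly interpolate two checkpoints, parameter by parameter.

    Hypothetical helper: checkpoints are modeled as dicts mapping a
    parameter name to a flat list of floats.
    """
    # Averaging is only defined when both checkpoints have the same
    # parameter names and shapes, i.e. the same architecture.
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged = {}
    for name in state_a:
        wa, wb = state_a[name], state_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(wa, wb)]
    return merged


# Toy example: two fine-tunes that each moved a little from a shared base.
base = {"layer.weight": [1.0, -2.0, 0.5]}
finetune_a = {"layer.weight": [1.5, -2.0, 0.5]}
finetune_b = {"layer.weight": [1.0, -1.0, 0.5]}

merged = average_weights(finetune_a, finetune_b)
print(merged)  # {'layer.weight': [1.25, -1.5, 0.5]}
```

Note the shape check only guarantees the arithmetic is possible. Two unrelated models with identical shapes would pass it too, but their weights don't share a starting point, so the average has no reason to land anywhere useful. The shared base is what keeps both fine-tunes close enough for the midpoint to still behave like a model.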