| HN Mirror


	by keonix 990 days ago
	Wait until you hear about frankenmodels. You rip parts out of one model (often attention heads) and transplant them into another, and somehow that produces coherent results! Witchcraft: https://huggingface.co/chargoddard

1 comments

GaggiX 990 days ago

>somehow that produces coherent results

with or without finetuning? Also is there a practical motivation for creating them?


keonix 990 days ago

> with or without finetuning?

With, but it's still bonkers that it works so well

>Also is there a practical motivation for creating them?

You could get in-between model sizes (like 20B instead of 13B or 34B). Before better quantization came along, that was useful for inference (if you were unlucky with VRAM size), but now I see it being useful only for training, because you can't train on quants.
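
To make the in-between sizes concrete, here is a rough sketch of the arithmetic, not a recipe taken from the linked models. It assumes Llama-2-13B-style dimensions (hidden size 5120, MLP size 13824, vocab 32000, 40 layers, no biases, untied embeddings) and stacks whole decoder layers from two donors rather than individual attention heads. The slice plan, the donor labels "A" and "B", and the names `DecoderShape` and `stack_layers` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderShape:
    """Assumed Llama-2-13B-style dimensions (an assumption, not from the thread)."""
    hidden: int
    intermediate: int
    vocab: int

    def params_per_layer(self) -> int:
        attn = 4 * self.hidden * self.hidden       # q, k, v, o projections
        mlp = 3 * self.hidden * self.intermediate  # gate, up, down projections
        norms = 2 * self.hidden                    # two RMSNorm weight vectors
        return attn + mlp + norms

    def total_params(self, n_layers: int) -> int:
        embeddings = 2 * self.vocab * self.hidden  # input embedding + output head
        final_norm = self.hidden
        return n_layers * self.params_per_layer() + embeddings + final_norm


def stack_layers(slices):
    """Expand (donor, start, end) slices into an ordered list of (donor, layer_index)."""
    return [(donor, i) for donor, start, end in slices for i in range(start, end)]


LLAMA13 = DecoderShape(hidden=5120, intermediate=13824, vocab=32000)

# Hypothetical plan: overlapping layer ranges taken alternately from two 40-layer donors.
plan = stack_layers([("A", 0, 16), ("B", 8, 24), ("A", 17, 32), ("B", 25, 40)])

print(len(plan))                                   # 62
print(f"{LLAMA13.total_params(40) / 1e9:.2f}B")    # 13.02B
print(f"{LLAMA13.total_params(len(plan)) / 1e9:.2f}B")  # 19.99B
```

Under these assumptions, 62 stacked layers land just under 20B parameters, versus about 13B for the original 40 layers.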


ShamelessC 990 days ago

> With, but it's still bonkers that it works so well

Ehhhh…
