| HN Mirror



	by alpark3 1052 days ago
	_If_ 3.5 is a MoE model, doesn't that give a lot of hope to open source movements? Once a good open source MoE model comes out, maybe even some type of variation of the decoder models already available (I don't know whether MoE models have to be trained from scratch), that implies a lot more can be done with a lot less.

2 comments

152334H 1052 days ago

I agree, and really hope that Meta is doing something in that vein. Reducing the FLOPs:Memory ratio (as in Soft MoE) could also open the door to CPU (or at least Apple Silicon) inference becoming more relevant.
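To put rough numbers on the FLOPs:Memory point, here is a back-of-the-envelope sketch. It assumes the common approximation of about 2 FLOPs per active parameter per generated token, fp16 weights, and a hypothetical 64B-parameter model where the MoE variant only activates 16B parameters per token; none of these figures come from the thread.

```python
def flops_per_weight_byte(total_params, active_params, bytes_per_param=2.0):
    """Rough FLOPs done per byte of weights held in memory, for one token.

    Assumes ~2 FLOPs (one multiply, one add) per *active* parameter,
    while *all* parameters must still be stored.
    """
    flops_per_token = 2.0 * active_params
    weight_bytes = total_params * bytes_per_param
    return flops_per_token / weight_bytes


# Hypothetical sizes, chosen only for illustration.
dense_ratio = flops_per_weight_byte(total_params=64e9, active_params=64e9)
moe_ratio = flops_per_weight_byte(total_params=64e9, active_params=16e9)

print(f"dense: {dense_ratio:.2f} FLOPs per weight byte")
print(f"MoE:   {moe_ratio:.2f} FLOPs per weight byte")
```

Under these assumptions the MoE does a quarter of the compute for the same weight footprint, which is why hardware with plenty of memory but modest compute (CPUs, Apple Silicon) looks more attractive.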


osmarks 1052 days ago

It would be bad for single-consumer-GPU inference setups.


Me1000 1052 days ago

Not an expert (no pun intended), but MoE where each expert is actually just a LoRA adaptor on top of the base model gets me pretty excited. Since LoRA adaptors can be swapped in and out at runtime, it might be possible to get decent performance without a lot of extra memory pressure.
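A quick parameter count shows why the memory pressure would stay low. This is only a sketch: the 4096x4096 layer size, rank 16, and 8 experts are hypothetical numbers picked for illustration, and it counts a single linear layer rather than a whole model.

```python
def full_expert_params(d_in, d_out):
    """Parameters in one full-rank expert weight matrix (d_out x d_in)."""
    return d_in * d_out


def lora_expert_params(d_in, d_out, rank):
    """Parameters in one LoRA adaptor: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical dimensions, not taken from any particular model.
D_IN, D_OUT, RANK, NUM_EXPERTS = 4096, 4096, 16, 8

full_total = NUM_EXPERTS * full_expert_params(D_IN, D_OUT)
lora_total = NUM_EXPERTS * lora_expert_params(D_IN, D_OUT, RANK)

print(f"8 full experts: {full_total:,} params")
print(f"8 LoRA experts: {lora_total:,} params (plus one shared base layer)")
print(f"ratio: {full_total // lora_total}x")
```

With these numbers each adaptor is 128 times smaller than a full expert for that layer, so keeping all of them resident next to the shared base weights costs very little.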


arugulum 1052 days ago

While MoE-LoRAs are exciting in themselves, they are a very different pitch from full-on MoEs. If the idea behind MoEs is that you want completely separate layers to handle different parts of the input/computation, then it is unlikely that you can get away with low-rank tweaks to an existing linear layer.


worldsayshi 1052 days ago

Could this work well with distributed solutions like petals?

https://github.com/bigscience-workshop/petals

I don't understand how petals can work though. I thought LLMs were typically quite monolithic.


osmarks 1051 days ago

Petals does a layer-wise split, I think. You could probably run separate experts on each system. I don't think this sort of tech is very promising, so I haven't looked.


kristianp 1051 days ago

It could be good if the relevant expert(s) can be loaded on demand after reading the prompt? If the MoE is, say, 8x8B params, then you could get good speed out of a 12GB GPU, despite the model being 64B params in size. Or am I misunderstanding how this all works?
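The arithmetic behind that 8x8B example can be sketched as below. It counts weight storage only (no KV cache or activations), treats the model as exactly 8 independent 8B experts with no shared layers, and uses 1 GB = 1e9 bytes; the precision options are assumptions added for illustration.

```python
def weights_gb(params_billion, bytes_per_param):
    """Weight storage in GB (1e9 bytes) for a given parameter count."""
    return params_billion * bytes_per_param


def experts_that_fit(gpu_gb, expert_params_billion, bytes_per_param):
    """How many whole experts fit in GPU memory, counting weights only."""
    per_expert_gb = weights_gb(expert_params_billion, bytes_per_param)
    return int(gpu_gb // per_expert_gb)


NUM_EXPERTS = 8          # from the 8x8B example
EXPERT_PARAMS_B = 8      # billions of parameters per expert
GPU_GB = 12

# Bytes per parameter at a few common precisions (illustrative choices).
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    total = weights_gb(NUM_EXPERTS * EXPERT_PARAMS_B, bytes_per_param)
    one = weights_gb(EXPERT_PARAMS_B, bytes_per_param)
    fit = experts_that_fit(GPU_GB, EXPERT_PARAMS_B, bytes_per_param)
    print(f"{name}: all experts {total:.0f} GB, one expert {one:.0f} GB, "
          f"{fit} expert(s) fit in {GPU_GB} GB")
```

Under these assumptions a single fp16 expert (16 GB) does not fit in 12 GB at all, while 4-bit experts (4 GB each) leave room for up to three at a time.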
