HN Mirror


by swalsh 831 days ago

Parameter count seems to matter only for the range of skills a model has, but these smaller models can be tuned to be more than competitive with far larger models.

I suspect the future is going to be owned by lots of smaller, more specific models, possibly trained by much larger models.

These smaller models have the advantage of faster and cheaper inference.


theLiminator 831 days ago

Probably why MoE models are so competitive now. Basically that idea within a single model.


CuriouslyC 831 days ago

I don't think MoE is the way forward. The bottleneck is memory, and MoE trades MORE memory consumption for lower inference times at a given performance level.
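A back-of-the-envelope sketch of that trade-off: every expert has to sit in memory, but only `top_k` of them run per token. The function name `moe_footprint` and all the sizes below are made-up assumptions for illustration, not measurements of any real model.

```python
def moe_footprint(shared_params, params_per_expert, num_experts, top_k, bytes_per_param=2):
    """Return total vs. per-token active parameter counts for a toy MoE model."""
    # All experts must be resident in memory, whether or not they are used.
    total = shared_params + num_experts * params_per_expert
    # Only the top_k routed experts (plus shared layers) do work for a given token.
    active = shared_params + top_k * params_per_expert
    return {
        "total_params": total,
        "active_params": active,
        "weight_bytes": total * bytes_per_param,  # e.g. 2 bytes per param for 16-bit weights
    }


# Hypothetical model: 1B shared parameters, 8 experts of 5B each, 2 experts per token.
stats = moe_footprint(1_000_000_000, 5_000_000_000, 8, 2)
print(stats)
```

With these assumed numbers the model computes like an 11B-parameter dense model per token but needs memory for 41B parameters, which is the memory-for-speed trade described above.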

Before too long we're going to see architectures where a model decomposes a prompt into a DAG of LLM calls based on expertise, fans out sub-prompts then reconstitutes the answer from the embeddings they return.
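A minimal sketch of what such a fan-out could look like, assuming a planner that returns a dependency graph and specialist calls that return embeddings. Everything here is hypothetical: `call_expert` is a stub standing in for a real model call, `decompose` is a hard-coded planner, and "reconstituting" is just an element-wise average of the returned vectors.

```python
import asyncio
from graphlib import TopologicalSorter


async def call_expert(expert: str, prompt: str) -> list[float]:
    """Hypothetical stand-in for a call to a small specialist model."""
    await asyncio.sleep(0)  # placeholder for inference / network latency
    # Fake 2-d "embedding" derived from the inputs so the sketch is deterministic.
    return [float(len(expert)), float(len(prompt))]


def decompose(prompt: str) -> dict[str, set[str]]:
    """Hypothetical planner: maps each node to the nodes it depends on."""
    return {"code": set(), "sql": set(), "merge": {"code", "sql"}}


async def run_node(node, prompt, deps, results):
    if deps:
        # Join node: combine the embeddings of its dependencies by averaging.
        vectors = [results[d] for d in sorted(deps)]
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    # Leaf node: send the sub-prompt to the matching expert.
    return await call_expert(node, prompt)


async def run_dag(prompt: str) -> dict[str, list[float]]:
    graph = decompose(prompt)
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    results: dict[str, list[float]] = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # every node whose dependencies are finished
        outputs = await asyncio.gather(
            *(run_node(n, prompt, graph[n], results) for n in ready)
        )
        for node, out in zip(ready, outputs):
            results[node] = out
            sorter.done(node)
    return results


results = asyncio.run(run_dag("hello"))
print(results["merge"])
```

Nodes with no unmet dependencies run concurrently in each wave, so independent sub-prompts fan out in parallel and the join node only runs once their results are in.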


elevaet 831 days ago

Please, what is an MoE model?


T-A 831 days ago

https://huggingface.co/blog/moe


orra 831 days ago

Mixture of Experts. A popular example is Mixtral.
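For a rough picture of the idea, here is a toy sketch of top-k routing: a router scores the experts for an input, only the best `top_k` are run, and their outputs are mixed using softmax weights over the selected scores. The experts and router below are made-up scalar functions, not a real model; in practice both are learned neural network layers, and this is not meant to reproduce any particular library's implementation.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, router, experts, top_k=2):
    """Run only the top_k highest-scoring experts and mix their outputs."""
    scores = router(x)
    # Indices of the top_k experts by router score.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the selected scores into mixing weights.
    weights = softmax([scores[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Toy experts and a fixed router (both hypothetical).
toy_experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x * x]
toy_router = lambda x: [0.0, 1.0, -1.0, 1.0]

output, chosen = moe_layer(3.0, toy_router, toy_experts, top_k=2)
print(output, chosen)
```

Here the second and fourth experts tie for the highest score, so they get equal weight and the other two are never evaluated.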
