by kristianp 1014 days ago
Could this be used as a source of speculative tokens for larger Llama models? See https://github.com/ggerganov/llama.cpp/pull/2926
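
For context, the idea behind speculative tokens is that a small draft model guesses several tokens ahead and the large model only checks them. Below is a minimal sketch of the greedy variant, assuming toy stand-in models. `toy_target`, `toy_draft`, and the function names are hypothetical and are not taken from the llama.cpp PR, which may work differently in its details.

```python
from typing import Callable, List

Token = int
# A "model" here is just a greedy next-token function (an assumption for illustration).
Model = Callable[[List[Token]], Token]


def speculative_step(target: Model, draft: Model,
                     context: List[Token], k: int = 4) -> List[Token]:
    """Return the tokens accepted in one speculative step (greedy variant)."""
    # 1. The small draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The large target model checks each proposal. A real implementation
    #    would score all k positions in one batched forward pass; this
    #    sketch calls the target once per position for clarity.
    accepted: List[Token] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal matched, so the target adds one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(target: Model, draft: Model,
             prompt: List[Token], n_tokens: int, k: int = 4) -> List[Token]:
    """Generate n_tokens after the prompt using repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + n_tokens]


def greedy(model: Model, prompt: List[Token], n_tokens: int) -> List[Token]:
    """Plain greedy decoding, used as the reference output."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(model(out))
    return out


# Hypothetical toy models: the draft agrees with the target most of the time.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 2 + 1) % 11


def toy_draft(ctx: List[Token]) -> Token:
    return toy_target(ctx) if len(ctx) % 3 else 0
```

With greedy decoding the output is the same as running the large model alone; the draft model only affects how many target calls can be batched per step, so a small model is useful here only if it agrees with the large one often enough.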

Also, when are we going to start seeing open-weights MoE models released?


1- Yes, Georgi tweeted that he is looking into it [0].

2- The only two I know of are airoboros [1] and Hydra, which is still in progress.

[0] https://x.com/ggerganov/status/1698667093711880687?s=46&t=Jp...

[1] https://github.com/jondurbin/airoboros#lmoe

Thanks. Yes, I've seen airoboros; if I recall correctly, it aims to use a mixture of fine-tunes of the base model. Not a truly pre-trained MoE, but could be useful.
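
To illustrate what "a mixture of fine-tunes" can mean in practice, here is a rough sketch in which a router picks one fine-tuned expert per prompt. This is an assumption-laden toy, not airoboros's actual routing: the expert names, keyword sets, and the `route` function are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical experts, each standing in for one fine-tune of the same base
# model and described by a few keywords. Names and keywords are made up.
EXPERTS: Dict[str, Set[str]] = {
    "code": {"python", "function", "bug", "compile"},
    "math": {"integral", "prove", "equation", "sum"},
    "creative": {"story", "poem", "character"},
}


def route(prompt: str, default: str = "creative") -> str:
    """Pick the expert whose keywords overlap most with the prompt."""
    words = set(prompt.lower().split())
    scores = {name: len(words & keywords) for name, keywords in EXPERTS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default expert when nothing matches.
    return best if scores[best] > 0 else default
```

The difference from a pre-trained MoE is that routing here happens once per prompt, outside the network, whereas a pre-trained MoE learns to route per token inside each layer.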

Hydra, is this it? https://github.com/SkunkworksAI/hydra-moe

Yes, it uses fine-tuned models. Hopefully the community finds use cases where it shines. Regarding Hydra, yes, that's the one. To stay updated, join the Discord mentioned in the repo.