HN Mirror



	by zingelshuher 859 days ago
	A similar MoE implementation has been on GitHub since Jan 2024: https://github.com/zxaall/moegpt

1 comment

avisoori1x 859 days ago

Oh nice. What's new here would be the noisy top-k routing and the expert capacity. It also seems to build on Andrej Karpathy's nanoGPT. Mine is from January as well. Here's the original blog post: https://huggingface.co/blog/AviSoori1x/makemoe-from-scratch
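For readers unfamiliar with the two ideas, here is a minimal pure-Python sketch of noisy top-k gating plus an expert capacity limit. It is not the code from either repo. The gating follows the common recipe: add Gaussian noise scaled by softplus of a second "noise" logit, keep the top k, and softmax over the survivors. The capacity formula `int(num_tokens * k / num_experts * capacity_factor)` and the first-come-first-served dropping policy are assumptions for illustration; implementations differ on both. The function names are hypothetical.

```python
import math
import random


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def noisy_top_k_gate(logits, noise_logits, k, rng):
    """Pick k experts for one token; return (expert_ids, softmax weights)."""
    # Noise is scaled per expert by softplus(noise_logit), so it is learnable
    noisy = [l + rng.gauss(0.0, 1.0) * softplus(n) for l, n in zip(logits, noise_logits)]
    top = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)[:k]
    # Softmax over the kept logits only (same as masking the rest with -inf)
    m = max(noisy[i] for i in top)
    exps = [math.exp(noisy[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def route_with_capacity(token_logits, token_noise, num_experts, k, capacity_factor, rng):
    """Route tokens in order; an expert stops accepting tokens once full."""
    num_tokens = len(token_logits)
    # Assumed formula: expected tokens per expert, scaled by capacity_factor
    capacity = int(num_tokens * k / num_experts * capacity_factor)
    load = [0] * num_experts
    assignments = []  # per token: the (expert, weight) pairs actually kept
    for logits, noise in zip(token_logits, token_noise):
        experts, weights = noisy_top_k_gate(logits, noise, k, rng)
        kept = []
        for e, w in zip(experts, weights):
            if load[e] < capacity:
                load[e] += 1
                kept.append((e, w))
            # Otherwise the expert is full and this token is dropped for it
        assignments.append(kept)
    return assignments, load, capacity


if __name__ == "__main__":
    # 8 tokens that all strongly prefer experts 0 and 1; noise is ~0
    toks = [[10.0, 9.0, 0.0, 0.0] for _ in range(8)]
    noise = [[-50.0] * 4 for _ in range(8)]
    a, load, cap = route_with_capacity(toks, noise, num_experts=4, k=2,
                                       capacity_factor=1.0, rng=random.Random(0))
    print(cap, load)  # capacity 4: experts 0 and 1 fill up, the last 4 tokens overflow
```

The demo shows why capacity matters: without it, a skewed router would send every token to the same two experts.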


zingelshuher 859 days ago

It was inspired by Mixtral 8x7B, of course. I think the same approach, soft to hard MoE, can be used in other domains, like video/image processing. It would be interesting to take it to the extreme, e.g. 4 experts out of 100.
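To make "4 experts out of 100" concrete, here is a rough pure-Python sketch of a sparse MoE layer where only the top-k experts are evaluated per token. It is illustrative only: the class and helper names are hypothetical, each expert is assumed to be a tiny linear map plus ReLU, and routing is plain (noise-free) top-k to keep it short. With k=4 and 100 experts, each token touches 4% of the expert parameters, while total capacity still scales with all 100. At that level of sparsity, some load-balancing mechanism (an auxiliary loss or capacity limits) is typically needed in practice, and none is shown here.

```python
import math
import random


def make_expert(dim, rng):
    # Hypothetical tiny expert: a dim x dim linear map followed by ReLU
    w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]

    return expert


class SparseMoE:
    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.experts = [make_expert(dim, rng) for _ in range(num_experts)]
        # Linear router: one weight row per expert
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_experts)]
        self.calls = [0] * num_experts  # how often each expert actually ran

    def forward(self, x):
        logits = [sum(r * xi for r, xi in zip(row, x)) for row in self.router]
        top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[: self.k]
        # Softmax over the selected experts only
        m = max(logits[i] for i in top)
        exps = [math.exp(logits[i] - m) for i in top]
        z = sum(exps)
        out = [0.0] * len(x)
        for i, e in zip(top, exps):
            self.calls[i] += 1  # only the k chosen experts are evaluated
            y = self.experts[i](x)
            out = [o + (e / z) * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    moe = SparseMoE(dim=8, num_experts=100, k=4)
    rng = random.Random(1)
    for _ in range(10):
        moe.forward([rng.gauss(0.0, 1.0) for _ in range(8)])
    print(sum(moe.calls))  # 10 tokens x 4 experts = 40 expert calls, not 1000
```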
