by zingelshuher 813 days ago
It was inspired by Mixtral 8x7B, of course. I think the same approach, going from soft to hard MoE, could be used in other domains, like video/image processing. It would be interesting to take it to the extreme, e.g. activating only 4 experts out of 100.
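
For anyone unfamiliar with the terms, here is a rough sketch of the difference between soft and hard (top-k) gating, using the 4-out-of-100 case. This is only an illustration under my own assumptions: the function names (`soft_gate`, `hard_gate`, `moe_output`) are made up, the experts are stand-in scalars, the router logits are random, and the top-k step (softmax over the k largest logits) is the commonly described variant, not necessarily what the project does.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def soft_gate(logits):
    # Soft MoE: every expert gets a nonzero weight, so all of them must run.
    return softmax(logits)

def hard_gate(logits, k):
    # Hard (top-k) MoE: keep the k largest logits and softmax over those only;
    # all other experts get weight 0 and can be skipped.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    gate = [0.0] * len(logits)
    for i, w in zip(top, weights):
        gate[i] = w
    return gate

def moe_output(gate, expert_outputs):
    # Weighted sum of (scalar, hypothetical) expert outputs,
    # skipping experts whose gate weight is zero.
    return sum(w * y for w, y in zip(gate, expert_outputs) if w != 0.0)

random.seed(0)
num_experts, k = 100, 4  # the "4 experts out of 100" extreme
logits = [random.gauss(0.0, 1.0) for _ in range(num_experts)]          # stand-in router scores
expert_outputs = [random.gauss(0.0, 1.0) for _ in range(num_experts)]  # stand-in expert results

soft = soft_gate(logits)
hard = hard_gate(logits, k)
active = sum(1 for w in hard if w > 0.0)

print("experts used (soft):", sum(1 for w in soft if w > 0.0))
print("experts used (hard):", active)
print("soft output:", moe_output(soft, expert_outputs))
print("hard output:", moe_output(hard, expert_outputs))
```

The point is just that with hard gating only `k` experts need to be evaluated per input, which is where the compute savings would come from if the expert count were pushed that high.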