Hacker News
by zingelshuher 813 days ago
It was inspired by Mixtral 8x7B, of course. I think the same approach, going from soft to hard MoE, could be used in other domains, such as video or image processing. It would be interesting to take it to the extreme, say 4 experts out of 100.
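A minimal sketch of the soft vs. hard distinction, assuming "hard" here means Mixtral-style top-k routing (Mixtral 8x7B sends each token to 2 of its 8 experts and applies a softmax over only the selected router logits). The names `soft_gate`, `top_k_gate`, and `moe_forward` are hypothetical, the experts are toy scalar functions rather than feed-forward networks, and the router logits are random stand-ins for what a learned router would compute from the input.

```python
import math
import random


def soft_gate(logits):
    """Soft (dense) gating: softmax over all router logits, so every
    expert gets a nonzero weight and would have to be evaluated."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {i: e / total for i, e in enumerate(exps)}


def top_k_gate(logits, k):
    """Hard (sparse) gating: keep only the k largest logits and softmax
    over those; every other expert gets weight 0 and is skipped."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, logits, k):
    """Weighted sum of the outputs of the k selected experts only."""
    weights = top_k_gate(logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


random.seed(0)
n_experts, k = 100, 4  # the "4 experts out of 100" extreme case
# Toy stand-in experts: expert s just scales its input by s.
experts = [lambda x, s=s: s * x for s in range(n_experts)]
# Stand-in router logits; a real router would compute these from x.
logits = [random.gauss(0.0, 1.0) for _ in range(n_experts)]
gate = top_k_gate(logits, k)
print(len(gate), round(sum(gate.values()), 6))  # 4 1.0
print(moe_forward(2.0, experts, logits, k))
```

With top-k gating only 4 of the 100 experts run per input, so compute grows with k rather than with the total number of experts; the soft gate, by contrast, assigns weight to all 100.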