Hacker News
by
ronyfadel
1014 days ago
It still does a much better job at translation than even Llama 2 70B, and it does so with only 6.7B params.
1 comment
two_in_one
1014 days ago
If it's MoE, that may explain why it's faster and better...
yumraj
1014 days ago
MOE?
sarthaksrinivas
1014 days ago
Mixture of Experts model: https://en.wikipedia.org/wiki/Mixture_of_experts
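For anyone wondering why a Mixture of Experts design could be faster, here is a minimal sketch of top-k routing in plain Python. Nothing in this thread confirms that the model under discussion is actually MoE, and every name below (`moe_forward`, `gate_weights`, `make_scaler`, the toy experts) is a made-up illustration rather than any real library's API. The idea is that a small gate scores all experts, but only the `top_k` best-scoring ones are run for a given input, so the compute per input is a fraction of the total parameter count.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_scaler(factor):
    """Toy 'expert': multiplies every input element by a constant (assumption)."""
    return lambda x: [factor * xi for xi in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x through only the top_k experts chosen by a linear gate."""
    # One gate logit per expert: dot product of x with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Indices of the top_k highest-scoring experts, best first.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the chosen experts only.
    probs = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)  # the unchosen experts are never evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, chosen


# Four toy experts, but only two run per input.
experts = [make_scaler(f) for f in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
output, chosen = moe_forward([1.0, 0.0], gate_weights, experts, top_k=2)
print(chosen, output)
```

In this toy setup, half of the experts are skipped for each input, which is where the speed argument comes from. Whether MoE also makes a model "better" depends on training and data, so treat that part as speculation.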