by treprinum, 929 days ago
I would say so, based on LLaMA 2 70B. If MoE means 8x the inference cost, then I'd guess you'd see under 20 tokens/sec?
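The back-of-envelope estimate can be written out as a short sketch. It assumes, as the comment does, that per-token cost scales linearly with the "8x" multiplier, so throughput divides by that factor. The function name and the baseline rate are hypothetical inputs for illustration, not measured LLaMA 2 70B figures.

```python
def estimate_moe_tokens_per_sec(dense_tokens_per_sec: float, cost_multiplier: float = 8.0) -> float:
    """Estimate MoE decode throughput from a dense-model baseline.

    Assumption (hypothetical): every generated token costs `cost_multiplier`
    times the dense model's per-token work, so throughput drops by the
    same factor.
    """
    if dense_tokens_per_sec <= 0 or cost_multiplier <= 0:
        raise ValueError("rates and multipliers must be positive")
    return dense_tokens_per_sec / cost_multiplier


# Hypothetical baseline: under this assumption, any dense rate below
# 160 tokens/sec puts an 8x-cost MoE under 20 tokens/sec.
baseline = 150.0
print(estimate_moe_tokens_per_sec(baseline))  # 18.75
```

The linear-scaling assumption is deliberately crude: real throughput also depends on hardware, batching, and how many experts are active per token.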