by treprinum 929 days ago
I would say so based on LLaMA 2 70B; if the MoE means 8x the inference cost, then I guess you'd see <20 tokens/sec?