by DeathArrow 85 days ago
TLDR:

>For perspective: a consumer NVIDIA RTX 4060 Ti (~$400) can run comparable 3B active-parameter MoE workloads at 70–90 tok/s with 100K+ context, depending on setup. The Pocket Lab lands around 6–12 tok/s at 8K–32K context.

>Same class of workload. Roughly 5–10× slower, at 3× the price, with tighter constraints.
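
For anyone who wants to check the "5–10×" figure, here is a quick back-of-the-envelope sketch using only the throughput ranges quoted above. The variable names are mine, and the implied device price is just 3 × the ~$400 GPU figure, not a number stated in the quote.

```python
# Throughput ranges (tokens/second) taken from the quoted passage.
gpu_tok_s = (70.0, 90.0)      # RTX 4060 Ti, 3B active-parameter MoE
pocket_tok_s = (6.0, 12.0)    # Pocket Lab


def slowdown_range(fast, slow):
    """Return (smallest, largest) slowdown ratio of `slow` relative to `fast`."""
    smallest = fast[0] / slow[1]   # slowest GPU case vs fastest Pocket Lab case
    largest = fast[1] / slow[0]    # fastest GPU case vs slowest Pocket Lab case
    return smallest, largest


lo, hi = slowdown_range(gpu_tok_s, pocket_tok_s)

# Ratio of the range midpoints, as a rough "typical" figure.
mid = (sum(gpu_tok_s) / 2) / (sum(pocket_tok_s) / 2)

# Assumption: price implied by "3x the price" of a ~$400 card, not stated directly.
implied_price_usd = 3 * 400

print(f"slowdown: {lo:.1f}x to {hi:.1f}x (midpoint ratio {mid:.1f}x)")
print(f"implied price: ~${implied_price_usd}")
```

On these numbers the midpoint ratio sits inside the quoted 5–10× band, while the extreme pairing (90 tok/s against 6 tok/s) comes out at 15×, so "roughly 5–10×" reads as a central estimate and not a worst case.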