by DeathArrow
85 days ago
TLDR:

> For perspective: a consumer NVIDIA RTX 4060 Ti (~$400) can run comparable 3B active-parameter MoE workloads at 70–90 tok/s with 100K+ context, depending on setup. The Pocket Lab lands around 6–12 tok/s at 8K–32K context.

> Same class of workload. Roughly 5–10× slower, at 3× the price, with tighter constraints.