by zzzoom 22 days ago
Prefill (GEMM) is compute bound, decode (GEMV) is memory bound.
1 comment

> decode (GEMV) is memory bound

Decode is only a GEMV at batch size 1. Batching turns decode into a GEMM too.
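
To put rough numbers on that, here is a minimal roofline-style sketch. It assumes a single fp16 weight matrix of shape K x N, counts only the ideal memory traffic (read activations and weights, write outputs), and uses a made-up ridge point of 200 FLOPs/byte. The function names and the ridge value are illustrative assumptions, not measurements of any particular GPU.

```python
def arithmetic_intensity(batch, k, n, bytes_per_elem=2):
    """FLOPs per byte moved for a (batch x k) @ (k x n) matmul.

    Assumes ideal traffic: each operand is read once and the output
    is written once. bytes_per_elem=2 corresponds to fp16/bf16.
    """
    flops = 2 * batch * k * n  # one multiply and one add per MAC
    bytes_moved = bytes_per_elem * (batch * k + k * n + batch * n)
    return flops / bytes_moved


def is_compute_bound(batch, k, n, ridge_flops_per_byte, bytes_per_elem=2):
    """True if the matmul sits at or above the roofline ridge point."""
    ai = arithmetic_intensity(batch, k, n, bytes_per_elem)
    return ai >= ridge_flops_per_byte


if __name__ == "__main__":
    K = N = 4096    # hypothetical layer dimensions
    RIDGE = 200.0   # hypothetical peak FLOP/s divided by memory bandwidth
    for b in (1, 8, 64, 512):
        ai = arithmetic_intensity(b, K, N)
        bound = "compute" if is_compute_bound(b, K, N, RIDGE) else "memory"
        print(f"batch={b:4d}  intensity={ai:7.2f} FLOPs/byte  -> {bound} bound")
```

Under these assumptions, batch size 1 lands just below 1 FLOP/byte (the GEMV case), and the intensity grows roughly linearly with batch size until the activation traffic starts to matter. So a batched decode is a GEMM in shape, but whether it actually becomes compute bound depends on how the batch size compares with the hardware's ridge point.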