Hacker News
by
zzzoom
22 days ago
Prefill (GEMM) is compute bound, decode (GEMV) is memory bound.
1 comment
Const-me
22 days ago
> decode (GEMV) is memory bound
Decode is GEMV only at batch size 1. Batching turns decode into GEMM as well.
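A rough way to see this is to count FLOPs per byte moved for a single weight matrix multiply as the batch grows. The sketch below is only an illustration under stated assumptions: the function name `arithmetic_intensity`, the 4096x4096 layer size, and fp16 storage (2 bytes per element) are made up for the example, and it ignores caches, KV-cache traffic, and attention.

```python
def arithmetic_intensity(batch: int, d_in: int, d_out: int,
                         bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for Y = X @ W, with X of shape (batch, d_in)
    and W of shape (d_in, d_out).

    Assumes every operand is read or written exactly once
    (an idealized model, not a measurement).
    """
    # One multiply and one add per (batch, d_in, d_out) triple.
    flops = 2 * batch * d_in * d_out
    # Weights are read once regardless of batch size;
    # activations in and out scale with the batch.
    elems_moved = d_in * d_out + batch * d_in + batch * d_out
    return flops / (elems_moved * bytes_per_elem)


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    d = 4096
    for batch in (1, 8, 64, 256):
        ai = arithmetic_intensity(batch, d, d)
        kind = "GEMV" if batch == 1 else "GEMM"
        print(f"batch={batch:4d} ({kind}): {ai:8.2f} FLOP/byte")
```

At batch size 1 this model gives about 1 FLOP per byte, because the whole weight matrix is streamed for a single vector. With larger batches the same weight read is amortized over more rows, so the intensity climbs and the operation can move toward the compute-bound regime. Where that crossover happens depends on the hardware's ratio of peak FLOPs to memory bandwidth.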