Hacker News
Layer-wise inferencing and batching: Small VRAM doesn't limit LLM throughput (verdagon.dev)
5 points by one-punch 771 days ago