Hacker News
by brucethemoose2, 993 days ago
Shouldn't it be much less than 16GB with vLLM's 4-bit AWQ? Probably consumer GPU-ish depending on the batch size?
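A rough back-of-the-envelope sketch of that arithmetic, for anyone who wants to plug in their own numbers. The comment doesn't say which model is meant, so the 7B parameter count and the Llama-2-7B-like shape (32 layers, 32 KV heads, head dim 128) below are assumptions picked purely for illustration, as are the function names. Real AWQ checkpoints also store per-group scales and zero points, so the effective size is somewhat above a flat 4 bits per weight, and vLLM reserves extra memory for activations and its KV cache pool.

```python
GIB = 2**30

def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(batch_size: int, seq_len: int, n_layers: int,
                 n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: one K and one V tensor per layer, per token, per sequence."""
    elems = 2 * batch_size * seq_len * n_layers * n_kv_heads * head_dim
    return elems * bytes_per_elem / GIB

# Hypothetical model, NOT taken from the thread: 7B parameters,
# 32 layers, 32 KV heads, head dim 128, fp16 KV cache.
N_PARAMS = 7e9
SHAPE = dict(n_layers=32, n_kv_heads=32, head_dim=128)

fp16_weights = weight_memory_gib(N_PARAMS, 16)
awq4_weights = weight_memory_gib(N_PARAMS, 4)  # lower bound, no scales/zeros

print(f"fp16 weights:  {fp16_weights:.2f} GiB")
print(f"4-bit weights: {awq4_weights:.2f} GiB")

# The batch-size dependence comes from the KV cache, which grows linearly.
for batch in (1, 4, 8):
    kv = kv_cache_gib(batch, seq_len=4096, **SHAPE)
    print(f"batch={batch}: KV cache {kv:.1f} GiB, total ~{awq4_weights + kv:.1f} GiB")
```

Under these assumptions the quantized weights are only a few GiB, and it's the KV cache at larger batch sizes and long contexts that decides whether a consumer card is enough.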