by brucethemoose2 993 days ago
Shouldn't it be much less than 16GB with vLLM's 4-bit AWQ? Probably in consumer-GPU territory, depending on the batch size?
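
For a rough sense of the arithmetic behind that guess, here is a back-of-envelope sketch. The parameter counts are hypothetical (the model size isn't stated here), and `weight_memory_gb` is just an illustrative helper, not anything from vLLM. It counts weights only: quantization scales and zero points, activations, and the KV cache (which grows with batch size) all come on top.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical parameter counts, chosen only for illustration.
    for n_params in (7e9, 13e9, 34e9):
        fp16 = weight_memory_gb(n_params, 16)
        awq4 = weight_memory_gb(n_params, 4)  # ignores per-group scale/zero overhead
        print(f"{n_params / 1e9:.0f}B params: fp16 ~{fp16:.1f} GB, 4-bit ~{awq4:.1f} GB")
```

So going from 16-bit to 4-bit weights cuts the weight footprint to about a quarter, which is why the real number probably ends up well under the fp16 figure, with the remaining headroom decided mostly by batch size and context length.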