by api 1065 days ago
It's a RAM tradeoff: quantization shrinks the model so it fits in less memory. If you have enough GPU RAM to load the non-quantized model, the non-quantized version may be faster.
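To put rough numbers on that, here is a back-of-the-envelope sketch. It counts the weights only (no activations, KV cache, or framework overhead), and the 7B parameter count, the bit widths, and the 12 GiB card are all made-up illustrative values, not figures from any particular model or GPU.

```python
def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def fits_in_vram(n_params: int, bits_per_param: int, vram_gib: float) -> bool:
    """True if the weights alone fit; real usage needs extra headroom."""
    return weight_memory_gib(n_params, bits_per_param) <= vram_gib


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical 7B-parameter model
    vram_gib = 12.0           # hypothetical GPU memory budget
    for bits in (16, 8, 4):   # assumed precisions: fp16, int8, 4-bit
        size = weight_memory_gib(n_params, bits)
        verdict = "fits" if fits_in_vram(n_params, bits, vram_gib) else "does not fit"
        print(f"{bits:>2}-bit: {size:.2f} GiB, {verdict} in {vram_gib:.0f} GiB")
```

This only tells you whether a given precision fits. Whether the non-quantized model actually runs faster once it does fit depends on the hardware and the kernels, so you'd have to measure it.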