Hacker News
by api 1065 days ago
It's a RAM tradeoff. If you have enough GPU RAM to load the non-quantized model, it may be faster.
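To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch in Python. It only counts the weights; the function names, the 7B parameter count in the usage note, and the 1.2x headroom factor for activations and cache are illustrative assumptions, not figures from any particular model or runtime.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


def fits_on_gpu(n_params: float, gpu_gib: float,
                bits_per_weight: float = 16, overhead: float = 1.2) -> bool:
    """Rough check: do the weights, plus an assumed headroom factor, fit?

    `overhead` is a guessed multiplier for activations, cache, and
    runtime buffers. Real usage depends on batch size and context length.
    """
    return weight_memory_gib(n_params, bits_per_weight) * overhead <= gpu_gib


if __name__ == "__main__":
    n = 7e9  # hypothetical 7B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gib(n, bits):.2f} GiB")
    print("fp16 fits in 24 GiB:", fits_on_gpu(n, 24))
    print("fp16 fits in 12 GiB:", fits_on_gpu(n, 12))
    print("4-bit fits in 12 GiB:", fits_on_gpu(n, 12, bits_per_weight=4))
```

If the 16-bit check passes for your card, you can try the non-quantized model; if not, quantization is what lets it load at all. Whether it actually runs faster still depends on the kernels and hardware.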