Hacker News
by
Der_Einzige
1062 days ago
Any data on inference speed? I've found that the non-quantized model was much faster on GPU than the quantized versions, which ran at lower GPU utilization.
api
1062 days ago
It's a RAM tradeoff. If you have enough GPU RAM to load the non-quantized model it may be faster.
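To put rough numbers on the RAM side of that tradeoff, here is a minimal back-of-the-envelope sketch. It only counts the weights themselves and assumes a hypothetical 7B-parameter model and a hypothetical 8 GiB card; activations, KV cache, and any per-block quantization metadata are ignored, so real usage will be higher. The function names are made up for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits_in_vram(n_params: float, bits_per_weight: int, vram_gib: float) -> bool:
    """True if the weights alone fit in the given amount of GPU memory."""
    return weight_memory_gib(n_params, bits_per_weight) <= vram_gib


N_PARAMS = 7e9   # hypothetical model size, not taken from the thread
VRAM_GIB = 8.0   # hypothetical GPU memory

for bits in (16, 8, 4):
    size = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if fits_in_vram(N_PARAMS, bits, VRAM_GIB) else "does not fit"
    print(f"{bits:>2}-bit weights: {size:.1f} GiB, {verdict} in {VRAM_GIB:.0f} GiB")
```

This says nothing about speed by itself; it only shows when the non-quantized weights stop fitting, which is the point where quantization becomes necessary rather than optional.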