Hacker News
by mongrelion 95 days ago
Which quantization are you running and what context size? 32tok/s for that model on that card sounds pretty good to me!