Hacker News
by gfosco 55 days ago
I set this up today on my 5090 with Q6_K quantization and a Q4_0 KV cache, and got a consistent 50 tokens/s at 123k context, using ~28 of 32 GB VRAM through LM Studio.
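For a rough sense of why the KV cache type matters at this context length, here is a back-of-the-envelope sketch. The model shape below is hypothetical and is not the model discussed in this thread; the formula also assumes a plain transformer where every layer caches full keys and values, so models with sliding-window or hybrid attention would need less. The bits-per-element figures come from the GGML block layouts (Q4_0 stores 32 values in 18 bytes, Q8_0 stores 32 values in 34 bytes).

```python
# Bits per cached element for a few storage formats.
# Q4_0: 32 values in 18 bytes (16 bytes of 4-bit values + one fp16 scale).
# Q8_0: 32 values in 34 bytes (32 bytes of 8-bit values + one fp16 scale).
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_type="f16"):
    """Estimate KV cache size in bytes, assuming every layer caches full K and V."""
    # Keys and values each hold n_kv_heads * head_dim values per token per layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * BITS_PER_ELEMENT[cache_type] / 8


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, context_len=123_000)
    for cache_type in ("f16", "q8_0", "q4_0"):
        gib = kv_cache_bytes(cache_type=cache_type, **shape) / 2**30
        print(f"{cache_type}: {gib:.1f} GiB")
```

Whatever the real model shape is, the ratio holds: under these assumptions a Q4_0 cache takes 4.5/16 of the space of an f16 cache, which is what makes a six-figure context plausible next to Q6_K weights on a 32 GB card.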
1 comment
pawelduda 55 days ago
Wow, that sounds usable. I know it's anecdotal, but how did you find the quality of the output, and can you compare it to any closed-source model?