HN Mirror

	by gfosco 55 days ago
	I set this up today on my 5090 with Q6_K quantization and a Q4_0 KV cache, and got a consistent 50 tokens/s at 123k context, using ~28 of 32 GB VRAM through LM Studio.

1 comment

Wow, that sounds usable. I know it's anecdotal, but how did you find the quality of the output, and can you compare it to any closed-source model?