


	by Gracana 694 days ago
	That's some really strange behavior. I don't know why that would cause poor results rather than just poor performance. Can you configure the context size with `/set parameter num_ctx N`? On my laptop with an RTX A3000 12GB I can run `yi-coder:9b-chat` (Q4_0) with a 32768-token context and it produces good results quickly. That uses 11GB of VRAM, so it's maxed out for this setup.
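For anyone who would rather set this per request than in the interactive prompt, here is a minimal sketch. It assumes the tool in question is Ollama, running locally on its default port (11434), and that the `/api/generate` endpoint accepts `num_ctx` under `options`. The helper names `build_payload` and `generate` are hypothetical, made up for illustration.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, num_ctx):
    """Build a non-streaming generate request with a per-request context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON reply instead of a stream
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }


def generate(model, prompt, num_ctx, url=OLLAMA_URL):
    """Send the request and return the generated text (needs a running server)."""
    data = json.dumps(build_payload(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("yi-coder:9b-chat", "Write a binary search in Python.", 32768)` would mirror the setup described above. Whether a given context size fits in VRAM still depends on the model and quantization.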


Solved, see:

Works very well now! 65K input tokens with 8192 output tokens is no longer an issue on my 4090. (It maxes out at 22GB of VRAM.)

Awesome! Glad to hear you got it sorted out.