| HN Mirror



	by memossy 890 days ago
	800M is good for mobile, 8B for graphics cards. Bigger than that is also possible; it's not saturated yet, but it needs more GPUs.

2 comments

anon373839 890 days ago

Do you know how the memory demands compare to LLMs at the same number of parameters? For example, Mistral 7B quantized to 4 bits works very well on an 8GB card, though there isn’t room for long context.


vorticalbox 890 days ago

You can also use quantisation, which lowers memory requirements at a small loss of performance.
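
For a rough sense of the numbers, here is a back-of-the-envelope sketch. It counts weights only and assumes a round 7e9 parameters; activations, context/KV cache, and quantisation metadata (scales, zero points) all add to this, so treat it as a lower bound. The function name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width and
    there is no per-block overhead, which real quantisation formats add.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # 7e9 is a round stand-in for a "7B" model, not an exact parameter count.
    for bits in (16, 8, 4):
        print(f"7e9 params at {bits}-bit: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Under these assumptions, 4-bit weights take a quarter of the space of 16-bit weights, which is why a 7B model can fit on an 8GB card with some room left over for context.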
