| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lambda 63 days ago
	I'm running an 8 bit quant right now, mostly for speed as memory bandwidth is the limiting factor and 8 bit quants generally lose very little compared to the full res, but also to save RAM. I'm still working on tweaking the settings; I'm hitting OOM fairly often right now, it turns out that the sliding window attention context is huge and llama.cpp wants to keep lots of context snapshots.

1 comments

qingcharles 63 days ago

I had a whole bunch of trouble getting Gemma 4 working properly. Mostly because there aren't many people running it yet, so there aren't many docs on how to set it up correctly.

It is a fantastic model when it works, though! Good luck :)

link