| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by coreyp_1 50 days ago
	With 128 GB strix halo, you can't do as big of a model as you would think. You can do larger than having a single graphics card, of course, but that 128 gigs cannot all be dedicated to the model. Remember, the context alone is usually larger than the model itself. I got an EVO X2, and I don't regret it, but by my current calculations, it will take 8 years to recoup the cost, as opposed to just using equivalent, paid commercial options.

2 comments

smilekzs 50 days ago

A key consideration in favor of running your local LLM despite all the trouble: The commercial serving endpoint may not exist tomorrow, or at least not at the same price.

link

hedora 50 days ago

My current rule of thumb is 1GB gets you 1B parameters with a big context. (Qwen 32B fits in 32GB with 200K+ contexts)

That’s with heavy compression of the weights and the context, of course.

I haven’t gone through model evaluation + shoehorning at 128GiB yet.

link