by rldjbpin 8 days ago

the recent surge of articles on running LLMs on decommissioned datacentre hardware says more about the times than about how viable the approach is. back when intel dominated the cpu market and would not give consumers more than four cores, the old xeon route was popular for a different reason: it was the cheap way to get more cores.

memory is the bottleneck here: capacity, or more precisely speed. before you rush out to build your own box, try to squeeze the most out of your existing hardware. if you already own a lot of cheap memory, you are in luck. otherwise, LM Studio lets you split a model between gpu memory and system memory. avoid MoE models, or even consider tensor parallelism between the onboard gpu and the dedicated one, before going for more hardware.
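to put rough numbers on the "speed, not just capacity" point, here is a back-of-the-envelope sketch. it assumes token generation is purely memory-bandwidth bound and that every weight is read once per token, so it gives an optimistic ceiling rather than a benchmark. the function names and the example figures (an 8B-parameter model at about 4.5 bits per weight, dual-channel DDR4-3200 at a theoretical 51.2 GB/s, a hypothetical 448 GB/s gpu) are illustrative assumptions, not measurements.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, size_gb: float) -> float:
    """Ceiling on decode speed if all weights are read once per token."""
    return bandwidth_gb_s / size_gb


def split_tokens_per_second(size_gb: float, gpu_fraction: float,
                            gpu_bw_gb_s: float, sys_bw_gb_s: float) -> float:
    """Ceiling when the weights are split between gpu and system memory.

    Time per token is the sum of the time spent reading each part,
    so the slower pool dominates even when it holds only part of the model.
    """
    gpu_time = size_gb * gpu_fraction / gpu_bw_gb_s
    sys_time = size_gb * (1 - gpu_fraction) / sys_bw_gb_s
    return 1 / (gpu_time + sys_time)


if __name__ == "__main__":
    size = model_size_gb(8, 4.5)  # assumed 8B model at ~4.5 bits per weight
    print(f"weights: {size:.2f} GB")
    print(f"system memory only: {max_tokens_per_second(51.2, size):.1f} tok/s ceiling")
    print(f"half on gpu: {split_tokens_per_second(size, 0.5, 448.0, 51.2):.1f} tok/s ceiling")
```

the split case shows why offloading only part of a model helps less than you might hope: the half left in system memory still takes most of the time per token.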

there is little to no benefit to picking one specific quantization for your models, so go crazy and test whatever runs comfortably on your hardware.
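if you want to shortlist what "can easily run" before downloading anything, a rough fit check is enough. this is a minimal sketch: the bit widths are nominal (real quantized files carry some extra metadata), and the 2 GB allowance for context and runtime overhead is a placeholder assumption you should adjust for your setup.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def quantizations_that_fit(params_billion: float, memory_gb: float,
                           candidates: dict[str, float],
                           overhead_gb: float = 2.0) -> list[str]:
    """Return the candidate names whose weights plus overhead fit in memory.

    overhead_gb is an assumed allowance for context and runtime buffers.
    """
    return [
        name
        for name, bits in candidates.items()
        if weights_gb(params_billion, bits) + overhead_gb <= memory_gb
    ]


if __name__ == "__main__":
    # nominal bits per weight, for illustration only
    candidates = {"16-bit": 16, "8-bit": 8, "4-bit": 4}
    print(quantizations_that_fit(8, 12, candidates))
```

anything on the resulting list is fair game to try; the point is just to rule out what will not fit.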