HN Mirror

	by alexjplant 9 days ago
	I've spent the past week trying to scheme a way to get affordable local inference of something useful (Qwen3.6-35B-A3B) for ~$500, and I've come to the conclusion that it simply isn't viable. A pair of power-restricted P100s in a workstation gets close, but the workstations themselves are expensive and as rare as hen's teeth (not to mention loud and large). I think early '27 will be when things open up, as the hardware market unclenches and further strides are made in small, capable models.

mappu 9 days ago

I'm running Qwen3.6-35B-A3B on a very ordinary desktop PC (32 GB DDR5, 8 GB Radeon RX 6600 XT) and getting a useful 15-20 tok/sec out of it. The MoE architecture and the automatic offloading between system RAM and VRAM are just fantastic. I'm using Unsloth's Q4_K_XL quant.

Qwen3.6-27B, though, is unbearably slow because it doesn't fit in VRAM. I think the MoE is just very easy to run.

It is also extremely nice that you can just `apt install llama.cpp libggml0-backend-vulkan` now too.
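A rough back-of-envelope for why the MoE runs well here while the dense 27B crawls. This is a sketch under stated assumptions: roughly 4.5 bits per weight for a Q4_K-style quant, with KV cache, activations, and the exact VRAM/RAM split ignored, so the numbers are ballpark only. The point is that a dense model streams every weight for every token, while an A3B MoE only touches about 3B active parameters per token.

```python
# Rough, hypothetical estimate of weight bytes stored vs. read per generated token.
BITS_PER_WEIGHT = 4.5  # assumption: approximate average for a Q4_K-style quant


def weight_gb(params_billion, bits=BITS_PER_WEIGHT):
    """Approximate size in GB of `params_billion` parameters at `bits` per weight."""
    return params_billion * bits / 8


def estimate(total_b, active_b):
    return {
        "total_gb": weight_gb(total_b),       # has to live somewhere (VRAM + system RAM)
        "per_token_gb": weight_gb(active_b),  # has to be read for every generated token
    }


moe = estimate(total_b=35, active_b=3)     # MoE: 35B total, ~3B active per token
dense = estimate(total_b=27, active_b=27)  # dense: every weight, every token

if __name__ == "__main__":
    for name, est in (("MoE 35B-A3B", moe), ("dense 27B", dense)):
        print(f"{name}: ~{est['total_gb']:.1f} GB total, "
              f"~{est['per_token_gb']:.1f} GB read per token")
```

Both models overflow an 8 GB card, but under these assumptions the dense model reads about nine times as many weight bytes per token, which fits the experience described above.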

ozim 8 days ago

I wonder what the parent poster means by "useful" and what he actually tried. It feels like he was just comparing some benchmarks.

Yesterday I downloaded Gemma4-26B with Ollama on a rather rusty desktop: a GTX 1070 with 8 GB, 32 GB of RAM, and a Core i5-9400.

I dropped in a photo of my water meter and told it to read the value and serial number. It was far from instant, but it was easily under 3 minutes and the result was correct.

Earlier, around February, I tried the same photo with Gemma3 on the same hardware and the results were bad.

alexjplant 8 days ago

> I dropped in a photo of my water meter and told it to read the value and serial number. It was far from instant, but it was easily under 3 minutes and the result was correct.

"Useful" as in "has a use that isn't just for show". It takes me two seconds to read a photo of a water meter; having an LLM read it for me in 3 minutes isn't useful. Similarly, small models are capable of tool use (e.g. web searches), but their synthesis leaves much to be desired. For example, I'd ask some small models to find products with specific characteristics, and they'd come back with only one or two because they incorrectly reasoned themselves out of the other possibilities.

> Feels like he was just comparing some benchmarks.

On what do you base this assertion?

ozim 8 days ago

> trying to scheme a way

Mostly your use of this expression.

I don't use the agent to read the meter for me; I can do that myself when I take the photo.

I send the photo to a bot that ingests it and stores the reading with the date and time, so later I can ask "what was the last reading?" or "what was the usage between dates X and Y?" without having to take a perfect photo and without having to dabble with OpenCV.

Even if it took 30 minutes, it would still be useful to me.
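A minimal sketch of the storage and query side of a bot like that, assuming the model's answer has already been parsed into a serial number and a numeric value. The schema and the function names (`open_db`, `store_reading`, `last_reading`, `usage_between`) are hypothetical and not ozim's actual setup; the photo-to-LLM step is left out entirely.

```python
import sqlite3
from datetime import datetime


def open_db(path=":memory:"):
    """Open (or create) a hypothetical readings table: one row per meter photo."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " taken_at TEXT NOT NULL,"  # ISO-8601 timestamp, so string order == time order
        " serial TEXT NOT NULL,"    # meter serial number extracted from the photo
        " value REAL NOT NULL)"     # meter value extracted from the photo
    )
    return conn


def store_reading(conn, taken_at, serial, value):
    conn.execute(
        "INSERT INTO readings (taken_at, serial, value) VALUES (?, ?, ?)",
        (taken_at.isoformat(timespec="seconds"), serial, value),
    )
    conn.commit()


def last_reading(conn, serial):
    """Return (timestamp, value) of the most recent reading, or None if there is none."""
    row = conn.execute(
        "SELECT taken_at, value FROM readings WHERE serial = ?"
        " ORDER BY taken_at DESC LIMIT 1",
        (serial,),
    ).fetchone()
    return None if row is None else (datetime.fromisoformat(row[0]), row[1])


def usage_between(conn, serial, start, end):
    """Usage = latest minus earliest reading taken within [start, end], or None."""
    rows = conn.execute(
        "SELECT value FROM readings WHERE serial = ? AND taken_at BETWEEN ? AND ?"
        " ORDER BY taken_at",
        (serial, start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")),
    ).fetchall()
    if len(rows) < 2:
        return None  # need at least two readings in the window to compute usage
    return rows[-1][0] - rows[0][0]


if __name__ == "__main__":
    conn = open_db()
    store_reading(conn, datetime(2026, 1, 1, 8, 0), "ABC123", 100.0)
    store_reading(conn, datetime(2026, 2, 1, 8, 0), "ABC123", 112.5)
    print(last_reading(conn, "ABC123"))
    print(usage_between(conn, "ABC123", datetime(2026, 1, 1), datetime(2026, 3, 1)))
```

Storing timestamps as fixed-precision ISO-8601 strings keeps SQLite's plain string comparison (`ORDER BY`, `BETWEEN`) in chronological order without extra date handling.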