

	by zozbot234 70 days ago
	This is why I'd like to see a lot more focus on batched inference with lower-end hardware. If you just do a tiny amount of tok/day and can wait for the answer to be computed overnight or so, you don't really need top-of-the-line hardware even for SOTA results.
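	To put a rough number on "wait overnight", here is a back-of-the-envelope sketch. The 2 tok/s rate and the 8-hour window are illustrative assumptions, not measurements of any particular hardware, and `overnight_token_budget` is a hypothetical helper name.

```python
def overnight_token_budget(tokens_per_second: float, hours: float) -> int:
    """Tokens generated in a window of `hours` at a steady rate (assumed constant)."""
    return int(tokens_per_second * hours * 3600)

# Hypothetical numbers: 2 tok/s sustained over an 8-hour night.
budget = overnight_token_budget(tokens_per_second=2.0, hours=8.0)
print(budget)  # 57600
```

	Even a slow sustained rate adds up over a long window, which is the whole argument for batching non-urgent work.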

2 comments

deaux 69 days ago

> If you just do a tiny amount of tok/day and can wait for the answer to be computed overnight or so

But they can't? The usage pattern is the polar opposite. Most people running these models locally just ask them a few questions throughout the day. They want the answers now, or at least within a minute.


zozbot234 69 days ago

If you want the answer right now, that alone ups your compute needs to the point where you're probably better off just using a free hosted-AI service. Unless the prompt is trivial enough that it can be answered quickly by a tiny local model.


redman25 69 days ago

A Strix Halo machine or a Mac will idle at less than 20 watts. You could leave it running.
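
For scale, a quick sketch of what that idle draw means over a day. The 20 W figure is the upper bound from the comment above; `idle_energy_kwh` is a hypothetical helper, and the draw is assumed constant.

```python
def idle_energy_kwh(idle_watts: float, hours: float = 24.0) -> float:
    """Energy in kWh used by a machine drawing `idle_watts` for `hours`."""
    return idle_watts * hours / 1000.0

# 20 W for a full day.
print(idle_energy_kwh(20.0))  # 0.48
```

Multiply by your local electricity price to get the daily cost of leaving it on.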


mistercheese 69 days ago

That’s a good point. I think I saw Together.ai with that offering, but for some reason I just never think to throw random non-urgent coding tasks at it overnight.
