| HN Mirror



	by storus 159 days ago
	I have a Mac Studio with 512GB RAM, 2x DGX Spark and an RTX 6000 Pro WS (planning to buy a few of those in the Max-Q version next). I am wondering whether we will ever see local inference as "cheap" as it is right now, given RAM/SSD price trends.

2 comments

clusterhacks 159 days ago

Good grief. I'm here cautiously telling my workplace to buy a couple of DGX Sparks for dev/prototyping, and you have better hardware in hand than my entire org.

What kind of experiments are you doing? Did you try out Exo with a DGX doing prefill and the Mac doing decode?

I'm also totally interested in hearing what you have learned working with all this gear. Did you buy all this stuff out of pocket to work with?


storus 159 days ago

Yeah, Exo was one of the first things I tried. The Mac Studio has decent throughput, at the level of a 3080, which is great for token generation, and the Sparks have decent compute, either for prefill or for running non-LLM models that need compute (Segment Anything, Stable Diffusion, etc.). The RTX 6000 Pro just crushes them all (it's essentially like having 4x 3090 in a single GPU). I bought 2 Sparks partly to play with Nvidia's networking stack and learn their ecosystem, though they are a bit of a mixed bag, as they don't expose some Blackwell-specific features that make a difference. I bought it all to be able to run local agents (I write AI agents for a living) and to develop my own ideas fully. I was also wrapping up grad studies at Stanford, so they came in handy for some projects there. I bought it all out of pocket, but I can amortize it on my taxes.


mitjam 159 days ago

Building AI agents for a living is what I hope to be able to do, too; I still consider myself in the learning phase. I have talked with some potential customers (small orgs, freelancers) and learned that local inference would unlock opportunities that otherwise face hard-to-tackle compliance barriers.


clusterhacks 159 days ago

Very cool - thanks for the info.

That you are writing AI agents for a living is fascinating to hear. We aren't really even looking at how to use agents internally yet. I think local agents are completely off the radar at my org, despite their potential as really good supplementary resources for internal apps.

What does deployment look like for your agents? You're clearly exploring a lot of different approaches...


storus 158 days ago

My commercial agents are just wrappers on top of GPT/Claude/Gemini, so the standard deployment approaches on Azure/AWS/GCP apply, with integrations into whatever systems customers have, like JIRA, Confluence, etc. Some customers need to automate away some folks doing repetitive work; others need to improve time to delivery because their people are swamped by incoming work, hoping to accelerate cognitively demanding tasks, etc.


mitjam 159 days ago

That's exactly my fear.
