| HN Mirror



	by Our_Benefactors 16 days ago
	It runs like shit though in terms of tokens/second and still has a reduced context window, whereas a single Claude prompt can easily get into 300k tokens without breaking a sweat. I want local AI to be a thing, but the hardware isn’t here yet: the only options are a Mac Studio or DGX machines strapped together. RAM prices need to crash before local AI has a chance at actually competing.

2 comments

zozbot234 16 days ago

The more recent Chinese models are no longer heavily limited by context size. The context can easily fit in RAM on a prosumer laptop. (You can also use swap space to extend that, since context is only written to once per inference, so wear and tear is a relatively mild concern.)
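
For a rough sense of the arithmetic behind "fits in RAM", here is a minimal sketch of a KV-cache size estimate. It assumes a plain grouped-query-attention transformer with fp16 cache entries; the layer count, KV head count, and head dimension below are hypothetical and do not describe any specific model mentioned in this thread. Models that compress or quantize the cache will need considerably less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Estimate KV-cache size for a standard GQA transformer.

    The factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


# Hypothetical model shape, chosen only for illustration.
cfg = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for n_tokens in (128_000, 1_000_000):
    gib = kv_cache_bytes(n_tokens=n_tokens, **cfg) / 2**30
    print(f"{n_tokens:>9,} tokens -> {gib:.1f} GiB")
```

Plugging in a real model's published configuration in place of `cfg` gives a first-order answer to whether a given context length fits in a given amount of RAM, before counting the weights themselves.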


Our_Benefactors 16 days ago

Claude has a 1M context window for enterprise. 128k feels like a toy in comparison.


sourcecodeplz 16 days ago

DeepSeek pro/flash both have 1M.


ATMLOTTOBEER 15 days ago

You’re right, and it feels like the people saying otherwise either don’t use these tools professionally (and therefore can’t tell the difference between local and cloud models) or literally just haven’t tried running local models.

As soon as I can buy hardware for less than 5k that runs an Opus 4.6+/5.5 model locally, I will do it instantly.
