by all2 84 days ago
I've been watching the drizzle of LLM papers come through, and I think we're going to hit a 1T param MoE on consumer hardware before this year is out. It'll still be behind the bigco models, but it'll be a force multiplier. Ideally, we'd get these models to run on a CPU. MS BitNet is one way to do this. You can already run ternary LLMs on consumer CPUs with a decent tps.
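For a rough sense of scale, here is a back-of-envelope sketch of how much space the raw weights of a 1T-parameter model take at different precisions. It assumes ideal packing (about log2(3) ≈ 1.58 bits per ternary weight) and ignores activations, KV cache, and format overhead. The helper name `weights_gib` is made up for illustration.

```python
import math

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Raw weight size in GiB, assuming ideal packing and nothing else in memory."""
    return n_params * bits_per_weight / 8 / 2**30

ONE_T = 1e12  # one trillion parameters

# Assumed precisions; ternary uses the information-theoretic log2(3) bits per weight.
for label, bits in [("fp16", 16), ("int4", 4), ("ternary", math.log2(3))]:
    print(f"{label:>8}: {weights_gib(ONE_T, bits):8.1f} GiB")
```

Under these assumptions the ternary weights alone come to roughly 184 GiB, so this is a system-RAM or offloading setup rather than something that fits entirely in a single consumer GPU's VRAM.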
2 comments

Though what is consumer hardware right now?

Can we still classify 5090s as consumer hardware given how expensive they are? They're £3k at the moment, and it looks like it's only going to get worse unless the AI bubble pops.

I got an Olares One system with a 24GB NVIDIA RTX 5090 (the consumer 24GB variant, not the 32GB one) for less than $3k at the Kickstarter price. It comes with Olares OS, which for my purposes is not all that useful; I finally settled on a good Ubuntu 24.04 LTS configuration. Still, it was a good deal. I actually bought two.
I was thinking more in terms of 24GB of VRAM total. I started sketching the architecture for such a model this afternoon, nothing novel, just combining existing advancements in the field. It looks achievable.
I mean you can run a 1T model on consumer hardware now by doing things like layer offloading and streaming from SSD. It's just too slow to be useful.
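To put a rough number on "too slow": a minimal sketch of the upper bound on tokens per second when every weight touched by a token has to be read from SSD. It assumes no caching of weights in RAM or VRAM and that reads are purely bandwidth-bound. The 7 GB/s read speed, the 4-bit precision, and the 32B active-parameter figure are illustrative assumptions, not measurements.

```python
def tokens_per_second(active_params: float, bits_per_weight: float,
                      read_gb_per_s: float) -> float:
    """Upper bound on decode speed if all active weights stream from disk per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return read_gb_per_s * 1e9 / bytes_per_token

SSD_GB_PER_S = 7.0  # assumed sequential read speed of a fast NVMe drive

# Dense 1T model at 4 bits: every parameter is read for every token.
dense = tokens_per_second(1e12, 4, SSD_GB_PER_S)
# Hypothetical MoE with 32B active parameters per token at 4 bits.
moe = tokens_per_second(32e9, 4, SSD_GB_PER_S)

print(f"dense 1T:   {dense:.3f} tok/s ({1 / dense:.0f} s per token)")
print(f"MoE 32B act: {moe:.3f} tok/s")
```

With these numbers the dense case works out to over a minute per token, while a sparse MoE that only touches a small slice of its weights per token lands under one token per second. That gap is why the MoE angle matters here.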