HN Mirror



	by Aurornis 94 days ago
	It wasn't considered impossible. There are examples of large MoE LLMs running on small hardware all over the internet, like giant models on a Raspberry Pi 5. It's just so slow that nobody pursued it seriously. It's fun to see these tricks implemented, but even on this top-spec 2025 iPhone Pro, the output is 100x slower than output from hosted services.


zozbot234 94 days ago

If the bottleneck is storage bandwidth that's not "slow". It's only slow if you insist on interactive speeds, but the point of this is that you can run cheap inference in bulk on very low-end hardware.
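A rough way to see the storage-bandwidth argument: if each generated token requires streaming some amount of weight data from storage, throughput is capped at roughly bandwidth divided by bytes read per token. The sketch below is only an illustration under that assumption. The helper name `max_tokens_per_second` and both input figures are hypothetical, not measurements from the phone or model discussed in this thread.

```python
def max_tokens_per_second(storage_gb_per_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode rate if streaming weights from storage is the only bottleneck."""
    if storage_gb_per_s <= 0 or gb_read_per_token <= 0:
        raise ValueError("both arguments must be positive")
    return storage_gb_per_s / gb_read_per_token


# Hypothetical figures for illustration only (not measured values).
rate = max_tokens_per_second(storage_gb_per_s=2.0, gb_read_per_token=4.0)

# Bulk (non-interactive) use cares about tokens per day, not latency per token.
tokens_per_day = rate * 86_400

print(f"{rate:.2f} tokens/s, about {tokens_per_day:,.0f} tokens/day")
```

Under these made-up inputs the per-token rate is far below interactive speeds, yet the daily total can still be useful for batch jobs, which is the trade-off being argued here.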

Aurornis 93 days ago

> If the bottleneck is storage bandwidth that's not "slow"

It is objectively slow: around 100x slower than what most people consider usable.

The quality is also degraded severely to get that speed.

> but the point of this is that you can run cheap inference in bulk on very low-end hardware.

You always could, if you didn't care about speed or efficiency.

zozbot234 93 days ago

You're simply pointing out that most people who use AI today expect interactive speeds. You're right that the point here is not raw power efficiency (having to read from storage increases the energy per operation, and datacenter-scale AI hardware beats edge hardware on that metric anyway), but the ability to repurpose cheaper, smaller-scale hardware is also compelling.

Terretta 93 days ago

> very low-end hardware

The iPhone 17 Pro outperforms AMD’s Ryzen 9 9950X, per https://www.igorslab.de/en/iphone-17-pro-a19-pro-chip-uebert...

pinkgolem 93 days ago

In single-threaded workloads. Still impressive.
