| HN Mirror

	by kilroy123 6 hours ago
	I hope to see something like this, but in a small form factor like the NVIDIA Spark. I want a super-fast LLM with Opus 4.6+ level ability.

3 comments

wmf 4 hours ago

Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 the performance will be the same. You can increase performance by using wider memory but that's also more expensive.
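
A rough way to see why the compute chip does not matter once memory is the limit: when generating one token at a time, every weight has to be streamed from memory once per token, so bandwidth divided by model size caps the token rate. The sketch below assumes a dense model, batch size 1, and ignores KV-cache traffic; the function name, the 70B / 4-bit model, and the 200 and 800 GB/s figures are illustrative assumptions, not measurements of any device.

```python
def decode_tokens_per_s_upper_bound(bandwidth_gb_s: float,
                                    params_billion: float,
                                    bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed when memory-bound.

    Assumes every weight is read from memory once per generated token
    (dense model, batch size 1, KV-cache reads ignored).
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Hypothetical 70B-parameter model quantized to 4 bits (0.5 bytes/param).
    for bw in (200.0, 800.0):  # illustrative bandwidths in GB/s
        rate = decode_tokens_per_s_upper_bound(bw, 70.0, 0.5)
        print(f"{bw:.0f} GB/s -> at most {rate:.1f} tokens/s")
```

Nothing about the compute chip appears in the formula, which is the point: under these assumptions a faster ASIC on the same bus hits the same ceiling, and only a wider or faster memory interface raises it.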

phonon 3 hours ago

M3 Ultra has a 1024 bit memory bus (819 GB/s) and starts at $3,999 (96GB of RAM). It can be done....
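
For reference, the 819 GB/s figure follows from the bus width if you assume a transfer rate of 6400 MT/s (my assumption, not stated above; it is the rate that reproduces the number). A minimal sketch, with a hypothetical helper name:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    bus_width_bits:     total width of the memory interface in bits
    transfer_rate_mt_s: transfers per second, in millions (MT/s)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


if __name__ == "__main__":
    # 6400 MT/s is an assumed rate, applied to both bus widths for comparison.
    print(f"1024-bit: {peak_bandwidth_gb_s(1024, 6400):.1f} GB/s")
    print(f" 256-bit: {peak_bandwidth_gb_s(256, 6400):.1f} GB/s")
```

At the same transfer rate, a 1024-bit bus has four times the peak bandwidth of a 256-bit one; real-world sustained bandwidth will be lower than either theoretical figure.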

bigyabai 3 hours ago

The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill.

For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.

smith7018 4 hours ago

Unfortunately, Sam Altman won't be the one to deliver us at-home hardware that can run Opus-level models.

blitzar 1 hour ago

I wonder what is happening with the OpenAI / Jony Ive crossover episode.

flyinglizard 3 hours ago

Forget about it. Datacenter class hardware is getting farther and farther from desktop use. It’s not PCIe GPUs anymore.
