Hacker News new | ask | show | jobs
by kilroy123 6 hours ago
I hope to see something like this, but in a small form factor like the NVIDIA spark.

I want a super fast LLM that is Opus 4.6+, like, in ability.

3 comments

Memory bandwidth is the bottleneck in the Spark. If you replace the SoC with an optimized ASIC but keep the same 256-bit LPDDR5 the performance will be the same. You can increase performance by using wider memory but that's also more expensive.
M3 Ultra has a 1024 bit memory bus (819 GB/s) and starts at $3,999 (96GB of RAM). It can be done....
The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill.

For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.

Unfortunately Sam Altman won't be the one to deliver us at-home hardware that can run Opus-level models
I wonder what is happening with the OpenAI / Jony Ive crossover episode.
Forget about it. Datacenter class hardware is getting farther and farther from desktop use. It’s not PCIe GPUs anymore.