Hacker News
by fxtentacle 242 days ago
„273 GB/sec memory bandwidth“

Really? Less RAM bandwidth than an Epyc CPU? And 4x to 8x less than a consumer GPU?

How come this doesn’t massively limit LLM inference speeds?

1 comment

It does: inference is much slower than on a consumer video card. The draw of the Spark and systems like it is the massive amount of memory available to the GPU.
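
For a rough sense of why bandwidth caps speed, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bound, meaning every generated token requires reading all model weights from memory once, so it gives an upper bound and not a benchmark. The 273 GB/sec figure and the 4x multiplier come from the thread; the 40 GB model size and the function name are made up for illustration.

```python
def max_decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/sec, assuming each token reads all weights once."""
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


SPARK_BW_GB_S = 273.0            # figure quoted in the thread
GPU_BW_GB_S = 4 * SPARK_BW_GB_S  # low end of the "4x to 8x" claim in the thread
MODEL_SIZE_GB = 40.0             # hypothetical model size, for illustration only

spark_bound = max_decode_tokens_per_sec(SPARK_BW_GB_S, MODEL_SIZE_GB)
gpu_bound = max_decode_tokens_per_sec(GPU_BW_GB_S, MODEL_SIZE_GB)

print(f"Spark upper bound: {spark_bound:.1f} tokens/sec")
print(f"GPU upper bound:   {gpu_bound:.1f} tokens/sec")
```

Under this simple model the speed gap equals the bandwidth ratio. It only applies when the model fits in the device's memory at all, which is where the larger memory pool matters.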