Hacker News
by aurareturn 299 days ago
Again, prompt processing isn't the major problem here; memory bandwidth is. At 256GB/s (maybe ~210GB/s in the real world), bandwidth limits tokens per second well before prompt processing does.
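A rough back-of-envelope sketch of why bandwidth caps generation speed, assuming a dense model at batch size 1 where each new token has to stream all the weights from memory once (KV cache and activation traffic ignored). The 70B/4-bit model and the function name are hypothetical, picked just for illustration:

```python
# Upper bound on single-stream decode speed when generation is memory-bandwidth bound.
# Assumptions (illustrative, not measured): dense model, batch size 1,
# every generated token reads all weights once, KV-cache/activation reads ignored.

def max_tokens_per_second(bandwidth_gb_s: float, params_billion: float, bits_per_weight: float) -> float:
    """Return memory bandwidth divided by GB of weights read per token."""
    weight_gb = params_billion * bits_per_weight / 8  # GB streamed per generated token
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Hypothetical 70B-parameter model quantized to 4 bits (~35 GB of weights).
    for bw in (256, 210):  # spec-sheet figure vs. the ~210GB/s real-world estimate
        print(f"{bw} GB/s -> {max_tokens_per_second(bw, 70, 4):.1f} tok/s")
```

That works out to roughly 7.3 tok/s at 256GB/s and 6.0 tok/s at 210GB/s, regardless of how fast the prompt was processed.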

Not entirely sure how your ARM statement matters here. This is unified memory.