Really? Less RAM bandwidth than an EPYC CPU? And 4x to 8x less than a consumer GPU?
How come this doesn’t massively limit LLM inference speeds?