by kkielhofner
621 days ago
CPU, yes, but more importantly memory bandwidth. An RTX 3090 (as one example) has nearly 1TB/s of memory bandwidth (936GB/s). You'd need at least 12 channels of the fastest proof-of-concept DDR5 on the planet to equal that. If you have a discrete GPU, use an implementation that utilizes it, because it's a completely different story. Apple Silicon posts impressive numbers on LLM inference because it has a unified, high-bandwidth CPU-GPU memory architecture (400GB/s IIRC).
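
To put rough numbers on the DDR5 comparison, here's a back-of-the-envelope sketch. It assumes a 64-bit DDR5 channel (8 bytes per transfer) and theoretical peak rates only; the function names, the DDR5-6400 example speed, and the 4GB model size are my own illustrative picks, not measurements. The last helper uses the common rule of thumb that decoding one token streams all the weights from memory once, so bandwidth divided by model size is a rough ceiling on tokens/s, not a benchmark.

```python
import math

BYTES_PER_TRANSFER = 8   # assumption: one 64-bit DDR5 channel moves 8 bytes per transfer
RTX_3090_GBS = 936.0     # RTX 3090 peak memory bandwidth in GB/s


def ddr5_channel_gbs(mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth of a single DDR5 channel, in GB/s."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


def channels_needed(target_gbs: float, mega_transfers_per_s: float) -> int:
    """Whole channels needed at a given speed to reach target_gbs."""
    return math.ceil(target_gbs / ddr5_channel_gbs(mega_transfers_per_s))


def required_mts(target_gbs: float, channels: int) -> float:
    """Per-channel speed (MT/s) needed for `channels` channels to reach target_gbs."""
    return target_gbs * 1e9 / (channels * BYTES_PER_TRANSFER) / 1e6


def decode_ceiling_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Rule-of-thumb upper bound: every token reads all weights once."""
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    print("DDR5-6400 per channel (GB/s):", ddr5_channel_gbs(6400))
    print("DDR5-6400 channels to match a 3090:", channels_needed(RTX_3090_GBS, 6400))
    print("MT/s needed with 12 channels:", required_mts(RTX_3090_GBS, 12))
    # Hypothetical 4GB quantized model, purely for illustration.
    print("3090 decode ceiling (tok/s):", decode_ceiling_tokens_per_s(RTX_3090_GBS, 4.0))
```

Real-world throughput lands below these peaks on both sides, but the ratio is the point: with 12 channels you'd need each one running at roughly DDR5-9750 to match the GPU on paper.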