by colinng
920 days ago
Ditto on this. I'd rather not buy an A100 for $20k, or even consumer GPUs, but the truth is that for LLM inference, running large models like LLaMa2 70b with INT4 quantization (so the model fits in memory), the throughput numbers look like this:

A100: 1248 TOPS
MI250: 362.1 TOPS
M3 Max: 18 TOPS

Yes, 18. Unless Apple has accelerated INT4 workloads and just forgot to document it.

Honestly, I’m an Apple fan, but when they go on stage and say “AI”, they mean it can do speech recognition, tell a dog apart from a cat, or autofocus a camera. It can’t run ChatGPT-like models, not by a loooong mile.
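A rough back-of-the-envelope sketch of the two numbers behind this argument: how much memory a 70b model's weights take at INT4, and how far apart the quoted TOPS figures are. It assumes weights-only storage (no KV cache, activations, or runtime overhead) and treats the TOPS figures as the vendor peak numbers quoted above, not measured throughput. The function names are hypothetical, for illustration only.

```python
# Back-of-the-envelope numbers (a sketch, not a benchmark).
# Assumption: weights-only footprint; KV cache, activations and runtime overhead are ignored.

PARAMS = 70e9          # LLaMa2 70b nominal parameter count
BITS_PER_WEIGHT = 4    # INT4 quantization

def weight_gigabytes(params: float, bits: int) -> float:
    """Return raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

# Peak TOPS figures as quoted in the comment (spec-sheet peaks, not measured throughput).
PEAK_TOPS = {
    "A100": 1248.0,
    "MI250": 362.1,
    "M3 Max": 18.0,
}

def ratio_vs(baseline: str, tops: dict) -> dict:
    """Return each device's peak TOPS divided by the baseline device's peak TOPS."""
    base = tops[baseline]
    return {name: value / base for name, value in tops.items()}

if __name__ == "__main__":
    print(f"INT4 weights: {weight_gigabytes(PARAMS, BITS_PER_WEIGHT):.1f} GB")
    for name, r in ratio_vs("M3 Max", PEAK_TOPS).items():
        print(f"{name}: {r:.1f}x the M3 Max figure")
```

Running it prints 35.0 GB for the INT4 weights, and the A100 and MI250 figures come out at roughly 69.3x and 20.1x the M3 Max figure.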