by colinng 920 days ago
Ditto on this. I'd love to not buy an A100 for $20k, or even consumer GPUs, but the truth is that for LLM inference, running a large model like Llama 2 70B with INT4 quantization so that it fits, the numbers look like this:

A100: 1248 TOPS

MI250: 362.1 TOPS

M3 Max: 18 TOPS

Yes, 18. Unless Apple has accelerated INT4 workloads but just forgot to document it.
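To put the gap in perspective, here is a quick back-of-the-envelope sketch in Python. It only uses the three TOPS figures quoted above, plus a nominal 70 billion parameters at 4 bits each for the weight footprint. The helper name `ratio_to` is made up for illustration, and the memory figure covers weights only (no KV cache, no activations), so treat it as a rough lower bound and not a measurement.

```python
# Peak INT4 figures as quoted in this comment, in TOPS.
# These are headline peak numbers, not measured inference throughput.
tops = {"A100": 1248.0, "MI250": 362.1, "M3 Max": 18.0}


def ratio_to(baseline, figures):
    """Return each figure divided by the baseline entry (hypothetical helper)."""
    base = figures[baseline]
    return {name: value / base for name, value in figures.items()}


ratios = ratio_to("M3 Max", tops)
for name, r in ratios.items():
    print(f"{name}: {r:.1f}x the M3 Max")

# Weight footprint of a 70B-parameter model at INT4.
# Assumption: exactly 70e9 parameters, 4 bits each, weights only.
params = 70e9
bits_per_param = 4
weight_gb = params * bits_per_param / 8 / 1e9
print(f"INT4 weights: about {weight_gb:.0f} GB")
```

On those quoted figures the A100 comes out at roughly 69x the M3 Max and the MI250 at roughly 20x, while the weights alone need about 35 GB.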

Honestly, I’m an Apple fan, but when they go on stage and say “AI” they mean it can do speech recognition, tell a dog apart from a cat, or autofocus a camera. It can’t run ChatGPT-like things, not by a loooong mile.