by janalsncm
804 days ago
Translation: you don’t need to serve 96-layer transformers for ranking and recommendation. You’re probably using a neural net with around 10-20 million parameters. But it needs to be fast and highly parallelizable, and perhaps to perform well at lower precisions like f16. It would also be great to have a very large vector LUT (lookup table) on the same chip.
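To put rough numbers on that, here is a back-of-the-envelope sketch. Every size in it (layer widths, number of table rows, vector width) is a hypothetical value picked for illustration, not a figure from Meta or from the comment above, and `mlp_param_count` and `table_bytes` are made-up helper names.

```python
# All sizes below are hypothetical, chosen only to illustrate the scale.

def mlp_param_count(widths):
    """Count weights and biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def table_bytes(rows, dim, bytes_per_value=2):
    """Size in bytes of an embedding table (the "vector LUT"); 2 bytes per value is f16."""
    return rows * dim * bytes_per_value


# Assumed dense ranking network: 2048 input features, three hidden layers, one score.
dense_widths = [2048, 4096, 1024, 256, 1]
dense_params = mlp_param_count(dense_widths)

# Assumed embedding table: 100 million IDs, 64 values each, stored in f16.
lut_bytes = table_bytes(rows=100_000_000, dim=64)

print(f"dense parameters: {dense_params / 1e6:.1f} M")
print(f"dense weights in f16: {dense_params * 2 / 2**20:.1f} MiB")
print(f"embedding table in f16: {lut_bytes / 2**30:.1f} GiB")
```

Under these assumed sizes the dense network lands in the 10-20 million parameter range and its f16 weights fit in tens of MiB, while the lookup table runs to several GiB. That gap is one way to read the point about wanting a very large LUT close to the compute.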
Meta seems to have reported these numbers for this v2 chip:
And I see Nvidia reporting these numbers for its latest Blackwell chips: https://www.anandtech.com/show/21310/nvidia-blackwell-archit...
Am I understanding correctly that Nvidia's upcoming Blackwell chips are 5-10x faster than the one Meta just announced?