by varunkmohan
1274 days ago
The number that matters most for performance is memory bandwidth, which is lower on TPUs than on A100s. TPUs have more aggregate compute, but it's not trivial to turn that into very low latency on a per-request basis.
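To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes a single request (batch size 1) where every model weight has to be read from device memory once per step, so step time can't be lower than weight bytes divided by memory bandwidth. The function name, the model size, and both bandwidth figures are made-up placeholders for illustration, not specs for any TPU or A100.

```python
def min_step_latency_s(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Lower bound on one step's latency when the step is memory-bound:
    all weights are streamed from device memory exactly once."""
    weight_bytes = param_count * bytes_per_param
    return weight_bytes / mem_bandwidth_bytes_per_s


# Hypothetical model: 7e9 parameters stored in 2 bytes each (e.g. fp16).
PARAMS = 7e9
BYTES_PER_PARAM = 2

# Hypothetical accelerators; bandwidths are illustrative, not vendor figures.
ACCELERATORS = {
    "accelerator_a": 2.0e12,  # bytes/s
    "accelerator_b": 1.0e12,  # bytes/s
}

for name, bandwidth in ACCELERATORS.items():
    latency_ms = min_step_latency_s(PARAMS, BYTES_PER_PARAM, bandwidth) * 1e3
    print(f"{name}: at least {latency_ms:.1f} ms per step")
```

Under these assumptions extra FLOPs don't move the bound at all; only more bandwidth (or fewer bytes to read) does, which is why aggregate compute alone doesn't buy per-request latency.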