|
|
|
|
|
by pillusmany
826 days ago
|
|
Tensor cores massively accelerate matrix multiplication with no algorithmic breakthrough. Just by being smart about how you move/cache/reuse values, operations which are considered O(1). There should be an O notation which takes into account everything - various kinds of memory latencies, speed of logic operations, .... Obviously we have wall clock or total energy used. |
|
O notation is useful as-is because it simplifies an algorithm down to the bare bones. If such simplification doesn't capture what is important to you, you need a more complex model.
The folks designing GPUs use multiple different such models, ranging from the simple, inaccurate, understandable and fast; to the obnoxiously detailed, precise, abstruse and slow.
Not one model will ever have all the desirable properties, because they are in direct contradiction.