| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pillusmany 874 days ago
	Tensor cores massively accelerate matrix multiplication with no algorithmic breakthrough. Just by being smart about how you move/cache/reuse values, operations which are considered O(1). There should be an O notation which takes into account everything - various kinds of memory latencies, speed of logic operations, .... Obviously we have wall clock or total energy used.

2 comments

david-gpu 874 days ago

> There should be an O notation which takes into account everything - various kinds of memory latencies, speed of logic operations

O notation is useful as-is because it simplifies an algorithm down to the bare bones. If such simplification doesn't capture what is important to you, you need a more complex model.

The folks designing GPUs use multiple different such models, ranging from the simple, inaccurate, understandable and fast; to the obnoxiously detailed, precise, abstruse and slow.

Not one model will ever have all the desirable properties, because they are in direct contradiction.

_kulang 874 days ago

Isn’t O notation asymptotic and a truncation? Adding an additional O(n) time to a O(n^2) algorithm would result in an O(n^2) algorithm? As far as I understand anyway. But you could make a more descriptive expression for the time based on what you know about the operations for sure