by rfoo 708 days ago

No. Think of it like a different algorithm. You just take the shape of the hardware into consideration when designing the algorithm instead of considering math only.

> Seems like TVM

Fair enough. Technically they are still about different things, though it's indeed very close, but

> and tinygrad

?????? what gives you this impression?

2 comments

dauertewigkeit 708 days ago

What's the distinction between what TVM does and FlashAttention type optimizations?


rfoo 708 days ago

There is more than layout / tile schedule in FA. For example, to be able to fuse all these together [0] at all, you need to "decompose" the softmax to make it combinable, which requires maintaining some extra statistics. Not gonna repeat the math here as the original FA paper is already very clear.

[0] so you can avoid materializing intermediate matrices and still be able to compute in blocks.
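
For anyone who wants to see the "decompose the softmax" idea without opening the paper, here is a rough sketch for a single query row in plain Python. It is not the FA kernel, just the rescaling trick: the "extra statistics" are taken to be a running max and a running sum of exponentials. The function names, `block_size`, and the omission of the usual 1/sqrt(d) scaling are my own simplifications for illustration.

```python
import math

def attention_reference(q, keys, values):
    """Plain attention for one query: all scores are materialized first."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def attention_blockwise(q, keys, values, block_size=2):
    """Same result, but keys/values are consumed one block at a time."""
    dim = len(values[0])
    m = float("-inf")   # running max of all scores seen so far
    l = 0.0             # running sum of exp(score - m)
    acc = [0.0] * dim   # running unnormalized output, relative to m
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        m_new = max(m, max(scores))
        # Rescale what was accumulated under the old max (0.0 on first block).
        scale = math.exp(m - m_new)
        weights = [math.exp(s - m_new) for s in scores]
        l = l * scale + sum(weights)
        acc = [a * scale + sum(w * v[d] for w, v in zip(weights, v_blk))
               for d, a in enumerate(acc)]
        m = m_new
    # Normalize once at the end; no full score vector was ever stored.
    return [a / l for a in acc]
```

The blockwise version only ever holds one block of scores, yet it should match the reference up to floating-point error for any block size. That is the part a pure layout / tiling search would not hand you for free.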


FL33TW00D 707 days ago

Geo has explicitly stated he wants to be able to find FA in the search space of algos eventually. Actually achieving that is another matter.
