by whimsicalism 846 days ago
How are you not familiar with transformers yet have seen multiple explanations of FlashAttention?

The issue with attention is essentially that it relates every token of the input sequence to every other token. The need to do that makes sense no matter how much one understands about the internals of a transformer. The naive way to do it boils down to matrix multiplications, and a lot more people understand the performance issues those imply.
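To make the "naive way" concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is not FlashAttention and not any library's implementation; the function name `naive_attention` and the list-of-rows layout are assumptions chosen for illustration. The point is only that the score matrix has one entry per pair of tokens, so it grows as n * n with sequence length n.

```python
import math

def naive_attention(Q, K, V):
    """Illustrative scaled dot-product attention on lists of row vectors.

    Q, K: n rows of dimension d. V: n rows of any dimension.
    Returns (output rows, score matrix). Hypothetical helper, not a library API.
    """
    d = len(Q[0])
    # One score per (query token, key token) pair: an n x n matrix.
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        for q in Q
    ]
    out = []
    for row in scores:
        # Numerically stable softmax over this query's scores.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output row is a weighted sum of all value rows.
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out, scores

if __name__ == "__main__":
    n = 4
    X = [[float(i == j) for j in range(n)] for i in range(n)]
    out, scores = naive_attention(X, X, X)
    # The score matrix holds n * n entries, which is the quadratic cost.
    print(len(scores) * len(scores[0]))
```

Doubling the sequence length quadruples the number of scores that have to be computed and, in this naive form, stored. That memory traffic is the part FlashAttention-style approaches are usually described as avoiding.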
your comment makes no sense to me, sorry. if you understand attention you understand transformers, period.
That's good to know :)
Likewise your comment(s) makes no sense to me.

If you can understand attention and transformers, how can you not understand that population numbers can rise, reach a peak, fall, and then level out (all w/out any genocidal actions)?

How can you claim that it is "absurdism" to imagine something that can be seen in data across the plant and animal kingdom?

Literally the exact question I had reading that comment haha