by samus 845 days ago
The issue with attention is essentially that it relates every token of the input sequence to every other token. The need for that makes sense no matter how much one understands about the internals of a transformer. The naive way to do it boils down to matrix multiplications, and a lot more people understand the performance issues those imply.
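To make the "every token with every other token" point concrete, here is a rough sketch in plain Python. It assumes the usual scaled dot-product form of attention with a single head, which the comment above does not spell out, and the names naive_attention and score_count are made up for illustration, not taken from any library.

```python
import math

def naive_attention(queries, keys, values):
    """Naive single-head attention over lists of equal-length vectors.

    Assumes the scaled dot-product formulation; illustrative only.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        # One score per key. Doing this for every query gives n * n scores.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Softmax over the scores, shifted by the max for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

def score_count(n):
    """Number of query-key scores the naive approach computes for n tokens."""
    return n * n

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = naive_attention(tokens, tokens, tokens)
print(len(result), score_count(len(tokens)))
```

The nested iteration is the same work a matrix product of queries and keys would do, so doubling the sequence length roughly quadruples the number of scores.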

Your comment makes no sense to me, sorry. If you understand attention, you understand transformers, period.
That's good to know :)
Likewise, your comment(s) make no sense to me.

If you can understand attention and transformers, how can you not understand that population numbers can rise, reach a peak, fall, and then level out (all w/out any genocidal actions)?

How can you claim that it is "absurdism" to imagine something that can be seen in data across the plant and animal kingdom?