| HN Mirror



	by samus 845 days ago
	The issue with attention is essentially that it relates every token of the input sequence to every other token. The need to do that makes sense no matter how much one understands about the internals of a transformer. The naive way to do it boils down to matrix multiplications whose cost grows quadratically with sequence length, and a lot more people understand the performance issues those imply.
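To make the "every token against every token" point concrete, here is a minimal sketch of naive single-head scaled dot-product attention in plain Python. It is an illustration only, not any particular library's implementation; the names `naive_attention` and `score_count` are made up for this example, and the learned projections a real transformer applies before this step are left out.

```python
import math


def naive_attention(queries, keys, values):
    """Naive scaled dot-product attention over lists of equal-length vectors."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    width = len(values[0])
    output = []
    for q in queries:
        # One score per key: this inner pass runs once for every query,
        # so n tokens produce n * n dot products in total.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # Numerically stable softmax over the scores for this query.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The output row is the weighted average of all value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return output


def score_count(n):
    """Number of pairwise scores computed for a sequence of n tokens."""
    return n * n
```

Doubling the sequence length quadruples `score_count`, which is the performance issue in question: the score table alone has one entry per pair of tokens.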

1 comments

whimsicalism 845 days ago

your comment makes no sense to me, sorry. if you understand attention you understand transformers, period.


samus 845 days ago

That's good to know :)


defrost 845 days ago

Likewise, your comment(s) make no sense to me.

If you can understand attention and transformers, how can you not understand that population numbers can rise, reach a peak, fall, and then level out (all without any genocidal actions)?

How can you claim that it is "absurdism" to imagine something that can be seen in data across the plant and animal kingdom?
