| HN Mirror



	by ActorNightly 175 days ago
	>A single convolution step is a local operation (it only pulls from nearby pixels), whereas attention is a "global" operation. In the same way that the learned weights used to generate the K, Q, V matrices may have zeros (or small values) for certain tokens, convolution kernels simply have fixed zeros everywhere outside the local window.
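To make the comparison in the quote concrete, here is a rough sketch in plain Python (not from the thread). It writes a 1D convolution as a full position-by-position matrix so it can be compared with an attention weight matrix of the same shape. The helper names (`conv_as_matrix`, `attention_weights`, `count_zeros`), the kernel values, and the toy vectors are all made up for illustration, and the attention part skips the learned projections and uses the same toy vectors as both queries and keys.

```python
import math

def conv_as_matrix(kernel, n):
    """Write a zero-padded 1D convolution over n positions as an n x n matrix.

    Row i holds the weights output position i applies to each input position.
    Everything outside the kernel window stays at a fixed 0.0.
    """
    half = len(kernel) // 2
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                mat[i][j] = w
    return mat

def attention_weights(queries, keys):
    """Row-wise softmax of scaled dot products (no learned projections here)."""
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def count_zeros(mat):
    """Count entries that are exactly zero."""
    return sum(1 for row in mat for v in row if v == 0.0)

# Hypothetical toy inputs: a 3-tap kernel and six 2-dimensional "tokens".
conv = conv_as_matrix([0.25, 0.5, 0.25], n=6)
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.0], [0.0, 2.0]]
attn = attention_weights(toy, toy)

print("structural zeros in conv matrix:", count_zeros(conv))
print("exact zeros in attention matrix:", count_zeros(attn))
```

The convolution matrix is banded: position 0 has no weight at all on position 3, no matter what the input is. The attention matrix assigns some weight to every pair of positions, and any near-zero entries there come from the data and the learned weights rather than from the structure of the operation.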