by PoignardAzur 1389 days ago
> reduce the size of the patch(num_of_pixel x num_of_pixel) with a linear projection

What does that mean?

(Thanks for the explanation)

1 comment

The P×P-pixel image patch is flattened into a vector of length P^2 (P^2·C if the image has C color channels) and multiplied by a learnable matrix of dimension P^2×D, where D is the size of the patch embedding. In other words, it's a linear transformation that maps each patch to a D-dimensional vector, which reduces the dimensionality of the patch whenever D is smaller than P^2.
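If it helps to see the shapes, here is a minimal NumPy sketch of that projection. The values of P and D are made up for illustration, the patch is single-channel, and the random matrix `W` only stands in for weights that would normally be learned during training.

```python
import numpy as np

P = 16  # patch side length in pixels (illustrative value, not from the thread)
D = 64  # patch embedding size (illustrative value, chosen so that D < P*P)

rng = np.random.default_rng(0)

# One single-channel P x P patch with random pixel values.
patch = rng.random((P, P))

# Stand-in for the learnable projection matrix of shape (P^2, D).
# In a real model these weights are trained, not sampled once.
W = rng.standard_normal((P * P, D))

# Flatten the patch to a vector of length P^2, then project it to D dimensions.
flat = patch.reshape(-1)
embedding = flat @ W

print(flat.shape, embedding.shape)  # (256,) (64,)
```

With these numbers the 256 pixel values of a patch are mapped to a 64-dimensional embedding. With C color channels the flattened vector and the first dimension of `W` would both have length P*P*C instead.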