by plonk 1303 days ago
Hi! Are these positional embeddings literally made by concatenating the patch embedding with a number, then passing that through the next layer, as suggested by the figure under "Images to Patch Embeddings"?

It's the most confusing part of transformers for me. How do we train the module that creates these embeddings?
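To make the question concrete, here is a rough sketch of the two readings I have in mind, in plain Python with toy sizes. NUM_PATCHES, DIM, and all function and class names are hypothetical; this is not code from the article. Reading A appends the raw patch index as an extra feature. Reading B keeps one trainable vector per position and adds it element-wise to the patch embedding.

```python
import random

# Hypothetical illustration, not the article's implementation.
NUM_PATCHES = 4  # toy image split into 4 patches
DIM = 3          # toy embedding width


def concat_index(patch_embs):
    """Reading A: append the patch index as one extra feature (width becomes DIM + 1)."""
    return [emb + [float(i)] for i, emb in enumerate(patch_embs)]


class LearnedPositionalEmbedding:
    """Reading B: a trainable table with one vector per position, added element-wise."""

    def __init__(self, num_positions, dim, seed=0):
        rng = random.Random(seed)
        # Small random init; these numbers are ordinary trainable parameters.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_positions)]

    def forward(self, patch_embs):
        # Width stays DIM: position i's vector is added to patch i's embedding.
        return [[x + p for x, p in zip(emb, self.table[i])]
                for i, emb in enumerate(patch_embs)]

    def sgd_step(self, grad_out, lr):
        # d(out[i])/d(table[i]) is the identity, so the gradient flowing back
        # into output row i is exactly the gradient for table[i].
        for i, g_row in enumerate(grad_out):
            self.table[i] = [p - lr * g for p, g in zip(self.table[i], g_row)]


if __name__ == "__main__":
    patches = [[1.0, 2.0, 3.0] for _ in range(NUM_PATCHES)]
    print(concat_index(patches)[2])  # last feature is the index 2.0
    pe = LearnedPositionalEmbedding(NUM_PATCHES, DIM)
    out = pe.forward(patches)
    # Pretend some downstream loss sent back a gradient of 1.0 everywhere.
    pe.sgd_step([[1.0] * DIM for _ in range(NUM_PATCHES)], lr=0.1)
```

If reading B is the right one, the "module" would just be a table of parameters that receives gradient updates through backprop like any other weight, which is the part I'd like confirmed.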