by dplavery92 1091 days ago
Transformers are not tied to a specific input (or output) shape; the original ViT paper demonstrates interpolating the learned positional embeddings so the model can run inference on images of a different shape than it was trained on.
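
To make the interpolation idea concrete, here is a minimal sketch in plain Python. It is not the ViT authors' code: the function name `resize_pos_embed`, the nested-list layout (rows x columns x embedding dimension), and the choice of corner-aligned bilinear interpolation are all assumptions made for illustration. Real implementations typically work on tensors and may use a different interpolation mode.

```python
def resize_pos_embed(grid, new_h, new_w):
    """Bilinearly resample an old_h x old_w grid of embedding vectors
    to new_h x new_w (hypothetical helper, corner-aligned sampling)."""
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])

    def src_coord(i, new_n, old_n):
        # Map an output index to a fractional input index so that the
        # first and last positions of both grids line up.
        return 0.0 if new_n == 1 else i * (old_n - 1) / (new_n - 1)

    out = []
    for i in range(new_h):
        y = src_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0  # vertical blend weight toward row y1
        row = []
        for j in range(new_w):
            x = src_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0  # horizontal blend weight toward column x1
            # Blend the four neighbouring embeddings, one dimension at a time.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out
```

For example, a model trained with a 14x14 patch grid could be given a 24x16 grid at inference by calling `resize_pos_embed(pos_grid, 24, 16)`. Any extra non-patch embedding, such as one for a class token, would need to be set aside before resizing and reattached afterward.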