by CHY872 415 days ago
The article basically argues that in practice you'd expect similarly good results from subsampling, e.g. there's no need to process at 1920x1080 when 960x540 will do. Separately, you can break many problems down into smaller tiles and get similar-quality results without the compute overhead of a high-res ViT.
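To put rough numbers on that, here is a minimal sketch of both ideas. The 16-pixel patch size, the helper names and the 2x2 tile layout are my own assumptions for illustration, not anything from the article, and the "cost" is just the count of token pairs that global self-attention compares, not a real FLOP estimate.

```python
import numpy as np

PATCH = 16  # assumed ViT patch size; the comment does not specify one


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def num_tokens(height, width, patch=PATCH):
    """Patch tokens for an image, counting partial edge patches as full ones."""
    return ceil_div(height, patch) * ceil_div(width, patch)


def attention_pairs(tokens):
    """Token pairs compared by global self-attention (quadratic in tokens)."""
    return tokens * tokens


def subsample(image, factor=2):
    """Naive strided subsampling; a real pipeline would low-pass filter first."""
    return image[::factor, ::factor]


def split_tiles(image, tile_h, tile_w):
    """Split into non-overlapping tiles; dimensions must divide evenly."""
    h, w = image.shape[:2]
    if h % tile_h or w % tile_w:
        raise ValueError("tile size must divide the image size")
    return [
        image[r:r + tile_h, c:c + tile_w]
        for r in range(0, h, tile_h)
        for c in range(0, w, tile_w)
    ]


frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder 1080p frame

half = subsample(frame)               # 960x540 version of the frame
tiles = split_tiles(frame, 540, 960)  # four 960x540 tiles at full detail

full_cost = attention_pairs(num_tokens(1080, 1920))
half_cost = attention_pairs(num_tokens(*half.shape[:2]))
tiled_cost = sum(attention_pairs(num_tokens(*t.shape[:2])) for t in tiles)

print("full / subsampled:", full_cost / half_cost)
print("full / tiled:", full_cost / tiled_cost)
```

Under those assumptions, halving each dimension cuts the token count 4x and the attention pairs 16x, while tiling keeps every pixel but still cuts the pairs 4x, because no tile attends to any other tile. Whether quality actually holds up depends on the task, which is the part the article is arguing about.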