by tbalsam 880 days ago
I enjoyed this paper (I share a discord with the author so I read it a bit earlier).

It's not entirely clear from the comparison numbers at the end, but I think the big argument here is efficiency for the amount of performance achieved. One can get lower FID numbers, but only by spending a ton of compute.

I can't really speak technically to it as I've not given it a super in-depth look, but this seems like a nice set of motifs for landing halfway between a standard attention network and a convnet in terms of compute cost (and maybe performance)?

The large-resolution scaling seems to be a strong suit as a result. :)

2 comments

Thanks a lot!

Yeah, the main motivation was trying to find a way to enable transformers to do high-resolution image synthesis: transformers are known to scale well to extreme, multi-billion parameter scales and typically offer superior coherency & composition in image generation, but current architectures are too expensive to train at scale for high-resolution inputs.

By using a hierarchical architecture and local attention at high-resolution scales (but retaining global attention at low-resolution scales), it becomes viable to apply transformers at these scales. Additionally, this architecture can now be trained directly on megapixel-scale inputs and generate high-quality results without having to progressively grow the resolution over the course of training or apply other "tricks" typically needed to make models at these resolutions work well.
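
To make the cost argument concrete, here is a rough back-of-the-envelope sketch that counts query-key pairs per level for a hierarchy with local attention at the fine levels and global attention only at the coarsest one. Everything specific in it is an assumption for illustration, not a configuration taken from the paper: the 256x256 token grid, the 4 levels, the 2x downsampling per level, and the 7x7 attention window, as well as the function names.

```python
def attention_pairs_global(side: int) -> int:
    """Query-key pairs for full self-attention over a side x side token grid."""
    n = side * side
    return n * n


def attention_pairs_local(side: int, window: int) -> int:
    """Query-key pairs when each token attends to at most a window x window neighborhood."""
    n = side * side
    return n * min(window * window, n)


def hierarchy_cost(side: int, levels: int, window: int) -> list[tuple[int, str, int]]:
    """Per-level (grid side, attention kind, pair count).

    Assumption for this sketch: each level halves the grid side, every level
    except the coarsest uses local attention, and the coarsest uses global.
    """
    rows = []
    for level in range(levels):
        s = side // (2 ** level)
        if level == levels - 1:
            rows.append((s, "global", attention_pairs_global(s)))
        else:
            rows.append((s, "local", attention_pairs_local(s, window)))
    return rows


if __name__ == "__main__":
    side = 256  # hypothetical token grid, e.g. a 1024x1024 image with 4x4 patches
    rows = hierarchy_cost(side, levels=4, window=7)
    for s, kind, pairs in rows:
        print(f"{s:>4} x {s:<4} {kind:<6} {pairs:>14,}")
    hierarchical_total = sum(pairs for _, _, pairs in rows)
    print(f"hierarchical total:       {hierarchical_total:>14,}")
    print(f"global at full resolution: {attention_pairs_global(side):>14,}")
```

The point of the sketch is only the scaling: the local levels grow linearly with the number of tokens, while the single global level sits on a grid small enough that its quadratic cost stays manageable. It ignores the MLPs, the downsampling layers, and any constant factors.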

Which discord, if it's open to the public? I was on one with kath in 2021 and loved her insights, would love to again
You and the guy below you in this thread should probably tag me on twitter, same tag as here, I can point you. I do not especially want to leave the discord link in a frontpage hn thread.
Same; a good ML focused discord would be great. Training ViTs all day is lonely work. I'm mostly locked into skimming the "Research" channels of image generation discords. LAION used to be decent with a good amount of interesting discussion, but it seems to have devolved into toxicity in the last year.
See my other comment replying to that.
LAION is good