by chintler 992 days ago
I'd recommend the Spotlight paper by Google[1]. They created some very interesting datasets for this purpose. They mention an in-house screen-action-screen dataset, and it doesn't look like they'll open it up. Maybe owning Android has its advantages.

There's also a recent release from Hugging Face called IDEFICS[2] that claims to be an open-source implementation of Flamingo (an older paper about few-shot multimodal task understanding). I think this space will be heating up soon.

[1] https://research.google/pubs/pub52171/

[2] https://huggingface.co/blog/idefics

1 comment

Thanks!