Hacker News
by felixr 1245 days ago
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding https://arxiv.org/abs/2210.03347

https://github.com/google-research/pix2struct

1 comment

Thanks, will take a look at these.