
Not social media sites specifically. Getting past their scraping prevention would be difficult, and there are already a lot of existing companies working on scraping popular social media sites.

Interesting idea. We're definitely looking into coupling OCR and LLMs today, but not for that particular case. I think raw language models with a good workflow are typically good enough to extract structured data from things like books.

ML training is definitely one area where we can see this being useful. General data aggregation across a large industry (clothing, retail, etc.) is something we want to look into, as are RPA-style workflows involving multi-click actions across a variety of sites.