Hacker News
by crowdyriver 813 days ago
There are lots of comments here about how stupid it is to parse HTML using LLMs.

Have you ever had to scrape multiple sites whose HTML varies from one site to the next?

1 comment

The example here has HTML with a somewhat fixed format. It would indeed have been better to use samples with different formats and to aim for a low error rate.

If you are scraping a limited number of sites, you could, for each site, ask the LLM to generate parsing code from some samples, review it, and move on.
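Roughly what I mean, as a minimal sketch using only the Python standard library. Everything named here is made up for illustration: the hostnames `shop-a.example` and `shop-b.example`, the tag/class selectors, and the `PARSERS` registry. The idea is that each entry is a small parser the LLM drafted from sample pages and a human then reviewed, so no LLM call happens at scrape time.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TextByClass(HTMLParser):
    """Collect the text of every <tag> element that carries css_class.

    Simplification: nested elements with the same tag name are not handled.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self._buf = None      # list of text chunks while inside a match
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buf is not None:
            self.results.append("".join(self._buf).strip())
            self._buf = None


def extract(html, tag, css_class):
    parser = TextByClass(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical registry: one reviewed parser per site (assumed names).
PARSERS = {
    "shop-a.example": lambda html: extract(html, "h2", "product-title"),
    "shop-b.example": lambda html: extract(html, "span", "name"),
}


def parse(url, html):
    """Dispatch to the reviewed parser for the URL's host."""
    host = urlparse(url).hostname
    if host not in PARSERS:
        # A new site means new samples and a new review, not a silent guess.
        raise LookupError(f"no reviewed parser for {host}")
    return PARSERS[host](html)
```

When a site changes its markup you regenerate and re-review just that one entry. The trade-off is that this only scales to as many sites as you are willing to review by hand.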