by sneak 4930 days ago
> No matter how you slice it parsing HTML from a 3rd party site is a major hack

WebKit seems to do just fine. I think you're making excuses.

1 comment

Until the source site changes the URL, or the URL arguments, or the page structure, or the doctype, or the CSS selectors, or the element ids, or whatever it is you key on to ferret out the content you care about. Scraping data embedded in markup not bound to an API spec is fragile, regardless of how "properly" you parse it, because there is no guarantee of structural consistency.
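
To make that concrete, here is a rough sketch using only Python's standard `html.parser`. The markup, the `price` id, and the `extract` helper are all made up for illustration; nothing here comes from a real site. The parsing is perfectly "proper" in both cases, yet the extraction silently breaks the moment the site renames the element id.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id.

    Simplification: assumes no void elements (e.g. <br>) inside the target,
    since those would throw off the nesting counter.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while we are inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, target_id):
    """Hypothetical helper: return the target element's text, or None."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


# Hypothetical page before and after a redesign: same data, different id.
before = '<html><body><div id="price"><b>$19.99</b></div></body></html>'
after = '<html><body><span id="product-price"><b>$19.99</b></span></body></html>'

print(extract(before, "price"))  # $19.99
print(extract(after, "price"))   # None
```

Both documents parse without complaint. The failure is in the assumption that `id="price"` will keep existing, which is exactly the thing no one promised you.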