Hacker News
by 1vuio0pswjnm7 1068 days ago
"Generate RSS feed for any website using CSS selectors"

For me, "CSS selectors" always seems like a deceptive term if it means selecting HTML elements. What if the website does not use styling?

I read 1000s of websites, including all HN submissions, without using CSS. When I want to extract information from a website, I focus on patterns in the page. They might be HTML, they might be style elements, but they could be anything. I never assume that all websites will wrap the information I want in certain elements. There is a ridiculous amount of random variation amongst websites.
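
To illustrate what matching on a pattern rather than on a wrapping element might look like, here is a minimal sketch. The sample text, the regex, and the `extract_urls` helper are all made up for illustration; this is not a description of any particular tool.

```python
import re

# Hypothetical page fragment; the markup is deliberately inconsistent.
RAW = """
<div class=x><A HREF="https://example.com/a">Alpha</A></div>
<span>see <a  href='https://example.org/b' >Beta</a></span>
plain text with a bare link: https://example.net/c
"""

# Match the shape of the data (an http(s) URL), not the element around it.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(text):
    """Return each URL found in text once, in order of first appearance."""
    seen = []
    for url in URL_PATTERN.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


urls = extract_urls(RAW)
```

The same pattern picks up the link in the upper-case tag, the single-quoted attribute, and the bare text, none of which share a wrapping element.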

3 comments

I'm not sure that CSS being used on the page is a requirement. `h1 a` is a valid CSS selector, and using it here would not require that the matched elements be styled by a style sheet.

The key here is that it uses selectors, not the style sheets themselves.
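
As a rough sketch of that point, the toy matcher below applies a selector like `h1 a` to a page that has no style sheet at all. It uses only Python's standard `html.parser`. The `DescendantSelector` class and `select` function are hypothetical helpers written for this example, and they handle only tag names joined by descendant combinators, not the full selector syntax.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DescendantSelector(HTMLParser):
    """Toy matcher for selectors such as "h1 a" (tag names only)."""

    def __init__(self, selector):
        super().__init__()
        self.parts = selector.lower().split()
        self.stack = []    # open tags, from the root to the current element
        self.open = []     # (stack depth, result) for matches still open
        self.results = []

    def _matches(self):
        # The last part must be the current element; the earlier parts
        # must appear, in order, among its ancestors.
        if self.stack[-1] != self.parts[-1]:
            return False
        ancestors = iter(self.stack[:-1])
        return all(part in ancestors for part in self.parts[:-1])

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.stack.append(tag)
        if self._matches():
            result = {"attrs": dict(attrs), "text": ""}
            self.results.append(result)
            self.open.append((len(self.stack), result))

    def handle_data(self, data):
        # Text belongs to every matched element that is still open.
        for _, result in self.open:
            result["text"] += data

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the nearest open tag with this name.
            while self.stack.pop() != tag:
                pass
        # Close any match that was deeper than the current stack.
        while self.open and self.open[-1][0] > len(self.stack):
            self.open.pop()


def select(html, selector):
    parser = DescendantSelector(selector)
    parser.feed(html)
    parser.close()
    return parser.results


# No <style>, no class attributes, no style sheet anywhere on the page.
PAGE = """<html><body>
<h1><a href="/post/1">First post</a></h1>
<p>Intro with a <a href="/about">link</a>.</p>
<h1><a href="/post/2">Second <em>post</em></a></h1>
</body></html>"""

links = select(PAGE, "h1 a")
```

Here `links` holds the two anchors inside `h1` elements and skips the one inside the paragraph, even though nothing on the page is styled.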

You just need to use the same logic and syntax as CSS selectors to pick out content from the page. That's a little different from using CSS to style it.

Using CSS selectors exclusively is brittle and prone to failure.