Hacker News
by gioele 3805 days ago
> I could build a website using <div> alone.

Then my scraper that reads headlines on your website with `//(h1|h2|h3)` will not work. And I will not be able to read and linearize your tabular data. And plenty of other things. (Things you probably do not care about.)
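To make the scraper point concrete, here is a minimal sketch of a hypothetical heading scraper built on Python's standard-library `html.parser`. It is not the commenter's actual scraper. It roughly approximates what `//(h1|h2|h3)` selects; in XPath 1.0 you would write `//h1 | //h2 | //h3`. Headlines marked up as styled `<div>`s are simply invisible to it.

```python
from html.parser import HTMLParser


class HeadingScraper(HTMLParser):
    """Collects the text of <h1>, <h2> and <h3> elements, roughly what
    the XPath //(h1|h2|h3) selects. A hypothetical example scraper."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs in document order
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []     # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag names, so <H1> matches too.
        if tag in self.HEADING_TAGS and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def scrape_headings(html):
    parser = HeadingScraper()
    parser.feed(html)
    parser.close()
    return parser.headings


semantic = "<h1>Release notes</h1><p>...</p><h2>Bug fixes</h2>"
div_soup = '<div class="title">Release notes</div><div class="sub">Bug fixes</div>'

print(scrape_headings(semantic))  # [('h1', 'Release notes'), ('h2', 'Bug fixes')]
print(scrape_headings(div_soup))  # []
```

The same page content yields two headlines with semantic markup and nothing at all when everything is a `<div>`. Linearizing tabular data breaks in the same way when `<table>`/`<tr>`/`<td>` are replaced by positioned `<div>`s.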

Oh, and your website will have no links (the `href` attribute of `<a>` is not simulable via CSS).


The thing is, most developers aren't targeting your scraper. At most they're targeting Google's, which can handle all sorts of wacky, not-very-semantic markup and still glean a decent amount of semantic structure from it.

You can do some hacky stuff with onClick listeners for links, especially if it's a single-page app. That said, I don't do this. But I don't think I use much beyond div, span, a, ul, li, table (and friends), form (and crew), h1-h6, script, body, head, and html. I know there are more HTML tags, but I rarely use them. They're all pretty self-explanatory, and I have a hard time believing the author looked at 80 pages and found that everyone used the wrong tags.
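The catch with onClick "links" is easy to show. Below is a minimal sketch of a hypothetical crawler that, like many simple scrapers, discovers links only from `<a href="...">` and never runs JavaScript. It uses Python's standard-library `html.parser`; the markup snippets are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href targets of <a> elements, the way a simple
    non-JavaScript crawler discovers links (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


real_link = '<a href="/about">About</a>'
# Navigates only if a browser executes the JavaScript in the onclick handler.
onclick_div = "<div onclick=\"location.href='/about'\">About</div>"

print(collect_links(real_link))    # ['/about']
print(collect_links(onclick_div))  # []
```

To a human clicking in a browser, both snippets look like the same link; to a crawler that does not execute scripts, only the first one exists.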
> Oh, and your website will have no links (the `href` attribute of `<a>` is not simulable via CSS).

JavaScript to the rescue!

/s

FTFY