HN Mirror

by gcr 3139 days ago

With a domain-specific tool, it's even easier though.

    curl ... | jq -r .last
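For anyone without jq at hand, roughly the same extraction can be sketched in Python. This is only an illustration: the `extract_last` helper and the sample payload are made up here, since the actual URL and response are elided in the comment above.

```python
import json

def extract_last(payload: str) -> str:
    """Return the top-level "last" field as raw text, similar to `jq -r .last`."""
    value = json.loads(payload).get("last")
    # Strings are printed raw (no quotes); other values are printed as JSON.
    return value if isinstance(value, str) else json.dumps(value)

# Hypothetical payload, standing in for whatever the curl call returns.
sample = '{"last": "4321.50", "high": "4400.00"}'
print(extract_last(sample))
```

A missing key comes back as `null` here, which matches what jq prints for an absent field.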

sillysaurus3 3139 days ago

Sure. But you can't use jq to scrape arbitrary websites, for example. :)

dpflan 3138 days ago

Jeff Atwood has an entertaining post about parsing HTML with regular expressions:

https://blog.codinghorror.com/parsing-html-the-cthulhu-way/

"That's right, if you attempt to parse HTML with regular expressions, you're succumbing to the temptations of the dark god Cthulhu's … er … code."

alexozer 3138 days ago

Let's not forget about this masterpiece: https://stackoverflow.com/a/1732454/864310

dpflan 3138 days ago

Indeed, its quality cannot be ignored and must be shared; it’s referenced in the Atwood post.

ams6110 3138 days ago

Parsing and scraping are different things though. You don't need to parse a web page to extract specific things from it.
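A minimal sketch of that distinction, assuming all you want is the page title: a single regex pulls it out without building a document tree, and it does not care that the rest of the markup is broken. The `extract_title` name and the sample page are hypothetical.

```python
import re

# Match the first <title>...</title> pair, ignoring case and spanning newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def extract_title(html: str):
    """Return the stripped text of the first <title> element, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None

# Deliberately malformed page: the <p> is never closed.
page = "<html><head><TITLE>\n Example Domain </TITLE></head><body><p>unclosed"
print(extract_title(page))
```

This works for one well-delimited target; it is not a general HTML parser and will break as soon as the thing you want can nest.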

jldugger 3139 days ago

Of course, if it's anything like HTML, the formatting will vary so much over time that you really want a more permissive parser like BeautifulSoup. I haven't found a CLI for it, so I wrote my own ages ago: https://github.com/jldugger/dotfiles/blob/master/bin/select.....
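To illustrate what "permissive" buys you, here is a rough sketch that is not the linked script and does not use BeautifulSoup; it swaps in the standard library's `html.parser`, which also tolerates unclosed tags. The `TagText` class and `select_text` helper are names invented for this example.

```python
from html.parser import HTMLParser

class TagText(HTMLParser):
    """Collect the text that follows each occurrence of one tag name."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag
        self.results = []

    def handle_starttag(self, tag, attrs):
        # Each start tag opens a new result, even if the previous one never closed.
        if tag == self.tag:
            self.results.append("")

    def handle_data(self, data):
        # Simplification: text is credited to the most recently opened target tag.
        if self.results:
            self.results[-1] += data

def select_text(html: str, tag: str):
    parser = TagText(tag)
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return [text.strip() for text in parser.results]

# The <li> elements are never closed, yet both are recovered.
print(select_text("<ul><li>one<li>two</ul>", "li"))
```

A real selector tool would track end tags and nesting; this only shows that sloppy markup does not have to stop extraction.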

nurettin 3139 days ago

For cases where a website is not a tutorial for websites, regex is a suitable tool for scraping.
