by wingerlang 3271 days ago

Kind of off topic, but is running automation really as complicated as this? I recently wanted to log in to a website, click through to a page, and download a .csv file. I saw that Chrome can be run headless, nice.

So I opened it headless and there's this REPL. Nice, I can run JS directly against the website. Now how do I automate this?

This leads me into all sorts of things that don't seem related at all (they are, but still): Selenium stuff, automation setups, drivers, language bindings, chrome-api-stuff, and no end in sight.

All I want is something like "chrome -headless -js script-flow.file http://URL"

Am I too overwhelmed, or is there no simple way without buttloads of 3rd party tools and setups required?

6 comments

diggan 3271 days ago

Yeah, it's actually way simpler than that, since Chrome (and I think Firefox now too) exposes an API for driving the browser. What you're after is the remote debugging protocol: https://chromedevtools.github.io/devtools-protocol/

But there are a bunch of things that abstract that for you, so you don't have to hit that API manually. Anyway, here is a handy document from Mozilla on the protocol too: http://searchfox.org/mozilla-central/source/devtools/docs/ba...
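
To give a rough idea of what "hitting that API manually" involves, here is a minimal Python sketch that only builds the JSON messages for a navigate-then-run-JS flow. It assumes Chrome was started with something like `--headless --remote-debugging-port=9222` and that you send these messages over the page's WebSocket yourself (that part needs a WebSocket client and is left out). `Page.enable`, `Page.navigate` and `Runtime.evaluate` are methods from the protocol; the helper names `cdp_command` and `build_flow` are made up for illustration.

```python
import json


def cdp_command(msg_id, method, params=None):
    """Serialize one DevTools protocol command as a JSON text message."""
    return json.dumps({"id": msg_id, "method": method, "params": params or {}})


def build_flow(url, script):
    """Return the ordered messages for: enable page events, navigate, run JS.

    Each message carries a unique id so replies can be matched to requests.
    """
    steps = [
        ("Page.enable", None),
        ("Page.navigate", {"url": url}),
        ("Runtime.evaluate", {"expression": script}),
    ]
    return [
        cdp_command(msg_id, method, params)
        for msg_id, (method, params) in enumerate(steps, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical URL and script, just to show the message shapes.
    for message in build_flow("http://example.com", "document.title"):
        print(message)
```

A real client would also wait for the page load event before evaluating the script, which is exactly the kind of bookkeeping the higher-level libraries take care of.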

bdcravens 3271 days ago

Chrome headless is a bit on the edge. Selenium + Firefox/Chrome is pretty mature, and Selenium even publishes Docker images that remove a lot of setup complexity. Pick your favorite language, grab the requisite WebDriver gem/module/etc and point it at the container.

Additionally, for many use cases, one of the many browser automation SaaS offerings out there is a good solution.
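
As a rough illustration of what "point it at the container" means underneath, here is a standard-library Python sketch that builds (but does not send) the HTTP requests a WebDriver binding would issue. The `/session` and `/session/{id}/url` endpoints come from the W3C WebDriver spec; the hub address is an assumption (some setups serve the API under `/wd/hub` instead), and the function names are hypothetical. In practice the language binding does all of this for you.

```python
import json
import urllib.request

# Assumption: a Selenium container is reachable at this address.
HUB_URL = "http://localhost:4444"


def _post(url, payload):
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def new_session_request(hub_url=HUB_URL, browser="chrome"):
    """Request that asks the remote end to start a browser session."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser}}}
    return _post(hub_url.rstrip("/") + "/session", payload)


def navigate_request(hub_url, session_id, target_url):
    """Request that tells an existing session to load target_url."""
    url = "{}/session/{}/url".format(hub_url.rstrip("/"), session_id)
    return _post(url, {"url": target_url})


if __name__ == "__main__":
    req = new_session_request()
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending a request with `urllib.request.urlopen(req)` returns JSON containing the session id, which is then passed to `navigate_request`.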

mrskitch 3271 days ago

I'm working on a high-level API to solve a lot of what you're describing. It's still in its infancy, but soon will be runtime agnostic: https://github.com/joelgriffith/navalia. File an issue for what's missing!

robk 3271 days ago

This one will be close one day for downloading at least. Ok today for scraping. https://github.com/LucianoGanga/simple-headless-chrome

Dirlewanger 3271 days ago

What does headless Chrome provide that a web scraper in any given language can't?

wingerlang 3271 days ago

I was hoping it would let me automate a few tasks with a "user" flow (i.e. enter details, click, click). With e.g. curl or Python I didn't even get through the login screens, because they seem to require some special handling of cookies: request/response cookies, smfd, itc, id_ado, _ip_xat and so on.

Basically auth on websites seems to require a whole bunch of stuff and I just started looking at simpler forms of automation instead. Still not sure which one I'll continue looking into.

(If anyone sees this, I am trying to log into iTunes Connect and Fabric and download metrics)
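
For reference, the plain-Python approach described above looks roughly like the sketch below: a cookie jar attached to an opener, so cookies set by the login response are sent back on later requests. The form field names and URLs are hypothetical, and sites that set cookies from JavaScript or require hidden tokens will likely need more than this, which is where a real browser helps.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(login_url, fields):
    """Build a form-encoded POST request for a login form (not sent here)."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")


if __name__ == "__main__":
    opener, jar = make_session()
    # Hypothetical endpoint and field names, for illustration only.
    req = login_request(
        "https://example.com/login",
        {"username": "me", "password": "secret"},
    )
    print(req.get_method(), req.full_url, req.data)
    # opener.open(req) would log in; a later opener.open(csv_url) would
    # reuse whatever cookies the login response stored in the jar.
```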

dragonwriter 3271 days ago

> What does headless Chrome provide that a web scraper in any given language can't?

Everything a web scraper could do, without reinventing all the infrastructure for handling web content (including JS/DOM interaction) from scratch. You could obviously do it all yourself in your language of choice, but why not focus on the application-specific parts?

VeejayRampay 3271 days ago

Taking screenshots comes to mind. To this day it's still extremely complicated to get right, even with the likes of PhantomJS, Chrome Headless, etc.

corford 3271 days ago

A nice quick solution for that is SlimerJS with CasperJS.
