Hacker News
by wingerlang 3271 days ago
Kind of off topic, but is running automation as complicated as this? I recently wanted to log in to a website, click through to a page and download a .csv file. I saw that Chrome can be run headless, nice.

So I opened it headless and there's this REPL, nice, I can run JS directly against the website. Now how do I automate this?

This leads me into all sorts of things that don't seem related at all (they are, but still): Selenium stuff, automation setups, drivers, language bindings, chrome-api-stuff, and no end in sight.

All I want is something like "chrome -headless -js script-flow.file http://URL"

Am I too overwhelmed, or is there no simple way without buttloads of 3rd-party tools and setups?

6 comments

Yeah, it's actually way simpler than that, since Chrome (and I think Firefox now too) exposes an API for driving the browser. What you're after is the remote debugging protocol: https://chromedevtools.github.io/devtools-protocol/

But you have a bunch of things that abstract that for you, so you don't have to hit that API manually. Anyway, here is a handy document from Mozilla on the protocol too: http://searchfox.org/mozilla-central/source/devtools/docs/ba...
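To make the protocol a bit more concrete, here is a minimal sketch of what the messages look like on the wire. It assumes Chrome was started with `--headless --remote-debugging-port=9222` (the port number is just an example), and the helper names `build_command`, `list_targets` and `script_flow` are made up for illustration. It only builds the JSON frames; actually sending them needs a WebSocket client, which is left out here.

```python
import itertools
import json
import urllib.request

# Assumption: Chrome is running locally with
#   chrome --headless --remote-debugging-port=9222
DEBUG_ENDPOINT = "http://localhost:9222"

# Every command carries a unique id so replies can be matched to requests.
_ids = itertools.count(1)


def build_command(method, params=None):
    """Serialize one DevTools protocol command as a JSON text frame."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def list_targets(endpoint=DEBUG_ENDPOINT):
    """Return the open tabs; each entry has a webSocketDebuggerUrl to connect to."""
    with urllib.request.urlopen(endpoint + "/json") as resp:
        return json.load(resp)


def script_flow(url, expression):
    """Build the two frames for 'open this URL, then run this JS'."""
    return [
        build_command("Page.navigate", {"url": url}),
        build_command("Runtime.evaluate", {"expression": expression}),
    ]


if __name__ == "__main__":
    for frame in script_flow("http://example.com", "document.title"):
        print(frame)
```

A real client would also have to wait for the page to finish loading before evaluating anything (the protocol has a `Page.loadEventFired` event for that), which is a big part of why the wrapper libraries exist.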

Chrome headless is a bit on the edge. Selenium + Firefox/Chrome is pretty mature, and Selenium even publishes Docker images that remove a lot of setup complexity. Pick your favorite language, grab the requisite WebDriver gem/module/etc and point it at the container.
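As a rough sketch of the "point it at the container" part, in Python: this assumes a Selenium standalone Chrome container is listening on port 4444 (for example via `docker run -d -p 4444:4444 selenium/standalone-chrome`) and that the `selenium` package is installed. The function names, the selectors and the login flow are placeholders for illustration, not taken from any real site.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def connect(hub_url="http://localhost:4444/wd/hub"):
    """Open a Chrome session on a remote Selenium server.

    Assumption: a Selenium container is reachable at hub_url.
    """
    # Imported lazily so the flow below can be exercised without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=hub_url, options=options)


def login_and_open_report(driver, login_url, user, password, selectors):
    """Enter details, click, click: the 'user flow' as plain method calls."""
    driver.get(login_url)
    driver.find_element(CSS, selectors["user"]).send_keys(user)
    driver.find_element(CSS, selectors["password"]).send_keys(password)
    driver.find_element(CSS, selectors["submit"]).click()
    driver.find_element(CSS, selectors["report_link"]).click()


if __name__ == "__main__":
    driver = connect()
    try:
        login_and_open_report(
            driver,
            "https://example.com/login",  # hypothetical URL
            "me@example.com",
            "secret",
            {"user": "#user", "password": "#pass",
             "submit": "#submit", "report_link": "a.csv-export"},
        )
    finally:
        driver.quit()
```

Real pages usually need explicit waits between the steps, so treat this as the skeleton rather than a finished script.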

Additionally, for many use cases, one of the many browser automation SaaS offerings out there is a good solution.

I'm working on a high-level API to solve a lot of what you're describing. It's still in its infancy, but will soon be runtime agnostic: https://github.com/joelgriffith/navalia. File an issue for what's missing!

This one should get close one day, at least for downloading. It's OK today for scraping. https://github.com/LucianoGanga/simple-headless-chrome

What does headless Chrome provide that a web scraper in any given language can't?

I was hoping it would let me automate a few tasks with a "user" flow (i.e. enter details, click, click). With e.g. curl or Python I didn't even get through the login screens, because they seem to require some special handling of cookies: request/response cookies, smfd, itc, id_ado, _ip_xat and so on.

Basically auth on websites seems to require a whole bunch of stuff and I just started looking at simpler forms of automation instead. Still not sure which one I'll continue looking into.

(If anyone sees this, I am trying to log into iTunes Connect and Fabric and download metrics)
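For reference, the plain-scraper route for cookie handling looks roughly like the sketch below, using only the Python standard library. The helper names and form fields are invented for illustration. It keeps whatever cookies the server sets and sends them back on later requests, but it runs no JavaScript, so any token the page computes in the browser is out of reach, which is likely where these logins fall over.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and resends them automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """Encode form fields the way a browser submits a simple login form."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login(opener, login_url, fields):
    """POST the login form; cookies from the response land in the jar."""
    with opener.open(login_url, data=encode_form(fields)) as resp:
        return resp.status


if __name__ == "__main__":
    opener, jar = make_session()
    # Hypothetical URL and field names, for illustration only.
    status = login(opener, "https://example.com/login",
                   {"user": "me@example.com", "password": "secret"})
    print(status, [cookie.name for cookie in jar])
```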

> What does headless Chrome provide that a web scraper in any given language can't?

Everything a web scraper could do, without reinventing all the infrastructure for handling web content (including JS/DOM interaction) from scratch. You could obviously do it all yourself in your language of choice, but why not focus on the application-specific parts?

Taking screenshots comes to mind. To this day it's still extremely complicated to get right, even with the likes of PhantomJS, Chrome Headless, etc.

A nice quick solution for that is SlimerJS with CasperJS.
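For the headless Chrome route, the basic screenshot case can be driven straight from the command line with the `--screenshot` and `--window-size` flags. A small Python wrapper might look like this; the binary name `google-chrome` is an assumption (it differs per platform) and the function names are made up. This covers only the simple case and does nothing about the harder parts, such as waiting for late-loading content.

```python
import subprocess


def screenshot_command(url, out_path, width=1280, height=800, chrome="google-chrome"):
    """Build the argument list for a one-shot headless Chrome screenshot."""
    return [
        chrome,  # assumption: adjust to the Chrome/Chromium binary on your system
        "--headless",
        f"--screenshot={out_path}",
        f"--window-size={width},{height}",
        url,
    ]


def take_screenshot(url, out_path, **kwargs):
    """Run Chrome and raise CalledProcessError if it exits with an error."""
    subprocess.run(screenshot_command(url, out_path, **kwargs), check=True)


if __name__ == "__main__":
    take_screenshot("https://example.com", "example.png")
```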