by thibaut_barrere 3273 days ago
Earlier this month I wrote an ETL extractor using Capybara and a headless browser (to work around the lack of an API. PS: only do that as a last resort!).

One thing I learned and wanted to warn about here is that Chrome headless currently doesn't support file downloads [1].

PhantomJS won't work either (unless you use a custom build) [2].

I also tried capybara-webkit, but had no luck there either [3].

The only driver that ultimately allowed me to download files at this point was Selenium + Firefox, with some tweaking of the profile options: browser.download.dir, browser.download.folderList = 2, and browser.helperApps.neverAsk.saveToDisk set to the MIME types of the files to be downloaded.

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=696481

[2] https://github.com/ariya/phantomjs/issues/10052

[3] https://github.com/thoughtbot/capybara-webkit/issues/691
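
The Firefox profile options mentioned above could be collected roughly like this. This is a minimal sketch, not the setup I actually used: the helper name firefox_download_prefs, the /tmp/downloads directory and the MIME types are all made up for illustration.

```ruby
# Build the three Firefox preferences named above as a plain Hash.
# firefox_download_prefs is a hypothetical helper, not part of any library.
def firefox_download_prefs(download_dir, mime_types)
  {
    # Directory where downloaded files are saved.
    "browser.download.dir" => download_dir,
    # 2 tells Firefox to use the custom directory set in browser.download.dir.
    "browser.download.folderList" => 2,
    # Comma-separated MIME types to save without showing a download dialog.
    "browser.helperApps.neverAsk.saveToDisk" => Array(mime_types).join(",")
  }
end

# Example values only (assumptions): adjust the path and MIME types as needed.
prefs = firefox_download_prefs("/tmp/downloads", ["application/pdf", "text/csv"])
prefs.each { |key, value| puts "#{key} = #{value.inspect}" }
```

With the selenium-webdriver gem, each pair can then be assigned onto a Selenium::WebDriver::Firefox::Profile via profile[key] = value before the driver is created. How the profile is handed to the driver varies between gem versions, so check the version you have installed.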

3 comments

Unfortunately, headless Chrome is still missing some features, and file downloads are one of them.

I believe Nightmare [1] (running on Electron) handles file downloads; it might be worth a look?

[1] https://github.com/segmentio/nightmare

Seconded. I did not try all of the other approaches you had to go through; fortunately, I was already using Selenium with chromedriver in my testing, and I found this great set of examples for testing file downloads:

[1] https://collectiveidea.com/blog/archives/2012/01/27/testing-...

[2] https://forum.shakacode.com/t/how-to-test-file-downloads-wit...

It looks like the second linked post derives from the first; the two are very similar. My tests don't actually download the files anymore (they are PDFs, and we opted to open them in a new tab instead, which is unfortunately harder to confirm with a test).

Add this to the list of useful wrappers for things I thought would be difficult to test as a regression spec but ultimately weren't hard at all, like email delivery with email-spec/email-spec [3].

[3] https://github.com/email-spec/email-spec

>One thing I learned and wanted to warn about here is that Chrome headless currently doesn't support file downloads [1].

Not true. At least with C#, you can use:

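            // Requires: using System.Net;
            // Reuse the Selenium session's cookies so the download is authenticated.
            var client = new WebClient();
            client.Headers[HttpRequestHeader.Cookie] = cookieString(driver);
            client.DownloadFile(reportURL, savePath + fileName + ".xlsx");

Along with: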

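            // Requires: using System.Linq; using OpenQA.Selenium;
            // Builds a Cookie header value ("name1=value1; name2=value2") from the browser session.
            string cookieString(IWebDriver driver)
            {
                var cookies = driver.Manage().Cookies.AllCookies;
                return string.Join("; ", cookies.Select(c => string.Format("{0}={1}", c.Name, c.Value)));
            }
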
Edit: Just wanted to emphasize that this ensures all the steps you did to log in won't be lost, since you're reusing the logged-in session's cookies. The idea is to use Selenium to get the download URL and then use a regular download method with the Selenium cookies.

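For anyone on the Ruby/Capybara side of this thread, the same idea might look roughly like the sketch below. It assumes the cookies arrive as hashes with :name and :value keys (the shape selenium-webdriver's driver.manage.all_cookies returns); cookie_header and download_with_cookies are hypothetical helper names, and the cookie values in the demo are invented.

```ruby
require "net/http"
require "uri"

# Join cookies into one Cookie header value: "name1=value1; name2=value2".
# Each cookie is assumed to be a Hash with :name and :value keys.
def cookie_header(cookies)
  cookies.map { |c| "#{c[:name]}=#{c[:value]}" }.join("; ")
end

# Fetch url with the browser session's cookies and write the body to path.
# Hypothetical helper: it does not follow redirects or stream large files.
def download_with_cookies(url, cookies, path)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = cookie_header(cookies)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  File.binwrite(path, response.body)
end

# Demo of the header format only (no network access); values are made up.
puts cookie_header([{ name: "session", value: "abc123" }, { name: "lang", value: "en" }])
```

With Capybara's Selenium driver, the cookie list would typically come from page.driver.browser.manage.all_cookies once the login steps have run. As with the C# version, this only helps when the download URL is known.
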
Yes, I could have done that in other cases, but not in the specific case I tracked, where I don't know the report URL at all, nor can I construct it (it's generated by some enterprise app / JavaScript code). I think the bug report I mentioned relates to that.

(Otherwise, you could indeed use pretty much whatever you want to download the file.)