by acgan 2463 days ago
Glad to see this important piece here (disclosure: I am one of the editors of The Gradient).

https://ieeexplore.ieee.org/document/5089308, from RCIS 2009 (Beel and Gipp) noted that "Google Scholar seems to be more suitable for searching standard literature than for gems or articles by authors advancing a view different from the mainstream."

Unrelated, but interesting: scraping Google Scholar is remarkably annoying if you actually want to use the data. The easiest approach (in my experience) is regex hacking on the BibTeX files, but that feels truly broken.
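Part of why regex-only BibTeX handling feels broken: field values can contain nested braces (e.g. `title={A {Nested} Title}`), and a single regular expression cannot match arbitrarily nested braces. Below is a minimal sketch, assuming you already have the BibTeX text saved locally. It uses regex only for entry headers and field names and a brace-depth counter for values. The names `parse_bibtex` and `_read_value` are hypothetical, and this is not a full BibTeX grammar: escaped braces/quotes, `@string` macros, and `#` concatenation are not handled. A dedicated parser such as the `bibtexparser` package is the more robust option.

```python
import re

# Hypothetical helpers; a sketch, not a complete BibTeX grammar.
ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")  # e.g. "@article{key,"
FIELD_NAME = re.compile(r"\s*([\w-]+)\s*=\s*")             # e.g. "title ="
BARE_VALUE = re.compile(r"[^,}\s]+")                        # e.g. year=2019
SEPARATOR = re.compile(r"\s*,?")                            # optional comma between fields


def _read_value(text, i):
    """Return (value, index after value) for the field value starting at text[i]."""
    if i >= len(text):
        raise ValueError("missing value at end of input")
    if text[i] == "{":
        # Count brace depth, since one regex cannot match arbitrarily nested braces.
        depth = 0
        for j in range(i, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return text[i + 1:j], j + 1
        raise ValueError("unbalanced braces")
    if text[i] == '"':
        # Simplification: escaped quotes inside the value are not handled.
        j = text.find('"', i + 1)
        if j == -1:
            raise ValueError("unterminated quoted value")
        return text[i + 1:j], j + 1
    m = BARE_VALUE.match(text, i)
    if m is None:
        raise ValueError(f"no value at position {i}")
    return m.group(0), m.end()


def parse_bibtex(text):
    """Parse @type{key, field={...}, ...} entries into a list of dicts."""
    entries = []
    pos = 0
    while True:
        head = ENTRY_HEAD.search(text, pos)
        if head is None:
            break
        entry = {"ENTRYTYPE": head.group(1).lower(), "ID": head.group(2)}
        i = head.end()
        while True:
            field = FIELD_NAME.match(text, i)
            if field is None:
                break  # reached the entry's closing "}" (or text we can't parse)
            value, i = _read_value(text, field.end())
            entry[field.group(1).lower()] = value
            i = SEPARATOR.match(text, i).end()  # skip whitespace and one comma
        entries.append(entry)
        pos = i  # resume searching after this entry
    return entries


if __name__ == "__main__":
    # Hypothetical sample entry, not a real citation.
    sample = """@article{doe2019example,
  title={A {Nested} Title},
  author={Doe, Jane and Roe, Richard},
  year=2019
}"""
    print(parse_bibtex(sample)[0]["title"])  # A {Nested} Title
```

The split is deliberate: regexes are fine for the flat parts (the `@type{key,` header and `name =`), while the value scanner keeps the inner `{Nested}` braces intact instead of stopping at the first `}`.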


Blocking scraping is the norm for Google. For instance, the public YouTube API lets you view a grand total of 3 or so videos per key per day before it starts blocking you.

Google has basically gotten as bad as Twitter at giving a big middle finger to third-party devs, but they have been smart enough to maintain a completely useless public/free tier for most things.

That makes sense. I'd hope that Scholar would be different, though.

A piece on how a researcher spent a summer filling out CAPTCHAs / scraping: https://www.nature.com/articles/d41586-018-04190-5

Scholar should be different, considering that they are the only ones in the world who are given access to everything.

Scholar locks you out of the BibTeX entries after you download ~20 or so in my experience, but you can get around this by saving the paper to your favorites and then accessing the BibTeX from the paper link on your favorites page.