HN Mirror


by acgan 2463 days ago

Glad to see this important piece here (disclosure: I am one of the editors of The Gradient).

A paper from RCIS 2009 by Beel and Gipp (https://ieeexplore.ieee.org/document/5089308) noted that "Google Scholar seems to be more suitable for searching standard literature than for gems or articles by authors advancing a view different from the mainstream."

Unrelated, but interesting: scraping Google Scholar is remarkably annoying if you want to actually use the data. The easiest way (in my experience) seems to be regex hacking on the BibTeX files, but this seems truly broken.
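
For anyone curious what that kind of regex hacking looks like, here is a minimal sketch, not a robust parser. It assumes one entry per string and flat `name={value}` or `name="value"` fields. The function name `parse_bibtex_entry` and the sample entry are made up for illustration and are not taken from Scholar.

```python
import re

# Matches the entry header, e.g. "@article{doe2020example,"
HEADER_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches flat fields only: name={value} or name="value" (no nested braces)
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')

# Made-up entry for illustration; not a real reference.
SAMPLE = """@article{doe2020example,
  title={An Example Title},
  author={Doe, Jane and Roe, Richard},
  journal={Journal of Examples},
  year={2020}
}"""


def parse_bibtex_entry(text):
    """Return a dict of fields for a single BibTeX entry, or None if no header is found."""
    header = HEADER_RE.search(text)
    if header is None:
        return None
    entry = {"type": header.group(1).lower(), "key": header.group(2)}
    # Only scan the text after the header so the citation key is not misread as a field.
    for name, braced, quoted in FIELD_RE.findall(text[header.end():]):
        entry[name.lower()] = (braced or quoted).strip()
    return entry


if __name__ == "__main__":
    print(parse_bibtex_entry(SAMPLE))
```

The obvious weakness is that a value with nested braces, such as `title={The {GPU} Era}`, is silently skipped, which is part of why this approach feels fragile. A real BibTeX parsing library is the safer choice for anything beyond a quick hack.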

2 comments

throwaway2048 2463 days ago

Blocking scraping is the norm for Google. For instance, the public YouTube API allows you to view a grand total of 3 or so videos per key per day before it starts blocking you.

Google has basically gotten as bad as Twitter in terms of giving a big middle finger to third-party devs, but they have been smart enough to maintain a completely useless public/free tier for most things.


acgan 2463 days ago

That makes sense. I'd hope that Scholar would be different, though.

A piece on how a researcher spent a summer filling out CAPTCHAs / scraping: https://www.nature.com/articles/d41586-018-04190-5


buboard 2463 days ago

Scholar should be different, considering that they are the only ones in the world who are given access to everything.


ajmooch 2463 days ago

In my experience, Scholar locks you out of the BibTeX entries after you download about 20, but you can get around this if you instead save the paper to your favorites and then access the BibTeX from the paper link on your favorites page.
