


	by jrochkind1 3013 days ago
	Google Scholar definitely and intentionally offers no API. I don't see this lasting long...

5 comments

gpm 3013 days ago

At a glance it looks like it's really just a proxy that is limited to scholar.google.com and mutates the page slightly (adds a header and Sci-Hub links).

Does Google generally block proxy servers?


Vinnl 3013 days ago

I'd imagine that would be quite hard: many university libraries have their own proxies, which make sure that visitors to the library are able to access the content that the library has paid for, and which often modify Google Scholar (with Google's cooperation, I believe) to list links to accessible versions of the content next to search results.


jrochkind1 3013 days ago

I don't know if they generally do, but I'm sure they can/will if they want to.


Spivak 3013 days ago

They don't block startpage and they've been around for a while.


jrochkind1 3007 days ago

Home page currently says:

> See you later

> Too much attention is a bad thing, Sci-Bay decides to stop service for a while. Sorry.

Apparently I was not wrong.

This could be developed as a browser plugin, which would be much harder, or almost impossible, for Google to prevent. Well, a Firefox plugin at least; a Chrome plugin they presumably wouldn't allow.


matheusmoreira 3013 days ago

The page's HTML is the API. It's pretty easy to download a web page, parse the HTML and then extract specific bits of information from it. The browser does the same thing on the user's behalf, which is why it is called the user agent.
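
To make the download, parse, extract steps concrete, here is a minimal sketch using only the Python standard library. The markup it looks for (`<h3 class="result-title">` wrapping an `<a>`) is a made-up example, not any real site's layout, and `TitleLinkParser`, `extract_results`, `fetch` and the User-Agent string are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleLinkParser(HTMLParser):
    """Collect (title, href) pairs from <h3 class="result-title"><a href=...>.

    The class name "result-title" is an assumption for this example only.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_title = False  # inside the assumed <h3 class="result-title">
        self._in_link = False   # inside an <a> within that heading
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h3" and "result-title" in classes:
            self._in_title = True
        elif tag == "a" and self._in_title:
            self._in_link = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.results.append(("".join(self._text).strip(), self._href))
            self._in_link = False
        elif tag == "h3":
            self._in_title = False


def extract_results(html):
    """Parse an HTML string and return the list of (title, href) pairs."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


def fetch(url):
    """Download a page as text (not called below; needs network access)."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Stand-in for a downloaded page, so the sketch runs offline.
SAMPLE = """
<html><body>
<h3 class="result-title"><a href="/paper/1">First paper</a></h3>
<p>snippet</p>
<h3 class="result-title"><a href="/paper/2">Second <b>paper</b></a></h3>
</body></html>
"""

for title, href in extract_results(SAMPLE):
    print(title, href)
```

In real use you would pass the output of `fetch(url)` to `extract_results` instead of `SAMPLE`, subject to whatever the site's terms allow.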


hrasyid 3013 days ago

An API is a contract. HTML can be tweaked and become incompatible with your parser at the developer's whim.


gkya 3013 days ago

Oh luckily major APIs never change. /s


hrasyid 3010 days ago

Not as easily as an HTML page


matheusmoreira 3013 days ago

That just means your code must be maintained. You can verify that the HTML has a given structure and log a failure if it doesn't.
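
A rough sketch of that kind of check, assuming the scraper depends on a few (tag, class) pairs being present. The pairs, the logger name and the helper names `ClassCounter` and `check_structure` are all hypothetical; a real check would list whatever elements your parser actually relies on.

```python
import logging
from html.parser import HTMLParser

logger = logging.getLogger("scraper")  # hypothetical logger name


class ClassCounter(HTMLParser):
    """Count how often each (tag, class) pair occurs in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        for cls in (dict(attrs).get("class") or "").split():
            key = (tag, cls)
            self.counts[key] = self.counts.get(key, 0) + 1


def check_structure(html, required):
    """Return True if every (tag, class) pair in `required` appears at least once.

    Each missing pair is logged as an error so a layout change shows up
    in the logs instead of silently producing empty results.
    """
    counter = ClassCounter()
    counter.feed(html)
    counter.close()
    missing = [pair for pair in required if counter.counts.get(pair, 0) == 0]
    for tag, cls in missing:
        logger.error(
            "expected <%s class=%r> not found; page layout may have changed",
            tag, cls,
        )
    return not missing


# Assumed structure for this example only.
REQUIRED = [("div", "result"), ("h3", "result-title")]

ok_page = '<div class="result"><h3 class="result-title">x</h3></div>'
changed_page = '<div class="item"><h2 class="title">x</h2></div>'

print(check_structure(ok_page, REQUIRED))
print(check_structure(changed_page, REQUIRED))
```

Running the check before extraction lets the scraper stop and report the change rather than parse a page it no longer understands.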


amelius 3013 days ago

Use Deep Learning to circumvent that.


danielecook 3013 days ago

Why is that? Seems like it would be really beneficial to the scientific community.


shakna 3013 days ago

They likely don't have legal permission to allow for third-party access to the data they provide.


mchannon 3013 days ago

google.com/scholar doesn’t work for you?


wyldfire 3013 days ago

The issue is that this service (Sci-Bay) depends on Google Scholar, yet there's no public API for Google Scholar that it could use. If it's scraping Google Scholar results, that's likely a ToS violation and unlikely to last long.


userbinator 3013 days ago

How much of a "ToS violation" is SciHub, and how long has it lasted?

"Where there's a will, there's a way" comes to mind. Also, all web pages technically already have an "API": it's called "HTTP" ;-)

Good for them, I say.
