Hacker News
by jrochkind1 3013 days ago
Google Scholar definitely and intentionally offers no API.

I don't see this lasting long...

5 comments

At a glance it looks like it's really just a proxy that is limited to scholar.google.com and mutates the page slightly (adds a header and Sci-Hub links).

Does google generally block proxy servers?

I'd imagine that would be quite hard: many university libraries run their own proxies to make sure library visitors can access the content the library has paid for, and these proxies often modify Google Scholar (with Google's cooperation, I believe) to list links to accessible versions of the content next to the search results.
I don't know if they generally do, but I'm sure they can/will if they want to.
They don't block Startpage, and it's been around for a while.
Home page currently says:

> See you later

> Too much attention is a bad thing, Sci-Bay decides to stop service for a while. Sorry.

Apparently I was not wrong.

This could be developed as a browser plugin, which would be much harder, or almost impossible, for Google to prevent. Well, a Firefox plugin, anyway; a Chrome plugin they presumably wouldn't allow.

The page's HTML is the API. It's pretty easy to download a web page, parse the HTML and then extract specific bits of information from it. The browser does the same thing on the user's behalf, which is why it is called the user agent.
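A minimal sketch of the "parse the HTML and extract specific bits" step, using only Python's standard-library `html.parser`. The markup and the `gs_rt` class name are hypothetical stand-ins for illustration, not a documented or stable structure of any real page, and the download step is left out (`SAMPLE_HTML` plays the role of the fetched page).

```python
from html.parser import HTMLParser

# Hypothetical markup: the "gs_rt" class is an assumption for illustration only.
SAMPLE_HTML = """
<html><body>
  <div class="result"><h3 class="gs_rt"><a href="https://example.org/a">First paper</a></h3></div>
  <div class="result"><h3 class="gs_rt"><a href="https://example.org/b">Second paper</a></h3></div>
</body></html>
"""


class TitleExtractor(HTMLParser):
    """Collects (title, href) pairs from links nested inside <h3 class="gs_rt">."""

    def __init__(self):
        super().__init__()
        self.in_title = False      # True while inside a matching <h3>
        self.current_href = None   # href of the <a> currently being read
        self.current_text = []     # text fragments of that <a>
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h3" and "gs_rt" in classes:
            self.in_title = True
        elif tag == "a" and self.in_title:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.in_title and self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            self.results.append(("".join(self.current_text).strip(), self.current_href))
            self.current_href = None
        elif tag == "h3":
            self.in_title = False


def extract_titles(html):
    """Parse an HTML string and return the extracted (title, href) pairs."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.results


print(extract_titles(SAMPLE_HTML))
```

The parser only knows the page through the tag and class names it looks for, which is exactly why a markup change can silently break it.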
An API is a contract. HTML can be tweaked and become incompatible with your parser at the developer's whim.
Oh, luckily major APIs never change. /s
Not as easily as an HTML page can.
That just means your code must be maintained. You can verify that the HTML has a given structure and log a failure if it doesn't.
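One way to "verify that the HTML has a given structure and log a failure" is sketched below in Python with the standard `html.parser` and `logging` modules. It assumes, hypothetically, that the scraper depends on a fixed set of CSS class names (`gs_r`, `gs_rt` here are placeholders) and treats any missing class as a sign the page layout changed.

```python
import logging
from html.parser import HTMLParser

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("scraper")


class ClassCollector(HTMLParser):
    """Records every CSS class name that appears anywhere in the document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def has_expected_structure(html, required_classes):
    """Return True if all required classes are present; otherwise log which are missing."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    missing = sorted(set(required_classes) - collector.classes)
    if missing:
        log.error("Page structure changed; missing classes: %s", ", ".join(missing))
        return False
    return True


# Hypothetical class names the scraper relies on.
REQUIRED = {"gs_r", "gs_rt"}

old_page = '<div class="gs_r"><h3 class="gs_rt">Title</h3></div>'
new_page = '<div class="result-item"><h2 class="title">Title</h2></div>'

print(has_expected_structure(old_page, REQUIRED))  # True
print(has_expected_structure(new_page, REQUIRED))  # False, and an error is logged
```

Failing loudly like this turns a silent breakage into a maintenance task, which is the trade-off the comment describes.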
Use Deep Learning to circumvent that.
Why is that? Seems like it would be really beneficial to the scientific community.
They likely don't have legal permission to allow for third-party access to the data they provide.
google.com/scholar doesn’t work for you?
The issue is that this service (Sci-Bay) depends on Google Scholar, yet there's no public API for Google Scholar that it could use. If it's scraping Google Scholar results, then it's likely a ToS violation and unlikely to last long.
How much of a "ToS violation" is SciHub, and how long has it lasted?

"If there's a will, there's a way" comes to mind. Also, all web pages technically already have an "API": it's called "HTTP" ;-)

Good for them, I say.