Show HN: APIRank.dev – We crawled and ranked public APIs from the internet (apirank.dev)
176 points by glimow 1202 days ago
tl;dr: at Escape (YC W23), we scanned 5651+ public APIs on the internet with our in-house feedback-driven API exploration tech and ranked them using security, performance, reliability, and design criteria. The results are public on https://apirank.dev. You can request that we add your own API to the index for free and see how it compares to others.

Why did we do that?

During a YC meetup I spoke with a fellow founder who told me how hard it was to pick the right external APIs to use within your own projects. I realized that most of what we build relies on public APIs from external vendors, but there was no benchmark to help developers compare and evaluate public APIs before picking one. So we decided to do it ourselves. Say hi to apirank.dev.

Why is ranking public APIs hard?

Automating the technical assessment of public APIs is a tough problem. First, we needed to find all the public APIs and their specifications, mostly OpenAPI files.

We used several strategies to find those:

- Crawl API repositories like apis.guru

- Crawl GitHub for openapi.json and openapi.yaml files

- A cool google dork

Those strategies enabled us to gather roughly 20,000 OpenAPI specs.

Then comes the hard part of the problem:

We want to dynamically evaluate those APIs' security, performance, and reliability.

But APIs take parameters that are tightly coupled to the underlying business logic.

A naive automated way would not work: putting random data in parameters would likely not pass the API's validation layer, thus giving us little insight into the real API behavior.

Manually creating tests for each API is also not sustainable: it would take years for our 10-person team. We needed to do it in an automated way.

Fortunately, our main R&D efforts at Escape aimed to generate legitimate traffic against any API efficiently.

That's how we developed Feedback-Driven API Exploration, a new technique that quickly assesses the underlying business logic of an API by analyzing responses and dependencies between requests (see https://escape.tech/blog/feedback-driven-api-exploration/).

We originally developed this technology for advanced API security testing. But from there, it was super easy to also test the performance and the reliability of APIs.

How did we rank APIs?

Now that we have a scalable way to gather exciting data from public APIs, we need to find a way to rank them. And this ranking should be meaningful to developers when choosing their APIs.

We decided to rank APIs using the following five criteria:

- Security

- Performance

- Reliability

- Design

- Popularity

The security score is computed as a combination of the number of OWASP Top 10 vulnerabilities and the number of sensitive information leaks detected by our scanner

The performance score is derived from the median response time of the API, aka the P50

The reliability score is derived from the number of inconsistent server responses, either 500 errors or responses that do not conform to the specification

The Design score reflects the quality of the OpenAPI specification file. Having comments, examples, a license, and contact information improves this score

The popularity score is computed from the number of references to the API found online
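
As a rough illustration of the Design criterion described above, the sketch below counts the signals the post mentions (comments, examples, a license, contact information) in an OpenAPI document loaded as a Python dict. The names `design_signals` and `design_score`, the equal weighting, and the 0-to-1 scale are assumptions made for illustration; the post does not say how APIRank actually combines these signals.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def _has_example(node) -> bool:
    """Return True if any nested object carries an `example` or `examples` key."""
    if isinstance(node, dict):
        if "example" in node or "examples" in node:
            return True
        return any(_has_example(value) for value in node.values())
    if isinstance(node, list):
        return any(_has_example(value) for value in node)
    return False


def design_signals(spec: dict) -> dict:
    """Collect simple documentation signals from an OpenAPI document."""
    info = spec.get("info", {})
    # Path items may also hold non-operation keys (e.g. "parameters"); skip those.
    operations = [
        op
        for path_item in spec.get("paths", {}).values()
        for method, op in path_item.items()
        if method in HTTP_METHODS and isinstance(op, dict)
    ]
    total = len(operations)
    described = sum(1 for op in operations if op.get("description") or op.get("summary"))
    with_examples = sum(1 for op in operations if _has_example(op))
    return {
        "has_license": "license" in info,
        "has_contact": "contact" in info,
        "comments_ratio": described / total if total else 0.0,
        "examples_ratio": with_examples / total if total else 0.0,
    }


def design_score(signals: dict) -> float:
    """Hypothetical equal-weight average of the four signals, in [0, 1]."""
    parts = [
        1.0 if signals["has_license"] else 0.0,
        1.0 if signals["has_contact"] else 0.0,
        signals["comments_ratio"],
        signals["examples_ratio"],
    ]
    return sum(parts) / len(parts)


# Made-up spec: license present, no contact, one of two operations documented.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "Demo", "version": "1.0", "license": {"name": "MIT"}},
    "paths": {
        "/users": {"get": {"summary": "List users", "responses": {"200": {"description": "OK"}}}},
        "/users/{id}": {"get": {"responses": {"200": {"description": "OK"}}}},
    },
}
signals = design_signals(spec)
print(signals, design_score(signals))
```

A score built this way only measures how thoroughly the spec file is documented, not whether the API itself is pleasant to use.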

If you are curious about your API's performance, you can ask us to index your own API for free at https://apirank.dev/submit

26 comments

This sounds silly, but I'm honestly curious: do you offer this information as an API?

On a more serious note, if the purpose is to be able to compare candidate APIs that you're looking at using, here are some features that I think might be helpful:

* Tagging, and filtering based on those tags. Example: "Show me APIs that deal with SMS"

* Select two or more APIs for direct comparison. Example: "Compare Twilio, CleverTap, and Amazon SES"

* Sorting by column heading. Example: "Sort by median response time, descending"

Also, I can't quite get the search to work for me. The term "Paul" returns the top entry ("P J S Paul Ministries API") and one other ("PoGoSnap API") that doesn't include the search term. "Twilio" returns no results, even though there are three entries on the bottom of the first page that start with that string.

Author here. Love your ideas! I've thought about tagging - I'd even love to have automated tagging of APIs from route names, etc. - but I'm super excited about sorting and comparison features

This is kind of a side project for us, but I'll definitely think about it

+100 on sorting
A terribly designed API with a complete and orderly OpenAPI 3.0.0 spec will get 5/5.

A well designed API with no OpenAPI support or 3.1.x spec support will get a poor design score.

I think user generated content might be the way to go for Design. Let people score and review the design quality.

I would suggest renaming the Design category to Documentation. Design is subjective, so user generated content might be an option, but ultimately docs and examples as a (non-subjective) measure are probably more useful to most developers.
Hey coauthor here.

Thanks for your feedback! APIRank is new, and we are here to improve it and make it useful to the community.

APIRank only considers APIs with an OpenAPI or Swagger specification. It supports any version of Swagger v2+ or OpenAPI v3+.

However, I agree that in practice, for people, the "design criterion" is not only compliance with the spec ;)

Antoine

A well designed API with no OpenApi spec support will not show up on this list.
I agree with your point, yet we thought the data we gathered could be interesting. For instance, we observe that ~60% of routes have human-readable comments, but less than 10% have actual examples.
This seems to lack tons of APIs , maybe instead of 'apirank' it should be called 'openapi dev rank'.

Apple Itunes : https://developer.apple.com/library/archive/documentation/Au...

last fm : https://www.last.fm/api

acoustid : https://acoustid.org/webservice

spotify : https://developer.spotify.com/documentation/web-api/

discogs : https://www.discogs.com/developers

deezer : https://developers.deezer.com/login?redirect=/api

music brainz : https://musicbrainz.org/doc/MusicBrainz_API

gracenote : https://developer.tmsapi.com/Sample_Code

---

Additionally, why not list API rate limits?

Or some lookups don't need any security, like querying Apple Itunes does not require anything except for being nice and rate limiting.

Spotify is there (twice!): https://apirank.dev/Spotify-Web-API/ https://apirank.dev/Spotify/

The search appears to be broken, because it returns no results for spotify.

We use Algolia free tier for the search, perhaps we have reached the limit :')
Hey coauthor here!

Thanks for your feedback! APIRank is new, and we are eager to make it better.

Please note that APIRank is only compatible with Swagger & OpenAPI specifications. You can propose to index any API you want here: https://apirank.dev/submit

Concerning the GitHub APIs, indeed, we missed them. We will add the following in our next update: https://raw.githubusercontent.com/github/rest-api-descriptio...

or the two most famous ones I know of:

* https://boto3.amazonaws.com/v1/documentation/api/latest/refe...

* e.g. https://cloud.google.com/iam/docs/reference/rest

I'm sure there's an Azure one, too, I just don't currently play in that sandbox

I know that DataDog also uses openapi but since this site's search is broken, hard to know if it's hiding on page 57 or what

We used Algolia free tier for the search but all the credits were consumed :') I take this as a learning for our next side project
I try to impress upon my colleagues that error handling is the hallmark of good engineering, because the happy path is only one possible outcome

Had the search said "error while searching: 403" (or whatever) versus "no results" there would have been far fewer complaints in this thread
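
To make that distinction concrete, here is a minimal sketch of a search wrapper that reports a backend failure separately from an empty result set. The `SearchOutcome` type, the `run_search` helper, and the fake backends are all hypothetical; nothing here reflects how apirank.dev or Algolia is actually wired up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class BackendError(Exception):
    """Raised by a (hypothetical) search backend, carrying an HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"backend returned {status}")
        self.status = status


@dataclass
class SearchOutcome:
    hits: List[str] = field(default_factory=list)
    error: Optional[str] = None  # set only when the backend call failed


def run_search(backend: Callable[[str], List[str]], query: str) -> SearchOutcome:
    """Call the backend and keep 'failed' distinct from 'found nothing'."""
    try:
        return SearchOutcome(hits=backend(query))
    except BackendError as exc:
        return SearchOutcome(error=f"error while searching: {exc.status}")


def render(outcome: SearchOutcome) -> str:
    """Turn an outcome into the message shown to the user."""
    if outcome.error is not None:
        return outcome.error
    if not outcome.hits:
        return "no results"
    return ", ".join(outcome.hits)


def quota_exhausted_backend(query: str) -> List[str]:
    raise BackendError(403)  # stand-in for an exhausted free tier


def empty_backend(query: str) -> List[str]:
    return []  # the backend worked and genuinely found nothing


print(render(run_search(quota_exhausted_backend, "twilio")))
print(render(run_search(empty_backend, "twilio")))
```

The point of the separate `error` field is that the caller cannot confuse the two cases by accident: an empty `hits` list only means "no results" when `error` is unset.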

Does that iTunes API still work for the App Store(s)? That documentation appears to have been archived.
The performance metric is meaningless. As you yourself acknowledge, this is way too dependent on underlying business logic to be able to compare objectively. A 100ms API response may as well be "slower" than a 500ms one depending on what the calls are actually achieving.

Say service A has an API:

- GET /users/<id> - 100ms

Service B has multiple APIs:

- GET /users/<id> - 100ms

- GET /users - 250ms

- GET / - 500ms

Which service is faster according to your algorithms?
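
For what it's worth, if the performance score were simply the median over one timing per endpoint (an assumption; the post does not say how samples are pooled), the two services above would compare as in this sketch. The `p50` helper is made up for illustration.

```python
from statistics import median

# One timing per endpoint, in milliseconds, taken from the example above.
service_a = {"GET /users/<id>": 100}
service_b = {"GET /users/<id>": 100, "GET /users": 250, "GET /": 500}


def p50(timings_ms: dict) -> float:
    """Median (P50) over all endpoint timings of one service."""
    return median(timings_ms.values())


print("A:", p50(service_a))  # 100
print("B:", p50(service_b))  # 250

# Restricting the comparison to the endpoint both services share gives a tie.
shared = service_a.keys() & service_b.keys()
print({name: (service_a[name], service_b[name]) for name in shared})
```

Under that pooling, B looks 2.5x slower even though the one endpoint the services have in common is equally fast.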

I'm also a bit skeptical of your other metrics. "P J S Paul Ministries API" is 5/5 in popularity while Adyen is 3/5?

This is experimental research we did mostly because we thought it was cool and new.

There are limitations indeed, though if you compare similar APIs with similar business intent, their structure should also have similarity, unlike your example.

For the popularity metrics, I will take a look - I admit we can do better :)

What's the methodology? It's an interesting metric as long as there is some rationale attached.
I didn't see any API offered anywhere (which I would definitely love), so I decided to poke around. https://apirank.dev/__data.json will return paged data sources (query param ?p={pageNum}), and what really caught my interest here was how the data came in.

A sample response takes this form: `{ type: 'data', nodes: [ { type: 'data', data: [Array], uses: [Object] }, ... ] }`

Things get a little strange (for me at least!) when I start poking around in the data array. At index 0 in this array is an object with some information on what I think is mostly just paging and the length of the data array (sans this first element). At index 1 is an array filled with numbers that correspond to array indices in the initial data array, and the objects at those specified indices are the actual data sources. While the keys in those objects are actual descriptors of relevant information, the VALUES for those keys are also array indices. Essentially the incoming data structure is a super flattened array with some objects/ additional arrays that just point to other indices in the data array.[1]

Is there a name for this type of pattern? Is it just an artifact of using some query library to handle data fetching? Apologies for the naivety, I still consider myself a fairly new dev (especially when I see something like this!).

[1] More complete example (not sure if I explained this well):

  {
    type: 'data',  
    data: [  
      { endpoints: 1, count: 3971, currentPage: 5 },
      [2,221,355,...], // at array index 2 we should expect an object
      { id: 3, commentsRatio: 4, owaspIssues: 6, ... }, // array index 2: object with values that are all indices in the original data array that hold the actual values!
      ...
    ]
  }
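
The layout described above can be walked back into a nested structure with a small recursive resolver. This is a sketch under stated assumptions: every value inside an object or array slot is treated as an index into the flat array, primitive slots hold the real values, and the `resolve` function and toy data are made up here. It resembles the flattened, index-referenced output of serializers such as the `devalue` library, but whether that is what this site emits is a guess, and any special markers such a format may use (for example for dates or undefined values) are ignored.

```python
from typing import Any, Dict, List, Optional


def resolve(flat: List[Any], index: int = 0, _cache: Optional[Dict[int, Any]] = None) -> Any:
    """Rebuild the nested value stored at `index` of a flattened array."""
    if _cache is None:
        _cache = {}
    if index in _cache:
        return _cache[index]
    slot = flat[index]
    if isinstance(slot, dict):
        result: Any = {}
        _cache[index] = result  # register before recursing so shared refs are reused
        for key, ref in slot.items():
            result[key] = resolve(flat, ref, _cache)
    elif isinstance(slot, list):
        result = []
        _cache[index] = result
        for ref in slot:
            result.append(resolve(flat, ref, _cache))
    else:
        result = slot  # primitive: the slot holds the actual value
        _cache[index] = result
    return result


# Toy data shaped like the example above (all values are invented).
flat = [
    {"endpoints": 1, "count": 5},    # 0: root object, values are indices
    [2],                             # 1: list of indices of endpoint objects
    {"id": 3, "commentsRatio": 4},   # 2: an endpoint object, values are indices
    "demo-api",                      # 3
    0.6,                             # 4
    3971,                            # 5
]
print(resolve(flat))
```

One practical upside of such a layout is that a value repeated many times is stored once and referenced by index, which keeps the payload small.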
> During a YC meetup I spoke with a fellow founder that told me how hard it was to pick the right external APIs to use within your own projects. I realized that most of what we build relies on public APIs from external vendors

I have a habit of sleeping under rocks but why do people do this? It seems incredibly brittle and strictly worse than leftpad-esque dependency hell – you can’t freeze or fork an API.

I can understand that for payments, maps or even email sending it’s more convenient, but that’s at least a bounded issue.

Are the apps we are building today gonna work in 3 years without meticulous maintenance? Are we doomed to suffer flaky experiences from compound latencies and rate limiting? And what’s the point of your 5 nines if you rely on third parties anyway?

I've been having this fight recently at work.

We're implementing a new (cloud) platform that's pretty central to our primary business. Some sub-projects have already spun off, before we had even decided how to set up and maintain the platform, because I said to meet the timeline we need to focus on more broad, general integrations first.

The decision was, instead, to bring in additional developers whom several months later we're still helping learn the APIs and troubleshoot problems (amidst the rest of our integration tasks).

Meanwhile a few weeks ago, I finally got the go-ahead and a DBA's time to do the integration I wanted. We've got a live database of 90-some tables dumped from the platform, including everything these projects are fetching (but with a greater delay between updates), accessible to the entire org. Reporting teams and business analyst are already writing queries to drive the services they need, freeing up our developers to work on other tasks.

A good API can provide very rapid, very specialized development; which is great when that development _is_ your business and so you're paying developers to keep up with it, or it's not so important that is kept up on (personal projects, prototypes, and temporary services). And a well designed and managed API shouldn't change _that_ much _that_ frequently.

But ignoring that an external API is always something you don't control and needs to be maintained is a costly mistake I've seen my org fall into over and over again.

You've listed a ton of sites as having some security misconfiguration (OWASP A05:2021), but haven't given any further information on how you've made that determination.

Given generic external scanners' propensity for giving false positives, I'm very skeptical.

This. As soon as you see "number of vulnerabilities it contains", you know it's bullshit. If it were that easy to spot legitimate bugs, the authors would mostly already have fixed them. Without human verification, probably somewhere between 950 and 995 out of 1000 detections are bogus. Also OWASP has become such a meaningless buzzword, as if it's the only web bugs that matter, or as if it's a well-defined set with clear boundaries, let alone testable things (direct object reference / missing authorisation, good luck defining a rule for that, in general but especially with public APIs). (My employer is getting more corporate and guilty of this as well nowadays: trying to please buzzword-scanning customers by bringing up OWASP Top Ten in every web report no matter how relevant.)

I clicked because I was indeed curious how they'd rank, but this being the first point tells me that no sensible ranking could be found

The only objective metric in the set is response time, but anyone would agree that this isn't the only thing you use to select what api to use

What is the utility of comparing Coindesk's and Giphy's APIs?

In order for this to have any use, it seems to me you'd need to at least group the APIs by use case/product type.

Yeah you're absolutely right! We couldn't add tags/categories easily for the first version but we are working on it for the future. Thanks for your feedback :)
This is on the way!
I love that the top result is some obscure christian ministries API. Big data always yields the funniest results :)
I suspect there is something wrong with the popularity calculation, would like to know more details about how that was done.
And as mentioned in the post we filtered soooo many awkward results ^^
A christian ministry's API is ranked higher than Coinbase? You guys are nuts.
I may have not put it so succinctly, but yeah, this definitely feels like it's missing the forest for the trees.

I'm really curious if a lot of those < 100 OWASP scores are really anything that matters in the real world.

"Ranking" normally implies decisive weight: we rank related things in order to decide between them. I'm not sure how I'm supposed to decide between "P J S Paul Ministries" and the Coindesk API, which are currently #1 and #2 respectively.

Similarly: how can you meaningfully measure the performance, security, or reliability of these APIs without first determining whether you're communicating meaningfully with them? I assume that many (most?) of these APIs require some kind of credential; it's hard to glean anything of interest from a server's ability to reliably send an error message.

I think this sort of resource (if including actual use-cases and potential alternatives) could be useful. Right now it shows scores for security, performance, reliability, design, popularity, and those seem to be calculated with very loose and unreliable scoring and I'm guessing they are mostly there to fill out the content. There do not seem to be any non-OpenAPI APIs there (for example OSM, Wikipedia or any IA). I don't think this is useful for any dev choosing an API in its current state.

Is the plan to actually build this out to be a usable tool or is it just a way for you to get people to submit API:s and get some marketing?

I agree that there is a lot of room for improvement. We are a small team and we had to start somewhere to have feedback from the community and improve accordingly :)

Thanks for the great ideas, I share your opinion on those

would be nice if it explains why OpenAPI 3.0.0-compliance (with comments, examples, etc.) matters.
Yes! I assume that this is just content marketing to get link juice to feed our Google overlords, but yeesh.

I run a SaaS that helps people get API data into Google Sheets.

For very popular services, we make customized connectors that hide all the complexity.

For medium popularity services, we do some automated transforms to hide complexity in the requests, and massage the response shape to an easier-to-consume format.

For low popularity services, we'll do a best-effort to point customers towards docs and common pitfalls.

All of that is to say: I've seen a lot of APIs!

So it's very bizarre to see Xero ranked #3 out of 5,651. Xero has this convoluted system where you need to query to get your tenant ID, and then pass that as a header. If you don't pass it, you get an opaque error message. That's the 3rd best API, eh?

Also, you sort of have to wonder how much human review went into this. GitHub has an OpenAPI spec, is a hugely popular service, and seems to be absent from this index.

(edit: Perhaps Github is present? I got no results when I searched "github", but I also get no results if I search "xero", despite Xero Assets API being the 3rd hit.)

What SaaS is it? I think I have a feel for exactly this
This is very much an apples to oranges comparison. What meaningful insight can be obtained by comparing some of Uber's API endpoints with Instagram's?
This seems similar to ranking websites based on "Google's Lighthouse" scores... which is interesting, but not the defining attributes of a good website.

I also think there's way more metadata that could be useful in ranking. Things like API documentation, communities, usage cost, complexity of API, statefulness, age of the API...

Anyway, looks like a cool project that will be useful to some people

There seem to be some duplicates: the BBC Nitro API is listed three times in the first 100 results for me. Perhaps it has different versions?
Interesting project.

But I worked on the Xero assets api, and I’m sad to say there was only one user. Maybe things have changed since I left, but I highly doubt it deserves a 4/5 ranking for popularity. I suspect that 1 might be more like 4 integrations now. But either way, it's an irrelevant piece of the product ecosystem. Every other API by Xero is more heavily used.

It's a shame (and a bit funny) that the details of an API yields a 500 error.

Looking forward to some more browsing on your site.

Hey, thanks for your feedback. Due to an unexpected load, we had to scale up our infra a little. It's fixed now ;)
Seems like it’s 500ing again for me.
I'm confused as to what the current gold medalist, the PJS Paul Ministries API, is...
All hail the holy API :))
What's the difference between this and simply Googling for <insert keyword> api?
Hey, co-author here! Google does not allow you to browse APIs according to Security, Reliability, Performance, etc.

Google is not an OpenAPI spec indexer either ;)

But I don't want to browse APIs. I'm not looking for any generic APIs. I'm looking for a specific API, such as a location API, or an email sending API. If I search for those in Google, it gives me 1-3 APIs that I need and can test.

And most of them have status pages that let me know their reliability.

For this first version, we assumed you know the API you want to use and want to get some extra info.

The next step could be adding tags/categories so you can look for specific APIs ;)

I have an API on here through RapidAPI. I think it's dinging me for RapidAPI's failures to meet your criteria. I don't think I can control how RapidAPI presents me for most of this.
When determining the median API response time, is that done from several locations? Not clear if the timing would be due to network or application latency.
What's the "cool Google dork"?
Is there a description of what the API does? I don’t know the vast majority of these names.
I get a 500 Internal service error
Hey, thanks for your feedback. As mentioned, we had to scale up our infra a little due to an unexpected load. It's fixed now ;)
It's not fixed unfortunately, I get 500 when I try to search.
Do I see repetitions?