Hacker News
by AnthonyMouse 876 days ago
Why is the government cargo-culting the scourge that is API keys?

The goal should be for everyone to have access and to lower barriers to entry, not to put bureaucracy in the way of access and de facto suppress use by open source projects, since each user would need their own API key unless someone publishes one.


From the site:

  The NPS Data API is open and accessible to all developers
  who wish to use NPS data in their projects.
From their "API Guides":

  Limits are placed on the number of API requests you may
  make using your API key. Rate limits may vary by service,
  but the defaults are:

    Hourly Limit: 1,000 requests per hour

  For each API key, these limits are applied across all
  developer.nps.gov API requests. Exceeding these limits
  will lead to your API key being temporarily blocked from
  making further requests. The block will automatically be
  lifted by waiting an hour.
That, along with their ToS[0], hardly seems to qualify as a "cargo-culting scourge."

[0] https://www.nps.gov/aboutus/disclaimer.htm
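The quoted defaults (1,000 requests per hour per key, with the block lifting after an hour) can be approximated by a fixed-window counter per key. This is a minimal sketch assuming an in-memory store on a single server; `HourlyKeyLimiter` and the injectable `clock` are hypothetical names, and nothing here describes how NPS actually enforces its limits.

```python
import time
from typing import Callable, Dict, Tuple


class HourlyKeyLimiter:
    """Hypothetical fixed-window limiter: `limit` requests per key per window.

    Approximates "1,000 requests per hour, block lifted after an hour";
    it is not NPS's actual enforcement mechanism.
    """

    def __init__(self, limit: int = 1000, window: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # api_key -> (start of current window, requests counted in it)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window:
            # A full window has passed: the temporary block lifts by itself.
            start, count = now, 0
        if count >= self.limit:
            self._windows[api_key] = (start, count)
            return False  # over the limit: reject until the window ends
        self._windows[api_key] = (start, count + 1)
        return True


if __name__ == "__main__":
    fake_time = [0.0]
    limiter = HourlyKeyLimiter(limit=3, clock=lambda: fake_time[0])
    print([limiter.allow("example-key") for _ in range(4)])  # [True, True, True, False]
    fake_time[0] = 3600.0
    print(limiter.allow("example-key"))  # True
```

A fixed window is the simplest reading of "wait an hour"; a sliding window or a block timed from the moment of the violation would be equally plausible.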

API keys were invented as a tracking device. You sign up and then they associate all your use with one person and can do things like revoke your keys if you e.g. try to compete with the company's own products. Neither of these should be relevant to public data on a government service.

Rate limits are straightforward to implement per IP address without having any other information about anyone. The sort of person willing to bypass them by using a thousand IP addresses is the same sort of person who would sign up for a thousand API keys using fake names. How are you supposed to rate limit by API key if "anyone" can get an API key? You'd need some means of rate limiting how many API keys someone can request, which was the original problem.
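A per-IP limit of this kind needs nothing but the client address. Below is a minimal token-bucket sketch, assuming a single process and an in-memory dict; `PerIPRateLimiter`, `capacity` and `refill_rate` are hypothetical names, and the numbers in the usage line are illustrative only.

```python
import time
from typing import Callable, Dict, List


class PerIPRateLimiter:
    """Token bucket per client IP; no accounts or keys involved (hypothetical sketch)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # largest burst one address may send
        self.refill_rate = refill_rate  # tokens regained per second
        self.clock = clock
        # ip -> [tokens currently available, time of last update]
        self._buckets: Dict[str, List[float]] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        bucket = self._buckets.setdefault(ip, [self.capacity, now])
        tokens, last = bucket
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        bucket[0], bucket[1] = tokens, now
        return allowed


if __name__ == "__main__":
    # Roughly 1,000 requests/hour sustained, with bursts of up to 20.
    limiter = PerIPRateLimiter(capacity=20, refill_rate=1000 / 3600)
    print(limiter.allow("203.0.113.7"))  # True: a fresh bucket starts full
```

Each address gets its own bucket, so one client exhausting its allowance does not affect anyone else. A real deployment would also evict idle buckets and share state across servers.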

> API keys were invented as a tracking device

And that's exactly how they're used as well. They need a method to track the usage of these services because there is often a cost involved in providing them. You also need a way to block or rate limit usage that is not IP-bound.

As an example, when Yr[0] opened up their APIs for free worldwide weather forecasts, it quickly spiralled out of control. I don't recall the specifics, but in short, a major phone manufacturer started using their APIs on its phones, and the increased load took down the service. They could have solved it by just adding more hardware, things like this is highly cacheable, but when you're dealing with tax payers money you generally don't want to subsidise for-profit companies. So you implement a token and tell them to implement their own caching layer on top of it, and everyone is happy.

I don't see how you'd solve something like that with anything other than a token. The methods you've mentioned in other posts simply don't work when a couple of hundred million phones ping your API every time their owners unlock them and the weather widget refreshes. They also create no incentive for the developers to do things right, like not checking for updates every time the user does something, even though the initial response came with a TTL and a Cache-Control header that clearly state when the data will next be updated.

[0] https://developer.yr.no
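Honoring those cache headers on the client side takes only a few lines. A sketch, assuming a hypothetical `fetch` callable that returns a body and a header mapping (so it runs without network access); only `max-age` is handled, and `no-store`/`no-cache` are simply treated as "don't cache". Nothing here reflects Yr's actual response headers.

```python
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical transport: takes a URL, returns (body, response headers).
Fetch = Callable[[str], Tuple[bytes, Mapping[str, str]]]


def max_age_seconds(cache_control: Optional[str]) -> int:
    """Seconds a response may be reused, per its Cache-Control header (0 = don't cache)."""
    if not cache_control:
        return 0
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for d in directives:
        if d.startswith("max-age="):
            value = d.split("=", 1)[1].strip('"')
            return int(value) if value.isdigit() else 0
    return 0


class CachingClient:
    """Reuses a response until its max-age expires instead of refetching each time."""

    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}  # url -> (expires_at, body)

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self._cache.get(url)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh: no request reaches the upstream API
        body, headers = self.fetch(url)
        # HTTP header names are case-insensitive.
        cc = next((v for k, v in headers.items() if k.lower() == "cache-control"), None)
        ttl = max_age_seconds(cc)
        if ttl > 0:
            self._cache[url] = (now + ttl, body)
        return body
```

Wired into a widget, this means a phone refreshes the forecast at most once per max-age window rather than on every unlock.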

> They could have solved it by just adding more hardware, things like this is highly cacheable, but when you're dealing with tax payers money you generally don't want to subsidise for-profit companies. So you implement a token and tell them to implement their own caching layer on top of it, and everyone is happy.

The for-profit company is happy, anyway. They get free data and you've priced the competition out of the market.

What things like this are really useful for is to create the app equivalent of weather.gov. Most for-profit "repackage government data" websites and apps are ad-laden spyware that will spin your CPU at 100% and shovel every byte of data they can hoover up into a data warehouse that sells to anyone with a buck while doing little more than displaying the government data.

If you want to create an open source one which is free and promises not to track the user, you can, but then you need the data. If you end up with millions of users, who has more resources to set up caching servers, some individual idealist with zero revenue or the United States Government?

This shouldn't even be a question. The government has to operate infrastructure that can handle millions of users for many other reasons. This should be something they're experienced in, and something like this should just fit into a slot in existing infrastructure. This is what it's for. If all you want is to provide the data for various scummy middlemen to wrap in ads and spyware then why is it an API at all instead of a static data dump / live feed with the latest changes?

> The for-profit company is happy, anyway. They get free data and you've priced the competition out of the market.

And they'll also be happy to disregard all your wishes for them to implement their own caching layer and if you have no way to block this kind of activity they absolutely will do it. As demonstrated in the example I gave you.

> If you want to create an open source one which is free and promises not to track the user, you can, but then you need the data. If you end up with millions of users, who has more resources to set up caching servers, some individual idealist with zero revenue or the United States Government?

I, as a taxpayer, am not really keen on paying for everyone to build their applications on top of it. If you create an open source application, you can always tell the users how to obtain such a token.

> This shouldn't even be a question. The government has to operate infrastructure that can handle millions of users for many other reasons. This should be something they're experienced in, and something like this should just fit into a slot in existing infrastructure. This is what it's for. If all you want is to provide the data for various scummy middlemen to wrap in ads and spyware then why is it an API at all instead of a static data dump / live feed with the latest changes?

Again, why should I as a taxpayer have to pay for that? For me, the taxpayer, the service is just as available and usable even if I have to request a token to use it. How would you propose limiting how a service can be consumed without some kind of token? We've already established that your other solutions don't work. The alternative is likely just not providing the service at all, which seems like a net loss for everyone involved: for-profit businesses, taxpayers and open source developers alike.

> API keys were invented as a tracking device.

Yes, by definition.

Apparently, you did not review the "Disclaimer" link I provided. In it is the following:

  Not all information or content on this website has been
  created or is owned by the NPS. Some content is protected
  by third party rights, such as copyright, trademark,
  rights of publicity, privacy, and contractual restrictions.
  The NPS endeavors to provide information that it possesses
  about the copyright status of the content and to identify
  any other terms and conditions that may apply to use of the
  content (such as, trademark, rights of privacy or publicity,
  donor restrictions, etc.); however, the NPS can offer no
  guarantee or assurance that all pertinent information is
  provided or that the information is correct in each
  circumstance. It is your responsibility to determine what
  permission(s) you need in order to use the content and, if
  necessary, to obtain such permission.
Notice the first sentence: "Not all information or content on this website has been created or is owned by the NPS."

Perhaps there is a need for use to be "tracked" in order to ensure legal agreement to the Terms of Use?

That isn't a terms of use, it's a disclaimer. It's informing you that some of the information on the website might not be in the public domain, which is a simple factual statement that doesn't require you to agree to anything in order for it to be true or applicable.

My bad, I thought of it as a ToS. Thanks for the clarification.

Per IPv6 address? It’s very difficult (impossible?) to even make IPv4-based rate limiting work.

With IPv6 you use address blocks instead of individual addresses.

IP-based rate limiting is extremely effective because it bifurcates the internet into IP addresses controlled by the attacker and ones that aren't. The attacker can issue requests at most at the per-address rate limit multiplied by the number of IP addresses (or IPv6 blocks) they control. Then the IP addresses under their control get denied while the IP addresses not under their control, i.e. all of the other users, are unaffected.

This only becomes a problem if they control on the order of millions of IP addresses, but then you're dealing with a sophisticated criminal organization and are probably screwed anyway.
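Keying the limiter by IPv6 block rather than by single address is straightforward with Python's standard `ipaddress` module. A sketch, assuming each client controls a /64 (the standard size of a single IPv6 subnet; some providers delegate larger /56 or /48 prefixes, so the prefix length is a tunable assumption) and that IPv4-mapped IPv6 addresses should be treated as the IPv4 address they carry. `rate_limit_key` is a hypothetical name.

```python
import ipaddress


def rate_limit_key(addr: str, v6_prefix: int = 64) -> str:
    """Map a client address to the unit a rate limit is charged to.

    IPv4: the single address. IPv6: the enclosing /v6_prefix block,
    on the assumption that one subscriber controls the whole block.
    """
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address):
        if ip.ipv4_mapped is not None:
            # ::ffff:a.b.c.d is really an IPv4 client seen through a v6 socket.
            return str(ip.ipv4_mapped)
        return str(ipaddress.ip_network(f"{ip}/{v6_prefix}", strict=False))
    return str(ip)


if __name__ == "__main__":
    for a in ["203.0.113.7", "2001:db8::1", "2001:db8::abcd:1", "::ffff:203.0.113.7"]:
        print(a, "->", rate_limit_key(a))
```

The returned key would take the place of the raw address in whatever per-IP limiter is used, so an attacker's throughput is bounded by the limit times the number of distinct keys they can present.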

Yep. Very likely they are using API Gateway with Usage Plans, which is a very simple and effective way to do rate limiting and quotas.

If the provider is bearing the costs (like here) then they always need some kind of authorization, or they have no way to shut off abusers or people with misbehaving clients.

An API key is about the simplest possible way to achieve that, and appears to be perfectly adequate in this case.

What do you suggest? SAML?
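For reference, the "simplest possible" key mechanism amounts to issuing a random token, looking it up on each request, and deleting it to shut someone off. A hypothetical in-memory sketch using only the standard library (`secrets`, `hashlib`); `ApiKeyStore` is an illustrative name, and storing only a digest of each key is a design choice here, not something the NPS documentation describes.

```python
import hashlib
import secrets
from typing import Dict, Optional


class ApiKeyStore:
    """Hypothetical in-memory registry: issue keys, look them up, revoke them."""

    def __init__(self) -> None:
        # SHA-256 digest of each key -> owner; the raw key is never stored.
        self._owners: Dict[str, str] = {}

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, owner: str) -> str:
        key = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
        self._owners[self._digest(key)] = owner
        return key  # shown to the user once

    def owner_of(self, key: str) -> Optional[str]:
        """Return who a key belongs to, or None if it is unknown or revoked."""
        return self._owners.get(self._digest(key))

    def revoke(self, key: str) -> bool:
        """Shut off an abusive or misbehaving client; True if the key existed."""
        return self._owners.pop(self._digest(key), None) is not None
```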

> If the provider is bearing the costs (like here) then they always need some kind of authorization, or they have no way to shut off abusers or people with misbehaving clients.

HTTP is an "API" that has no API keys and all the public web servers in the world seem to manage this without any trouble.

> What do you suggest? SAML?

No authentication required by default -- it's public data. Just impose a reasonable rate limit by IP address and require registration only if someone has a legitimate reason to exceed that.
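The scheme proposed here (anonymous access limited per IP, with registration only for higher limits) could be expressed as a bucket-selection rule. A sketch with hypothetical names and numbers (`ANONYMOUS_LIMIT`, `choose_bucket`, `HourlyCounter`); the hourly reset is left to the caller.

```python
from collections import Counter
from typing import Mapping, Optional, Tuple

ANONYMOUS_LIMIT = 1000  # hypothetical requests per hour per IP, no sign-up


def choose_bucket(client_ip: str, api_key: Optional[str],
                  registered: Mapping[str, int]) -> Tuple[str, int]:
    """Return (counter id, hourly limit) for a request.

    No key, or an unknown key: charged to the caller's IP at the default
    limit. A registered key: charged to that key at its granted limit.
    """
    if api_key is not None and api_key in registered:
        return f"key:{api_key}", registered[api_key]
    return f"ip:{client_ip}", ANONYMOUS_LIMIT


class HourlyCounter:
    """Counts requests per bucket; call reset() once per hour (e.g. from a timer)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def allow(self, bucket: str, limit: int) -> bool:
        if self.counts[bucket] >= limit:
            return False
        self.counts[bucket] += 1
        return True

    def reset(self) -> None:
        self.counts.clear()


if __name__ == "__main__":
    registered = {"research-project-key": 50_000}  # granted on request
    counter = HourlyCounter()
    bucket, limit = choose_bucket("198.51.100.4", None, registered)
    print(bucket, limit, counter.allow(bucket, limit))
```

Unknown or revoked keys fall back to the anonymous per-IP bucket instead of being rejected, so a key is only ever an upgrade, never a requirement.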

> all the public web servers in the world seem to manage this without any trouble

Incorrect. Most large websites invest in DDoS protection, e.g. Cloudflare.

Cloudflare's DDoS protection, for example, is a lot more sophisticated than merely counting requests per source IP (https://developers.cloudflare.com/ddos-protection/about/how-...).

Cloudflare is one of the ways they manage it.

But API keys aren't any good for that anyway. If someone is just trying to overload your service by brute force, they can send requests regardless of whether the keys are valid, and still use up your bandwidth on error responses or your CPU and memory on opening new connections before the API keys are even validated. To avoid that you'd still need some kind of DDoS protection.

Where they actually do something is where you're doing accounting, because then if someone wants to send you a million requests, you don't block them, you just process them and send them a bill. Maybe you block them if they reach the point you don't expect them to be able to pay. But if it's a free service that anybody can sign up for as many times as they want then that doesn't do any good because the price is $0 and a rate limit per key is avoided by signing up for arbitrarily many more keys.
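The accounting case is where a per-key count directly does something useful: it becomes a bill. A minimal sketch, assuming a hypothetical flat price per 1,000 requests and a log containing one API key per request; `bills` is an illustrative name. With a price of zero every bill is zero, which is the point made above about free sign-ups.

```python
from collections import Counter
from decimal import Decimal
from typing import Dict, Iterable


def bills(request_log: Iterable[str], price_per_1000: Decimal) -> Dict[str, Decimal]:
    """Charge each API key for the requests it made (hypothetical flat pricing)."""
    counts = Counter(request_log)  # api_key -> number of requests
    return {key: price_per_1000 * n / 1000 for key, n in counts.items()}


if __name__ == "__main__":
    log = ["key-a"] * 3000 + ["key-b"] * 500
    print(bills(log, Decimal("0.50")))
    # With a $0 price every bill is zero, so counting per key enforces nothing.
    print(bills(log, Decimal("0")))
```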

> HTTP is an "API" that has no API keys and all the public web servers in the world seem to manage this without any trouble.

Um, no. That’s just not true.

We're currently using a discussion forum that nobody signed up for an API key in order to make posts and you don't even need a user account in order to read. What allows them to sustain this without being destroyed by evil forces?

> nobody signed up for an API key in order to make posts

Yes you did. When you logged in, they gave you an API key in the form of a cookie that you include with every request.

And it's run at a loss by Y Combinator, which is very, very wealthy. And even Hacker News has to pay for Cloudflare and mods, on top of hardware, hosting, and traffic.

> When you logged in, they gave you an API key in the form of a cookie that you include with every request.

You can read this website (i.e. make queries against its database) without logging in. Moreover, the main thing the cookie does is not some kind of rate limiting or denial of service protection, it's assigning your username to your posts so that others can't impersonate your account. Various image boards exist that even allow you to post without logging in and they seem to be fine with it.

Probably either the lack of evil forces currently attempting to destroy it, or Cloudflare.

So we've established that it isn't API keys.

Per-IP limits don't do anything about the scenario where the API is integrated into a third party website that sees a sudden spike in popularity. At that point, the API is providing free capacity to the third party site. Maybe that is fine, but you seem to be ignoring the possibility.

Because it's fine. That's what it's for, isn't it? The public, via some website, is requesting the government data their tax dollars have paid for.

Which allows that website (or app) to operate with minimal resources, e.g. by a non-profit or open source project, instead of having to be a for-profit entity which needs some underhanded way to generate revenue in order to display the "free" data.

API keys are important for effective rate limiting/abuse prevention.

I don't really understand your complaints about API keys, but if you did want to make an issue of something, perhaps it should be that your API key is sent to you by email, in plaintext. Not amazing, but I guess for their threat model it's generally OK.

API keys provide a straightforward mechanism for limiting use, and for allowing clients that get lots of traction to pay for higher limits. That’s not a cargo cult, that’s just design.

Could you show us an example of a service API that you maintain that doesn’t use API keys?

RSS feeds.

What makes API keys a scourge?