Hacker News
by stickfigure 1917 days ago
No mention of what I see as the biggest con of GraphQL: You must build a lot of rate limiting and security logic, or your APIs are easily abused.

A naive GraphQL implementation makes it trivial to fetch giant swaths of your database. That's fine with a 100% trusted client, but if you're using this for a public API or web clients, you can easily be DoSed. Even accidentally!

Shopify's API is a pretty good example of the lengths you have to go to in order to harden a GraphQL API. It's ugly:

https://shopify.dev/concepts/about-apis/rate-limits

You have to limit not just the number of calls, but the quantity of data fetched. And pagination is gross, with `edges` and `node`. This is straight from their examples:

    {
      shop {
        id
        name
      }
      products(first: 3) {
        edges {
          node {
            handle
          }
        }
      }
    }
Once you fetch a few layers of edges and nodes, queries become practically unreadable.
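One way to limit the quantity of data fetched, rather than just the number of calls, is to charge each query a calculated cost against a refilling budget. Here's a minimal sketch of that idea. The `Selection` type, the cost weights (objects cost 1 point, scalars are free, `first: n` multiplies by n), and the bucket parameters are all hypothetical, not Shopify's actual algorithm.

```typescript
// Hypothetical, simplified selection tree. A real server would walk the
// parsed query document (e.g. the graphql-js AST) instead.
interface Selection {
  name: string;
  first?: number; // page-size argument on list/connection fields
  children?: Selection[];
}

// Made-up cost model: each object (a field with sub-selections) costs 1 point,
// scalar leaves are free, and `first: n` multiplies the per-item cost by n
// because the server may have to load n items.
function queryCost(sel: Selection): number {
  const children = sel.children ?? [];
  if (children.length === 0) return 0;
  const perItem = 1 + children.reduce((sum, c) => sum + queryCost(c), 0);
  return (sel.first ?? 1) * perItem;
}

// Token bucket charged in cost points rather than in request counts.
class CostBucket {
  private readonly capacity: number;
  private readonly restorePerSecond: number;
  private available: number;
  private lastMs: number;

  constructor(capacity: number, restorePerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.restorePerSecond = restorePerSecond;
    this.available = capacity;
    this.lastMs = nowMs;
  }

  // Refills for the elapsed time, then deducts `cost` if it fits.
  tryCharge(cost: number, nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.available = Math.min(
      this.capacity,
      this.available + elapsedSec * this.restorePerSecond,
    );
    this.lastMs = nowMs;
    if (cost > this.available) return false;
    this.available -= cost;
    return true;
  }
}

// The example query above, with a configurable page size.
function productsQuery(first: number): Selection {
  return {
    name: "query",
    children: [
      { name: "shop", children: [{ name: "id" }, { name: "name" }] },
      {
        name: "products",
        first,
        children: [
          { name: "edges", children: [{ name: "node", children: [{ name: "handle" }] }] },
        ],
      },
    ],
  };
}
```

Under this made-up model the example costs 11 points, while changing `first: 3` to `first: 100` costs 302, which a 100-point bucket would reject outright no matter how long the client waits.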

The more rigid fetching behavior of REST & gRPC provides more predictable performance and security behavior.


I'm not convinced that it's any more difficult to implement robust rate limiting or other performance guarantees for GraphQL than for a REST API with comparable functionality. As soon as you start implementing field/resource customizability in a REST API, you have roughly the same problems guaranteeing performance. JSON:API, for example, specifies how to request fields on related objects with a syntax like `/articles/1?include=author,comments.author` [0], which is comparable to the extensibility you get by default in GraphQL. The libraries that help you implement JSON:API or GraphQL may differ in whether you opt in or opt out of this sort of extensibility, and perhaps in practice GraphQL libraries tend to require opting out (and GraphQL consumers tend to expect a lot of this extensibility), but at the end of the day there's little difference in principle between two APIs with comparable functionality. And, as others have noted, the popular GraphQL implementations I've seen all make it fairly straightforward to limit things like the query depth or the total number of entities requested.
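As a rough illustration of a depth limit, here's a sketch over a hypothetical `Selection` tree (not a real library type); production servers usually implement this as a validation rule over the parsed query. Depth here counts nested fields along the deepest path, leaves included and the root operation excluded, which is one convention among several.

```typescript
// Hypothetical minimal selection tree standing in for a parsed query.
interface Selection {
  name: string;
  children?: Selection[];
}

// Number of levels along the deepest path; a scalar leaf counts as 1.
function queryDepth(sel: Selection): number {
  const children = sel.children ?? [];
  return 1 + children.reduce((max, c) => Math.max(max, queryDepth(c)), 0);
}

// Throws before execution if the query nests deeper than `maxDepth`.
function assertMaxDepth(root: Selection, maxDepth: number): void {
  // The root operation itself is not counted as a level.
  const depth = queryDepth(root) - 1;
  if (depth > maxDepth) {
    throw new Error(`query depth ${depth} exceeds limit ${maxDepth}`);
  }
}

// Roughly the GraphQL equivalent of /articles/1?include=author,comments.author
const articleQuery: Selection = {
  name: "query",
  children: [
    {
      name: "article",
      children: [
        { name: "title" },
        { name: "author", children: [{ name: "name" }] },
        {
          name: "comments",
          children: [{ name: "body" }, { name: "author", children: [{ name: "name" }] }],
        },
      ],
    },
  ],
};
```

With this convention `articleQuery` has depth 4 (article, comments, author, name), so a limit of 3 rejects it and a limit of 4 lets it through.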

Of course, if the argument is simply that it tends to be more challenging to manage performance of GraphQL APIs simply because GraphQL APIs tend to offer a lot more functionality than REST APIs, then of course I agree, but that's not a particularly useful observation. Indeed having no API at all would further reduce the challenge!

[0] https://jsonapi.org/format/#fetching-includes

> Of course, if the argument is simply that it tends to be more challenging to manage performance of GraphQL APIs simply because GraphQL APIs tend to offer a lot more functionality than REST APIs, then of course I agree, but that's not a particularly useful observation. Indeed having no API at all would further reduce the challenge!

On their own, such arguments are indeed not useful. But if you can further point out that GraphQL has more functionality than is required, then you can basically make a YAGNI-style argument against GraphQL.

Often the things I end up rate limiting in REST aren't even bottlenecks in GraphQL, like when I want to grab a relationship that hasn't been implemented as its own resource endpoint.

E.g. to get all the comments on every article written by one author, I might call `/author/john smith`, which returns all their articles, then run `/articles/{}?include=comments` for each one. That runs a separate query server-side for each request, which can get very heavy if I'm doing thousands of them. In GraphQL this is trivial as `{ author(name: "john smith") { articles { comments { … } } } }`, and because it's one request, the server-side fetch can be run _way_ more efficiently. We have dataloaders written for the SQL that collapse every big query like this into (often) a single `IN (?, ?, ...)` query, or sometimes subselects. The same concept works with any SQL or NoSQL approach. So yes, it might be "a lot" of data were it RESTful, but we're not going to bottleneck on a single indexed query and a ~10MB payload.
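Here's a minimal sketch of that batching step, assuming a hypothetical `comments` table with an `article_id` column and a fake in-memory client standing in for a real driver. A real DataLoader also collects the individual `load(id)` calls that resolvers make within one tick and caches results per request; this only shows the batch function, next to the per-article (N+1) approach.

```typescript
interface CommentRow {
  id: number;
  articleId: number;
  body: string;
}

// Hypothetical stand-in for a database client (not a real driver API).
// It only understands "... WHERE article_id IN (...)" and logs each statement
// so round trips can be counted.
class FakeDb {
  readonly statements: string[] = [];
  private readonly rows: CommentRow[];

  constructor(rows: CommentRow[]) {
    this.rows = rows;
  }

  query(sql: string, articleIds: number[]): CommentRow[] {
    this.statements.push(sql);
    const wanted = new Set(articleIds);
    return this.rows.filter((r) => wanted.has(r.articleId));
  }
}

// N+1 style: one query per article, like calling
// /articles/{id}?include=comments for each article.
function loadCommentsNaive(db: FakeDb, articleIds: number[]): CommentRow[][] {
  return articleIds.map((id) =>
    db.query("SELECT id, article_id, body FROM comments WHERE article_id IN (?)", [id]),
  );
}

// Batched: one IN query for all articles, then rows are grouped back per key.
// Returns one array per requested id, in the same order, which is the
// contract a DataLoader batch function has to meet.
function loadCommentsBatch(db: FakeDb, articleIds: number[]): CommentRow[][] {
  const unique = Array.from(new Set(articleIds));
  if (unique.length === 0) return [];
  const placeholders = unique.map(() => "?").join(", ");
  const rows = db.query(
    `SELECT id, article_id, body FROM comments WHERE article_id IN (${placeholders})`,
    unique,
  );
  const byArticle = new Map<number, CommentRow[]>();
  for (const id of unique) byArticle.set(id, []);
  for (const row of rows) byArticle.get(row.articleId)?.push(row);
  return articleIds.map((id) => byArticle.get(id) ?? []);
}
```

For three articles the naive version issues three statements while the batched one issues a single `IN (?, ?, ?)` query; with thousands of articles that difference is what keeps the GraphQL request cheap.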

The real advantage I see for REST in that scenario is that it can _feel_ faster to the end user, since you get some data back earlier. Running thousands of small requests is slower overall, but you can display the first one's result to the user sooner than a big gql payload.

This is what I see as a huge misconception of GraphQL, and unfortunately proliferates due to lots of simple "Just expose your whole DB as a GraphQL API!" type tools.

It's quite simple (easier in my opinion than in REST) to build a targeted set of GraphQL endpoints that fit end-user needs while being secure and performant. Also, as the other user posted, "edges" and "nodes" have nothing to do with the core GraphQL spec itself.

I don't disagree with you, but GraphQL just lends itself well to bad decisions, and many times when I've poked at GraphQL endpoints they share these issues (missing auth after the first layer, exposing the schema by accident, no depth/cost limit). I think the combination of a new technology without standardized best practices and resource-constrained startups spreads poor security practices with GraphQL.

Of course, the same could happen for standard REST as well, but I think the foot guns are more limited.

I think I would agree. I'm a huge GraphQL fanboy, but one of the things I've posted many, many times that I hate about GraphQL is that it has "QL" in the name, so a lot of people think it is somehow analogous to SQL or some other generic query language.

So you get these very generic GraphQL APIs that map closely to the DB, when the exact opposite should be the case, that the APIs map as close as possible to the front-end use cases, and data is presented so that the front ends should need to have little, if any, customized view display logic. It even says so at the beginning of the spec:

> Product‐centric: GraphQL is unapologetically driven by the requirements of views and the front‐end engineers that write them. GraphQL starts with their way of thinking and requirements and builds the language and runtime necessary to enable that.

> no depth/cost limit

Or you can do what we do: there’s no depth at all, since our types don’t have any possible subqueries.

Rate limiting and security are trivial these days, with an abundance of directive libs available, ready to use out of the box, and every major third party auth provider boasting ease of use with common GraphQL patterns. I'd argue what you see as the biggest con is actually a strength now.

> And pagination is gross, with `edges` and `node`

This just reads like an allergic reaction to "the new" and to change. Edges and nodes are elegant, less error-prone than limits and skips, and, most importantly, datasource-independent.

I'd be interested to see a graphql library that makes security trivial. Could you add some links?

In my experience, securing nested assets based on owner/editor/reader/anon was rather difficult and required inspecting the schema stack. I was using the Apollo stack.

This was in the context of apps in projects in accounts (common pattern for SaaS where one email can have permissions in multiple orgs or projects)

Hasura makes that pretty easy as can be seen here: https://github.com/firatoezcan/hasura-cms

This is also easy to do with self-written servers. Maybe take a look at the metadata folder to get the gist of what Hasura is doing behind the scenes (running the query and then checking the permission condition against the user's claims for each field being requested).

(Just a repo I started one evening, it doesn't do much but the concept of projects with owners and collaborators should work)

That's an end-user experience on a platform. A library is something I can import into my own code to implement auth, without having to adopt a given stack. I wrote one, and it's not simple (https://www.npmjs.com/package/graphql-autharoo)

Looking at the SQL and metadata, it doesn't look all that simple for such a simple case. The complex part is behind all that, written by Hasura.

Imagine what that would look like with org, group, and user permissions all existing on a single object, or even a resource type, and how a single email (user) could have permissions at all of these levels on any object. Then consider that GraphQL allows nested query objects: am I listing the objects as a top-level query, or is the list from a one-to-many relation nested under another query, where the query parsing system batches these subqueries and presents them to the resolver all at once? You have to understand the context of the incoming queries in each resolver, and then make auth decisions about it.

Think about using Hasura vs. writing the auth system inside Hasura. Or how complex things get when you want to implement auth for multi-tenant SaaS.
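A minimal sketch of one way to keep the multi-level case manageable, assuming a hypothetical data model where roles can be granted at the org, group, or individual object level. `Grant`, `Resource`, `effectiveRole`, and `filterVisible` are made up for illustration and don't come from Hasura or any library. The idea is to make the auth decision per object, so it doesn't matter whether a resolver was reached through a top-level list or a nested relation.

```typescript
// Hypothetical permission model: a user can be granted a role on an org,
// a group, or a single object, and the strongest applicable grant wins.
type Role = "anon" | "reader" | "editor" | "owner";
const ROLE_RANK: Record<Role, number> = { anon: 0, reader: 1, editor: 2, owner: 3 };

interface Grant {
  userId: string;
  scope: "org" | "group" | "object";
  scopeId: string;
  role: Role;
}

interface Resource {
  id: string;
  orgId: string;
  groupIds: string[];
}

// Effective role of `userId` on `res`, combining grants from every level.
function effectiveRole(userId: string, res: Resource, grants: Grant[]): Role {
  let best: Role = "anon";
  for (const g of grants) {
    if (g.userId !== userId) continue;
    const applies =
      (g.scope === "object" && g.scopeId === res.id) ||
      (g.scope === "group" && res.groupIds.indexOf(g.scopeId) !== -1) ||
      (g.scope === "org" && g.scopeId === res.orgId);
    if (applies && ROLE_RANK[g.role] > ROLE_RANK[best]) best = g.role;
  }
  return best;
}

// Meant to be called by every resolver that returns Resources, whether the
// list comes from a top-level query or a nested one-to-many field, so the
// decision is made per object rather than per query path.
function filterVisible(
  userId: string,
  items: Resource[],
  grants: Grant[],
  minRole: Role,
): Resource[] {
  return items.filter(
    (r) => ROLE_RANK[effectiveRole(userId, r, grants)] >= ROLE_RANK[minRole],
  );
}
```

This sidesteps the "where did this query come from" question but not the performance one: a real implementation would load a user's grants once per request, or push the check into the query's `WHERE` clause.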

I'm a huge fan of GraphQL, and work full-time on a security scanner for GraphQL APIs, but denial of service is a huge (but easily mitigated) risk of GraphQL APIs, simply because of the lack of education and resources surrounding the topic.

One fairly interesting denial of service vector that I've found on nearly every API I've scanned has to do with error messages. Many APIs don't bound the number of error messages that are returned, so you can query for a huge number of fields that aren't in the schema, and then each of those will translate to an error message in the response.
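A minimal sketch of bounding the error list, assuming a flat list of requested field names and a hypothetical `validateFieldNames` helper. Some GraphQL server libraries expose a similar cap on validation errors; it's worth checking what your server does rather than assuming the output is bounded.

```typescript
// Hypothetical error shape; real servers return GraphQL error objects.
interface ValidationError {
  message: string;
}

// Checks requested field names against the schema's known fields, but stops
// after `maxErrors` so a query with thousands of bogus fields can't produce
// a response that grows with the size of the attack.
function validateFieldNames(
  requested: string[],
  known: Set<string>,
  maxErrors: number,
): ValidationError[] {
  const errors: ValidationError[] = [];
  for (const field of requested) {
    if (known.has(field)) continue;
    if (errors.length >= maxErrors) {
      errors.push({ message: "Too many validation errors; validation aborted." });
      break;
    }
    errors.push({ message: `Unknown field "${field}".` });
  }
  return errors;
}
```

With a cap of 5, a query naming 1,000 nonexistent fields yields 6 error entries instead of 1,000.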

If the server supports fragments, you can also sometimes construct a recursive payload that expands, like the billion laughs attack, into a massive response that can take down the server, or eat up their egress costs.
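To see why fragments make this dangerous, here's a sketch that computes how many fields an operation expands to once every fragment spread is inlined, so a server could reject it before executing. It uses a hypothetical simplified document model (aliased fields and named spreads only) and memoizes per fragment, so measuring stays cheap even when the expansion itself is exponential. The count is an upper bound, since a real executor merges some duplicate selections.

```typescript
// Hypothetical simplified document model: a selection is either an
// (aliased) field with optional sub-selections, or a named fragment spread.
type Item = { alias: string; children?: Item[] } | { spread: string };

// Number of fields the selection set expands to after inlining every
// fragment spread. Each fragment's size is computed once and memoized.
function expandedSize(root: Item[], fragments: Record<string, Item[]>): number {
  const memo = new Map<string, number>();
  const visiting = new Set<string>();

  function sizeOf(items: Item[]): number {
    let total = 0;
    for (const item of items) {
      if ("spread" in item) {
        total += fragmentSize(item.spread);
      } else {
        total += 1 + sizeOf(item.children ?? []);
      }
    }
    return total;
  }

  function fragmentSize(name: string): number {
    const cached = memo.get(name);
    if (cached !== undefined) return cached;
    // The spec forbids fragment cycles; reject them instead of recursing forever.
    if (visiting.has(name)) throw new Error(`fragment cycle through "${name}"`);
    const items = fragments[name];
    if (items === undefined) throw new Error(`unknown fragment "${name}"`);
    visiting.add(name);
    const size = sizeOf(items);
    visiting.delete(name);
    memo.set(name, size);
    return size;
  }

  return sizeOf(root);
}

// Reject the operation before execution if it expands past `maxFields`.
function assertExpansionWithin(
  root: Item[],
  fragments: Record<string, Item[]>,
  maxFields: number,
): void {
  const size = expandedSize(root, fragments);
  if (size > maxFields) {
    throw new Error(`operation expands to ${size} fields (limit ${maxFields})`);
  }
}

// "Billion laughs" shape: F0 has 10 fields; each Fk has 10 aliased fields
// that each spread F(k-1), so the size grows roughly tenfold per level.
const fragments: Record<string, Item[]> = {
  F0: Array.from({ length: 10 }, (_, i) => ({ alias: `a${i}` })),
};
for (let k = 1; k <= 8; k++) {
  fragments[`F${k}`] = Array.from({ length: 10 }, (_, i) => ({
    alias: `b${i}`,
    children: [{ spread: `F${k - 1}` }],
  }));
}
const bomb: Item[] = [{ spread: "F8" }];
```

Here `bomb` expands to 1,111,111,110 fields, while the memoized count touches each of the nine fragments only once.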

I kind of feel that the server itself should protect against attacks like that. Of course it isn’t inherent in the specification, but I don’t think it’s something that an implementer should have to think about either (beyond ‘have I enabled DoS mitigation?’, anyway).

edges and node come from Relay, not from the core GraphQL spec. They're just one way to do pagination.

I like edges and node, it gives you a place to encode information about the relationship between the two objects, if you want to. And if all your endpoints standardize on this Relay pagination, you get standard cursor/offset fetching, along with the option to add relationship metadata in the future if you want, without breaking your schema or clients.

edit: the page you linked to has similar rate limiting behavior for both REST and GraphQL lol

Technically the spec is part of GraphQL itself now, but only as an optional recommendation, not something you’re obliged to follow.

That said, like you I am a fan.

It’s a pretty defensible pattern, more here for those interested: https://andrewingram.net/posts/demystifying-graphql-connecti...

The overall verbosity of a GraphQL query tends not to be a huge issue either, because in practice individual components only concern themselves with small subsets of it (i.e. fragments). I’m a firm believer that people will have a better time with GraphQL if they adopt Relay’s bottom-up, fragment-oriented pattern rather than a top-down, query-oriented pattern, which you often see in codebases written by people who’ve never heard of Relay.

Also by people who have heard of Relay but already have an existing codebase. It’s not something that’s very simple to adopt out of hand.

Seconded. I feel that the pagination style Relay offers is typically better than 99% of the custom pagination implementations out there. The cursor implementation can just do limit/skip under the hood (if that's what you want), but it lets you switch to true cursor-based pagination _easily_.

    {
      products(first: 3) {
        pageInfo {
          hasNextPage
          endCursor
        }
        edges {
          cursor
          node {
            handle
          }
        }
      }
    }
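A minimal sketch of the point above, that a Relay-style connection can sit on top of plain limit/offset. Cursors here just wrap an offset; the `offset:` format, `connectionFromOffsets`, and `fetchSlice` are hypothetical names, and only forward pagination (`first`/`after`) is shown.

```typescript
interface Edge<T> {
  cursor: string;
  node: T;
}
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}
interface Connection<T> {
  edges: Edge<T>[];
  pageInfo: PageInfo;
}

// Cursors are opaque to clients; under the hood they just carry an offset.
function toCursor(offset: number): string {
  return `offset:${offset}`;
}
function fromCursor(cursor: string): number {
  const m = /^offset:(\d+)$/.exec(cursor);
  if (m === null) throw new Error(`invalid cursor: ${cursor}`);
  return Number(m[1]);
}

// Forward pagination implemented with limit/offset. `fetchSlice` stands in
// for something like SQL "LIMIT ? OFFSET ?".
function connectionFromOffsets<T>(
  fetchSlice: (offset: number, limit: number) => T[],
  first: number,
  after?: string,
): Connection<T> {
  const start = after === undefined ? 0 : fromCursor(after) + 1;
  // Fetch one extra row to learn whether another page exists.
  const rows = fetchSlice(start, first + 1);
  const page = rows.slice(0, first);
  const edges = page.map((node, i) => ({ cursor: toCursor(start + i), node }));
  const last = edges[edges.length - 1];
  return {
    edges,
    pageInfo: { hasNextPage: rows.length > first, endCursor: last ? last.cursor : null },
  };
}

// Usage: page through five product handles three at a time.
const handles = ["a", "b", "c", "d", "e"];
const fetchHandles = (offset: number, limit: number): string[] =>
  handles.slice(offset, offset + limit);
const page1 = connectionFromOffsets(fetchHandles, 3);
const page2 = connectionFromOffsets(fetchHandles, 3, page1.pageInfo.endCursor ?? undefined);
```

`page1` holds `a`, `b`, `c` with `hasNextPage: true`, and `page2` holds `d`, `e` with `hasNextPage: false`. Because clients only ever echo cursors back, `toCursor`/`fromCursor` could later switch to keyset cursors (e.g. the last row's sort key) without changing the schema or the query above.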