Hacker News
by arianon 3162 days ago
"Query whitelists" sounds like sending to the server something like `{"query_id": 4, "variables": ...}` instead of `{"query": ..., "variables": ...}` which are straightforward to implement using any kind of server side key-value store and a middleware that maps the `query_id` back to the corresponding `query`, a tool that can help you with this is Apollo's PersistGraphQL [1]

I have no idea how I would go about implementing complexity caps though, but I guess I would do something like what GitHub has done for their own GraphQL API [2], which they explain better than I can.

[1]: https://github.com/apollographql/persistgraphql
[2]: https://developer.github.com/v4/guides/resource-limitations/


Another simple option for limiting complexity (I've considered implementing this in my GraphBrainz project): in the `context` provided to the GraphQL query resolver, increment a counter whenever a resolver requires fetching from an external API/database/etc. (whatever "too much of" would constitute abuse or just take a long time). Fail if the counter reaches some threshold. This would be really easy.
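
Roughly, the counter idea could look like this. It is only a sketch: `RequestContext`, `FetchBudgetExceeded`, and `resolve_artist` are made-up names, and the resolver returns a canned value where a real one would call the external API or database:

```python
class FetchBudgetExceeded(Exception):
    """Raised when one request triggers more external fetches than allowed."""


class RequestContext:
    """Per-request context object passed to every resolver (hypothetical)."""

    def __init__(self, max_fetches: int):
        self.max_fetches = max_fetches
        self.fetches = 0

    def count_fetch(self) -> None:
        # Call this right before any expensive external lookup.
        self.fetches += 1
        if self.fetches > self.max_fetches:
            raise FetchBudgetExceeded(
                f"query needs more than {self.max_fetches} external fetches"
            )


def resolve_artist(context: RequestContext, artist_id: str) -> dict:
    context.count_fetch()
    # Stand-in for the real external API or database call.
    return {"id": artist_id, "name": f"artist-{artist_id}"}
```

A fresh `RequestContext` is created per incoming request, so the budget applies to one query at a time rather than to the client overall.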

Also, instead of multiplying node counts like GitHub does (which is pretty clever!), another simple option would be to look at the depth of the query (how many levels down is the deepest leaf), and fail if it's over some maximum. This is also very easy to do as you get the query AST in the `info` field of the resolver. (This one is less effective than the one above since depth doesn't totally match up with resource usage, fields can be aliased, etc. but you get the idea.)
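
A sketch of the depth check, assuming the query's selection set has already been reduced to nested dicts (field name mapped to its sub-selections, with leaves mapping to an empty dict). That shape is a simplification for illustration, not the real AST you get from `info`, and it ignores fragments and aliases:

```python
class QueryTooDeep(Exception):
    """Raised when a query nests deeper than the configured maximum."""


def selection_depth(selection: dict) -> int:
    """Return how many levels down the deepest leaf field sits."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def check_depth(selection: dict, max_depth: int) -> int:
    """Return the query depth, or raise if it exceeds max_depth."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise QueryTooDeep(f"query depth {depth} exceeds maximum {max_depth}")
    return depth


# { artist { name releases { title } } } as a nested dict:
example = {"artist": {"name": {}, "releases": {"title": {}}}}
```

Here `check_depth(example, 3)` passes and `check_depth(example, 2)` raises, since `title` sits three levels down.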

> Another simple option for limiting complexity

Okay but... I guess my question is: why are you denying a client the right to make a complex query? Is it because all your queries are kinda slow and so you must hand-optimize them, leading to a combinatoric explosion of codepaths?

Or is it because your clients cannot judge how complex the queries they're making are? If so, isn't this actually a gap in the GQL spec? Lots of other query language implementations offer query description and estimation commands in their code.

Your proposed solution seems to me like it's brutal for your consumers. There's minimal indication of how quickly your complexity metric will rise in the query. You'd need to add ad-hoc per query&mutation arguments to push that query complexity cap up for legitimate uses.

> Okay but... I guess my question is: why are you denying a client the right to make a complex query? Is it because all your queries are kinda slow and so you must hand-optimize them, leading to a combinatoric explosion of codepaths?

No, it's because:

(1) This is a feature of literally every API, most of them just use the extremely blunt instrument of rate limiting (even if requesting the same simple scalar value field over and over again does not add any strain on the server, you'll be rate limited just the same). Why aren't you asking this same question about REST queries?

and

(2) The 'Graph' part of 'GraphQL' means that queries can theoretically request connected nested objects of nearly infinite depth. This doesn't require that anything about the query code be slow or needs to be hand-optimized, or that there be any complex codepaths, just that MORE JOINS == MORE WORK and MORE PAYLOAD, no matter how perfectly optimized it is. Why aren't you asking "why does REST deny clients the right to make as deeply nested queries as they need?"

> This is a feature of literally every API... Why aren't you asking this same question about REST queries?

Combinatoric explosions of complexity via a single query path are not a feature of every API.

> The 'Graph' part of 'GraphQL' means that queries can theoretically request connected nested objects of nearly infinite depth.

Thanks for this.

> just that MORE JOINS == MORE WORK and MORE PAYLOAD

So like SQL but without all the excellent query complexity tools or clarity around what precipitates a join?

> Why aren't you asking "why does REST deny clients the right to make as deeply nested queries as they need?"

Because RESTful APIs tend not to allow ad hoc graph traversal. When they do, it's because they're tunneling a graph query language. When they do (e.g., ElasticSearch), I (and we, as in the community at large) DO ask these questions.

> Combinatoric explosions of complexity via a single query path are not a feature of every API.

> Because RESTful APIs tend not to allow ad hoc graph traversal.

I think you're taking this graph part too literally. Almost every API has a "graph" of connected objects. GraphQL just makes it so that you can traverse them with a single query. REST endpoints tend to force you to make multiple queries to go back and fetch information about the entities whose IDs or URLs you received in earlier requests – thus the rate limiting. In both cases, combinatorial explosions (and infinite depth) are possible – REST just forces you to explode into more round-trips (and the server is likely doing even more duplicated work than it needs to in order to fulfill those subsequent requests).

If you wanted to simulate the ease-off aspect of REST requiring clients to return for multiple rate-limited round trips to get the query data they want, you could simply add a timeout in the nested object's GraphQL resolvers that perform self-rate-limiting. Same result but the clients don't need to know about it, they can just wait the same amount of time they'd have had to wait for all the data anyway.
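
One way to read that, as a sketch: a small throttle that nested resolvers call before doing their work, spacing calls at least `min_interval` seconds apart. `Throttle` and `resolve_releases` are hypothetical names; the injectable `clock` and `sleep` exist only so the behavior can be checked without really waiting, and a synchronous `time.sleep` would need an async equivalent on an async server:

```python
import time


class Throttle:
    """Spaces calls at least min_interval seconds apart by sleeping."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next slot is free; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot relative to when this call runs.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


def resolve_releases(throttle: Throttle, artist_id: str) -> list:
    throttle.wait()
    # Stand-in for the real nested lookup.
    return [{"artist_id": artist_id, "title": "example release"}]
```

The client just sees a slower response for heavily nested queries, which is the "same wait, fewer round trips" trade described above.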

> GraphQL just makes it so that you can traverse them with a single query.

Yes. This is what I'm saying. GQL allows for a combinatoric explosion of potentially required queries (and in extreme cases, data providers) to fulfill any request. And every GQL endpoint needs to be able to service all of them unless your request routing proxy can peek into body contents, which is more expensive than URL routing.

> REST endpoints tend to force you to make multiple queries to go back and fetch information about the entities whose IDs you received in earlier requests

A problem we can solve elegantly with HTTP/2 push using nearly identical underlying API servicing models. What's great about that approach is that it's totally transparent to the client; they just get better performance with less resources.

Instead, folks have decided to discard a lot of really positive aspects of the REST model to make a client-facing DSL realized in the server.

> In both cases, combinatorial explosions (and infinite depth) are possible

But in the classical REST case, the client is aware they're doing this, as is the server. In the GraphQL case, we've obfuscated this and said, "We reserve the right to reject your request for any reason, and we've also made it harder for us to service your query (unless we go back to mandating every valid query as in REST), and we've also made scaling harder because it's more difficult to factor endpoints into different scaling groups."

But hey, that DSL is great. It's like JSON without all that predictability or syntactic validation.

I cannot see any positive outcomes to adopting GraphQL other than that "client-side developers love it". If y'all love it so much, why not maintain it on your side via service-worker query interception?

I ask facetiously. The answer is: because that would be really hard, and we'd rather push it off to API endpoint devs. Devs who promptly put restrictions that basically render the best part of GraphQL (that it is a query language) impotent for performance reasons.

> GraphQL just makes it so that you can traverse them with a single query.

How does this relate to the "I" in SOLID?