| HN Mirror



	by rhizome 3165 days ago
	Obviously it requires paying careful attention to performance, and taking steps to mitigate abuse (query whitelists, complexity caps, etc.). Are these also "not that hard"?

2 comments

arianon 3165 days ago

"Query whitelists" sounds like sending the server something like `{"query_id": 4, "variables": ...}` instead of `{"query": ..., "variables": ...}`. That is straightforward to implement with any kind of server-side key-value store and a middleware that maps the `query_id` back to the corresponding `query`. A tool that can help you with this is Apollo's PersistGraphQL [1].

I have no idea how I would go about implementing complexity caps though, but I guess I would do something like what GitHub has done for their own GraphQL API [2], which they explain better than I can.
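
A minimal sketch of the `query_id` mapping described above, assuming an in-memory dict stands in for the key-value store. The names `PERSISTED_QUERIES`, `UnknownQueryError`, and `resolve_persisted_query` are hypothetical and are not part of PersistGraphQL or any other library:

```python
import json

# Hypothetical whitelist; in practice this could be any server-side key-value store.
PERSISTED_QUERIES = {
    4: "query Artist($id: ID!) { artist(id: $id) { name } }",
}


class UnknownQueryError(Exception):
    """Raised when a request does not reference a whitelisted query."""


def resolve_persisted_query(raw_body: str) -> dict:
    """Translate {"query_id", "variables"} into {"query", "variables"}."""
    body = json.loads(raw_body)
    if "query" in body:
        # Ad hoc query text is rejected; only known IDs are accepted.
        raise UnknownQueryError("ad hoc queries are not allowed")
    query_id = body.get("query_id")
    query = PERSISTED_QUERIES.get(query_id)
    if query is None:
        raise UnknownQueryError(f"unknown query_id: {query_id!r}")
    # The result has the shape a regular GraphQL endpoint would expect.
    return {"query": query, "variables": body.get("variables", {})}
```

A middleware would call something like this before handing the request to the GraphQL executor, so the executor itself never needs to know about the whitelist.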

[1]: https://github.com/apollographql/persistgraphql

[2]: https://developer.github.com/v4/guides/resource-limitations/
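
For a rough feel of what a node-count style cap might look like, here is a sketch that multiplies requested page sizes down nested connections. It is an assumption about the general shape of the idea, not GitHub's implementation; the nested-dict query representation and the `estimated_nodes` helper are made up for illustration:

```python
def estimated_nodes(connections: list, parent_count: int = 1) -> int:
    """Upper bound on nodes returned, multiplying page sizes down the tree."""
    total = 0
    for conn in connections:
        # Each parent node can return up to `first` children of this connection.
        count = parent_count * conn["first"]
        total += count + estimated_nodes(conn.get("children", []), count)
    return total


# Hypothetical query shape: 50 repositories, each with up to 10 issues.
query = [
    {"field": "repositories", "first": 50,
     "children": [{"field": "issues", "first": 10}]},
]
print(estimated_nodes(query))  # 50 + 50 * 10 = 550
```

A server could reject the request before execution if the estimate is above some limit.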


exogen 3165 days ago

Another simple option for limiting complexity (I've considered implementing this in my GraphBrainz project): in the `context` provided to the GraphQL query resolver, increment a counter whenever a resolver requires fetching from an external API/database/etc. (whatever "too much of" would constitute abuse or just take a long time). Fail if the counter reaches some threshold. This would be really easy.
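
A sketch of that counter idea, assuming the per-request `context` is a plain dict shared by all resolvers. `FetchBudget`, `FetchBudgetExceeded`, and the resolver functions are hypothetical names, and the "fetch" is a stand-in for a real API or database call:

```python
class FetchBudgetExceeded(Exception):
    """Raised when a single query triggers too many external fetches."""


class FetchBudget:
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def spend(self, cost: int = 1) -> None:
        # Count the fetch first, then fail once the limit is exceeded.
        self.used += cost
        if self.used > self.limit:
            raise FetchBudgetExceeded(f"more than {self.limit} fetches in one query")


def resolve_artist(context: dict, artist_id: str) -> dict:
    context["budget"].spend()  # one external lookup per artist
    return {"id": artist_id, "name": f"artist-{artist_id}"}  # stand-in for a real fetch


def resolve_related(context: dict, artist_ids: list) -> list:
    return [resolve_artist(context, artist_id) for artist_id in artist_ids]
```

A fresh `FetchBudget` would be placed in the context for each incoming request, so the cap applies per query and not globally.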

Also, instead of multiplying node counts like GitHub does (which is pretty clever!), another simple option would be to look at the depth of the query (how many levels down is the deepest leaf), and fail if it's over some maximum. This is also very easy to do as you get the query AST in the `info` field of the resolver. (This one is less effective than the one above since depth doesn't totally match up with resource usage, fields can be aliased, etc. but you get the idea.)
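
A sketch of the depth check, assuming the selection set has already been reduced to nested dicts (a field with no sub-selection maps to `{}`). A real server would walk the query AST it gets from its GraphQL library; `selection_depth`, `check_depth`, and `QueryTooDeep` are hypothetical names:

```python
class QueryTooDeep(Exception):
    """Raised when the deepest leaf is below the allowed depth."""


def selection_depth(selections: dict) -> int:
    """Number of levels down to the deepest leaf field."""
    if not selections:
        return 0
    return 1 + max(selection_depth(child) for child in selections.values())


def check_depth(selections: dict, max_depth: int) -> int:
    depth = selection_depth(selections)
    if depth > max_depth:
        raise QueryTooDeep(f"depth {depth} exceeds maximum {max_depth}")
    return depth


# Stand-in for: { artist { name releases { title artist { name } } } }
query = {"artist": {"name": {}, "releases": {"title": {}, "artist": {"name": {}}}}}
print(check_depth(query, max_depth=5))  # 4
```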


KirinDave 3165 days ago

> Another simple option for limiting complexity

Okay but... I guess my question is: why are you denying a client the right to make a complex query? Is it because all your queries are kinda slow and so you must hand-optimize them, leading to a combinatoric explosion of codepaths?

Or is it because your clients cannot judge how complex the queries they're making are? If so, isn't this actually a gap in the GQL spec? Lots of other query language implementations offer query description and estimation commands in their code.

Your proposed solution seems to me like it's brutal for your consumers. There's minimal indication of how quickly your complexity metric will rise in the query. You'd need to add ad-hoc per query&mutation arguments to push that query complexity cap up for legitimate uses.


exogen 3165 days ago

> Okay but... I guess my question is: why are you denying a client the right to make a complex query? Is it because all your queries are kinda slow and so you must hand-optimize them, leading to a combinatoric explosion of codepaths?

No, it's because:

(1) This is a feature of literally every API; most of them just use the extremely blunt instrument of rate limiting (even if requesting the same simple scalar value field over and over again does not add any strain on the server, you'll be rate limited just the same). Why aren't you asking this same question about REST queries?

and

(2) The 'Graph' part of 'GraphQL' means that queries can theoretically request connected nested objects of nearly infinite depth. This doesn't require that anything about the query code be slow or need to be hand-optimized, or that there be any complex codepaths, just that MORE JOINS == MORE WORK and MORE PAYLOAD, no matter how perfectly optimized it is. Why aren't you asking "why does REST deny clients the right to make as deeply nested queries as they need?"


KirinDave 3164 days ago

> This is a feature of literally every API... Why aren't you asking this same question about REST queries?

Combinatoric explosions of complexity via a single query path are not a feature of every API.

> The 'Graph' part of 'GraphQL' means that queries can theoretically request connected nested objects of nearly infinite depth.

Thanks for this.

> just that MORE JOINS == MORE WORK and MORE PAYLOAD

So like SQL but without all the excellent query complexity tools or clarity around what precipitates a join?

> Why aren't you asking "why does REST deny clients the right to make as deeply nested queries as they need?"

Because RESTful APIs tend not to allow ad hoc graph traversal. When they do, it's because they're tunneling a graph query language. When they do (e.g., ElasticSearch), I (and we, as in the community at large) DO ask these questions.


exogen 3164 days ago

> Combinatoric explosions of complexity via a single query path are not a feature of every API.

> Because RESTful APIs tend not to allow ad hoc graph traversal.

I think you're taking this graph part too literally. Almost every API has a "graph" of connected objects. GraphQL just makes it so that you can traverse them with a single query. REST endpoints tend to force you to make multiple queries to go back and fetch information about the entities whose IDs or URLs you received in earlier requests – thus the rate limiting. In both cases, combinatorial explosions (and infinite depth) are possible – REST just forces you to explode into more round-trips (and the server is likely doing even more duplicated work than it needs to in order to fulfill those subsequent requests).

If you wanted to simulate the ease-off aspect of REST requiring clients to return for multiple rate-limited round trips to get the query data they want, you could simply add a timeout in the nested object's GraphQL resolvers that perform self-rate-limiting. Same result but the clients don't need to know about it, they can just wait the same amount of time they'd have had to wait for all the data anyway.
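
One way to read the self-rate-limiting idea, taking "timeout" to mean a deliberate delay inside the resolver (that reading is an assumption). `ResolverThrottle` is a hypothetical helper; the clock and sleep functions are injectable so the behaviour can be checked without actually waiting:

```python
import time


class ResolverThrottle:
    """Spaces out calls so they happen at most once per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot relative to when this call was allowed to run.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


def resolve_nested_object(throttle: ResolverThrottle, object_id: str) -> dict:
    throttle.wait()  # the client simply sees a slower response, not an error
    return {"id": object_id}  # stand-in for the real fetch
```

In an async server the same idea would use a non-blocking sleep so that one slow query does not stall other requests.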


andrewingram 3165 days ago

Query whitelist: Not trivial to do (properly) from scratch, but seems to be well-supported in library form.

Complexity caps: Depends on the server implementation. Sangria (Scala implementation) has it built in. For others, I'm not sure. It would be easier to add a depth cap, but a complexity cap is more useful. I think whitelisting and/or rate-limiting is the way to go if you're actually concerned about your GraphQL server being abused though.


KirinDave 3165 days ago

Query whitelist: Takes the query language out of the game and makes the system indistinguishable from old pre-restful .cgi endpoints.

Complexity caps: In the absence of good query scheduling this is a defense mechanism by GQL server providers. Great. But why are we doing it in the first place? This is still more complex with nearly identical outcomes to restful endpoints.


andrewingram 3164 days ago

A query whitelist is functionally identical to hand-written endpoints to serve each use case, except there's no hand-writing involved.

As long as you have a half-decent build pipeline and embrace static patterns on the client, you get to keep all of the benefits.

To be honest, you seem like someone who hasn't done a lot of client-side work, especially with component systems. The benefits to development speed and maintainability are significant, and nearly all the downsides can be mitigated.

If you're only approaching this from the perspective of a backend developer, I'd fully expect GraphQL to be underwhelming to you.


KirinDave 3164 days ago

> A query whitelist is functionally identical to hand-written endpoints to serve each use case, except there's no hand-writing involved.

This isn't really graphQL though, is it? This is an ORM. The ones I work with (in Blu(cough)Python) the binding layer I use works off standard ORM stuff, and could route object retrieval out of a restful API with minimal developer work as well (AuthZ, AuthN).

> As long as you have a half-decent build pipeline and embrace static patterns on the client, you get to keep all of the benefits.

I guess I simply have no idea what the benefits are. I keep asking, people keep giving me "technical" benefits that I'm challenging and I've yet to really see any refutation to those challenges.

It keeps coming back to:

> The benefits to development speed and maintainability are significant

And okay, a query DSL is useful. I totally buy into this. In fact, I have made many query DSLs. I advocate for linguistic metaprogramming every chance I get.

But what I'm saying is that the way it's implemented (server side) is, in many ways, regressive. I think you'd get most of the same benefits from a service worker doing that arbitration and dispatch. Your query whitelist updates wouldn't require infrastructure pushes AND client pushes. You'd have a better caching story (because a SW would make it easier to cache the intermediate queries and prevent refetching on the top-level call).

My primary objection to GraphQL is just how immature the tooling is. And I know how to dive in and provide SDKs for solving some of these hard problems, but then we end up with the 2017 equivalent of a .cgi script and I'm just... I don't get why any architect would agree to that.

> I'd fully expect GraphQL to be underwhelming to you.

Then why push it down to me as a requirement? And why recommend outrageous "solutions" like "oh well we'll just query whitelist so you lose all your flexibility." "We implement internal complexity caps you have no way to perceive from the query language but don't worry we'll 400 your request so you know."

These seem, to me, to make your frontend experience objectively worse than restful calls.
