When your backend is SQL, how can you even do this efficiently? Databases require indexes. If you can query anything, then there are performance bombs all over the place. It's different if you're querying Elasticsearch, I guess.
The answer is simple: You can't. GraphQL _in general_ doesn't allow arbitrary queries. It allows arbitrary output field selection. But the filters are very explicit. It's more "pre-aggregation of request waterfalls and masking of outputs" than "querying a database".
Doesn't stop people from exposing their SQL databases directly from GraphQL by generating a "free-for-all" schema. And when they do - yep, that's definitely a performance bomb and not a good use of GraphQL.
> The answer is simple: You can't. GraphQL _in general_ doesn't allow arbitrary queries.
It really does. Surely, it somewhat limits the data that you get from it by defining a schema. But the moment you allow any nesting/connections between data in that schema, hello n+1 problem.
And then every discussion of this problem on HN or elsewhere exposes the ugly truth: almost everyone uses GraphQL as a REST endpoint in production by limiting the actual queries you can run and curbing nesting.
The n+1 problem has solutions though. The most well-known solutions may not suit your architecture, but please can we stop pretending they don't exist?
GraphQL has been public since June 2015, and there's been at least one solution to the n+1 problem (Dataloader) since September 2015. If you were using pure REST endpoints (just resources, no nesting/traversal) this is the exact problem you'd be punting over to the client to solve -- all that GraphQL is doing here is moving it back onto the server. The actual amount of work is the same, you just get faster response times.
Most implementations of GraphQL I've seen in different languages provide some variation on the Dataloader pattern. I'll fully concede it can be a hassle to set it up correctly, but it works.
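For anyone who hasn't seen the pattern, here is a minimal sketch of the batching idea in Python with asyncio. It is not the API of the dataloader library or of any GraphQL server; `BatchLoader`, `batch_get_offices` and `demo` are hypothetical names, and per-request caching (which real implementations usually add) is left out.

```python
import asyncio


class BatchLoader:
    """Collects keys requested in the same event-loop tick and resolves
    them with one batch call (hand-rolled sketch, not a library API)."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async: list of keys -> list of values, same order
        self._pending = {}         # key -> Future waiting for its value
        self._task = None          # keeps a reference to the dispatch task

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            if not self._pending:
                # First key of this batch: dispatch on the next loop iteration,
                # after the other resolvers have had a chance to call load().
                loop.call_soon(self._start_dispatch)
            self._pending[key] = loop.create_future()
        return self._pending[key]

    def _start_dispatch(self):
        self._task = asyncio.get_running_loop().create_task(self._dispatch())

    async def _dispatch(self):
        pending, self._pending = self._pending, {}
        keys = list(pending)
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            for future in pending.values():
                future.set_exception(exc)
            return
        for key, value in zip(keys, values):
            pending[key].set_result(value)


calls = []  # records every batch call so the batching is visible


async def batch_get_offices(ids):
    # Hypothetical backend call; stands in for "WHERE id IN (...)".
    calls.append(list(ids))
    return [{"id": i, "name": f"office-{i}"} for i in ids]


async def demo():
    loader = BatchLoader(batch_get_offices)
    # Four "resolvers" ask for offices independently; one batch call results.
    return await asyncio.gather(*(loader.load(i) for i in [1, 2, 2, 3]))
```

Running `asyncio.run(demo())` makes a single call to `batch_get_offices` with the deduplicated keys, rather than one call per resolver.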
It does. But it also means that the problem exists. It's there, you run into it by default, and you have to take special care to make sure it doesn't happen. And data-loaders are just a first step. Some systems try to actually calculate query complexities and nesting depth.
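To make the "nesting depth" part concrete, a rough sketch of a depth check, assuming the query's selection set has already been parsed into nested dicts. That representation and the names `selection_depth` and `enforce_max_depth` are made up for illustration; real servers walk the parsed query AST instead.

```python
def selection_depth(selection):
    """Depth of a selection tree given as nested dicts; leaf fields map to {}."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def enforce_max_depth(selection, limit):
    """Reject a query before execution if it nests deeper than `limit`."""
    depth = selection_depth(selection)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
    return depth


# users -> office -> address -> city counts as depth 4 under this definition
example = {"users": {"name": {}, "office": {"address": {"city": {}}}}}
```

With this, `enforce_max_depth(example, 3)` raises `ValueError`, while a limit of 4 lets the query through.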
> this is the exact problem you'd be punting over to the client to solve -- all that GraphQL is doing here is moving it back onto the server. The actual amount of work is the same
Exactly. The complexity doesn't go anywhere.
I... don't know how all this is an argument against what I said.
> It does. But it also means that the problem exists.
The exact same problem exists on the client side with REST. I get what you're saying, but it's a lot easier to fix N+1 issues at the GraphQL resolver level, because once you've fixed it, you don't have to touch it again. With REST, you end up either creating ad hoc endpoints or changes to solve each individual problem in isolation, or you end up building a lot more flexibility into your REST API to solve it in a general way, in which case you've badly reinvented GraphQL without benefiting from the existing ecosystem.
Right. Regardless of how you go about doing things, if a UI is going to compose more than one kind of resource (which in reality is pretty much every UI), there's going to be some kind of n+1 problem hidden in there somewhere -- unless you go down the route of building a bespoke endpoint per screen or operation, which isn't the worst thing in the world, but it's not the work I want to spend my time doing when building products.
What GraphQL essentially gives you is a mechanism for providing a specification and getting back an endpoint, either fully dynamically at runtime, or at build-time if using persisted queries.
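A rough sketch of the persisted-query idea, assuming a simple in-memory registry keyed by a SHA-256 hash of the query text. `PersistedQueries` and its methods are hypothetical names, not any server's actual API.

```python
import hashlib


class PersistedQueries:
    """Build-time registry: clients send a hash, the server runs only known queries."""

    def __init__(self):
        self._by_hash = {}

    def register(self, query_text):
        # Done at build time for every query the client code contains.
        digest = hashlib.sha256(query_text.encode("utf-8")).hexdigest()
        self._by_hash[digest] = query_text
        return digest

    def resolve(self, digest):
        # Done at request time; anything not registered is refused.
        try:
            return self._by_hash[digest]
        except KeyError:
            raise PermissionError("unknown persisted query") from None
```

At that point each registered query behaves much like a fixed endpoint that was generated from the specification instead of written by hand.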
> but it's a lot easier to fix N+1 issues at the GraphQL resolver level,
It definitely isn't. And, once again, dataloaders are just the first step.
> With REST, you end up either creating ad hoc endpoints or changes to solve each individual problem in isolation, or you end up building a lot more flexibility into your REST API to solve it in a general way
Even with a "general way", your REST endpoint knows which request, or type of request, it's serving. So instead of doing several requests to the database where each request is essentially "SELECT *", you'll be doing queries optimised for the specific request type.
And there are many, many other things. Caching, for example: Apollo has to unpack and look into every single request and response to do it (and libraries in other languages don't solve it at all).
The complexity goes to where the round-trips are shorter, and where the benefits can be shared between all clients regardless of language; how is this not a good thing?
I never fully solved this problem, so don’t trust me, but I can tell you what I learned...
I think for one thing you can’t really rely on joins for query efficiency, because as you say there are too many combinations so it’s impossible to optimize everything.
Instead you have to try to query each data type separately. So you get a query for users. You do an SQL call and gather up a bunch of requests for offices, and then you do a single request to your office backend.
I think the best case is something like n SQL queries per request, where n is the depth of the tree you are querying (users->office->address is depth 3).
That means you’re doing all your queries after the first one by ID (not by arbitrary columns). So you have to have some way to “pre-join” your tables. You can do this either by optimistically joining your data to everything around it (query the node plus all of its edges) or you need to store your edges in your data model (which I have to assume is what FB does).
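Here is roughly what the one-query-per-level idea looks like, as a sketch using Python's built-in `sqlite3` with an in-memory database. The tables, columns and sample rows are invented for the users->office->address example, and `make_demo_db`, `fetch_by_ids` and `load_users_tree` are hypothetical helpers.

```python
import sqlite3


def make_demo_db():
    """In-memory database with made-up users/offices/addresses rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE offices (id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, office_id INTEGER);
        INSERT INTO addresses VALUES (1, 'Berlin'), (2, 'Lisbon');
        INSERT INTO offices VALUES (1, 'HQ', 1), (2, 'Remote hub', 2);
        INSERT INTO users VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cy', 2);
    """)
    conn.commit()
    return conn


def fetch_by_ids(conn, table, ids):
    """One query for a whole level: all rows of `table` with id in `ids`."""
    ids = sorted(set(ids))
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    # `table` comes from our own code here, never from user input.
    sql = f"SELECT * FROM {table} WHERE id IN ({placeholders})"
    return {row["id"]: row for row in conn.execute(sql, ids)}


def load_users_tree(conn):
    """users -> office -> address in three queries, regardless of row count."""
    users = conn.execute("SELECT * FROM users ORDER BY id").fetchall()   # level 1
    offices = fetch_by_ids(conn, "offices",
                           [u["office_id"] for u in users])              # level 2
    addresses = fetch_by_ids(conn, "addresses",
                             [o["address_id"] for o in offices.values()])  # level 3
    tree = []
    for user in users:
        office = offices[user["office_id"]]
        address = addresses[office["address_id"]]
        tree.append({
            "name": user["name"],
            "office": {"name": office["name"],
                       "address": {"city": address["city"]}},
        })
    return tree
```

Every query after the first is a lookup by primary key, so the only index this relies on is the one on `id`.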
In the end your resolvers need to be using some standardized way of grabbing objects by ID (or edge), something like https://github.com/graphql/dataloader.
Whether it’s possible to do this efficiently I don’t know. At my last job we messed it up, and then we started applying a strategy like I described above, but then I switched jobs.
Would love to hear from others who have dealt with the same challenges.
So SQL is not a database :). It is a data access DSL that is implemented by databases. I don't think it's true that SQL is untyped: the table schemas are types (albeit basic product/record types). Inferring the type of a result is quite reasonable if you start with the schemas. SQL suffers from a UX problem for sure.