Hacker News new | ask | show | jobs
by ruslan_talpa 3274 days ago
You've hit a (pain) point. While graphql reduces the amount of trips to the backend for the browser, those round trips get pushed down to a lower level, between backend and db. The reference graphql-js implementation, while at first looks so easy, you just write a resolver for a field, makes it oh so easy to have terrible n+1 problems. dataloader helps a bit but it's not as optimal as it can be.

You need to be very careful. To be optimal, you'll basically need to write very complex resolvers that inspect the AST themselves and fetch the data optimally which at that point is almost as writing your custom execution module.

That is basically what i did, custom execution module to translate a graphql request to a single sql query (https://subzero.cloud/)

5 comments

Sometimes I worry whether GraphQL is analogous to ORMs: a theoretically elegant attempt to solve an impedance mismatch between a data source and its consumer, but one that ultimately just shifts or redistributes the impedance to different layers of the codebase...
I agree it seems like potentially the same problem, and it manifests in forcing devs to write 'resolvers' which looks like horrible drudge work. So I wonder if there are decent offerings without the impedance mismatch between GraphQL queries and the database (so we don't have to translate them into SQL). I did a very quick look and at least found this: https://dgraph.io/

I'm getting the impression that if nothing else though, I'm probably gonna need to wait a few years for this tech to stabilize more. Which I find very unfortunate, because the standard way of doings things these days feels very unpleasant to me (I hate writing boilerplate more than anything: have an overuse injury, so typing is the worst part of coding)—and the GraphQL queries do seem to make a lot of sense (although forcing clients to have explicit knowledge of schema structure seems a little dangerous...).

Here's a project that might be interesting.

https://github.com/postgraphql/postgraphql

Technique inspired by PostgREST and graphql spin "stolen" (in a good way) from subZero :), although a completely different implementation method then subZero
That's really cool - we're doing the same thing, translating entire queries into a single SQL statement!
I hope you are using PostgreSQL! :) Check out PostgREST and the types of queries it generates
Yes, Postgres all the way. We don't use PostgREST though we construct the same type of queries.
> custom execution module to translate a graphql request to a single sql query

If I have a schema like this:

    users
      projects   (many per user)
      blog_posts (many per user)
What SQL do you generate to handle a request for everything in a single query?
take a look at PostgREST https://github.com/begriffs/postgrest

simplified example query https://pastebin.com/7J2ZswRC

Ah, nice trick - thanks for the tip.
Which n+1 problems doesn't dataloader solve?
It eliminates n+1 but not in a perfect way.

If you have a 3 level query (3 tables), the best you can hope for is to get 3 sequential queries, like get first level, collect the ids, request the second level, collect the ids, get 3rd level. It gets even more complicated teh more levels you and the bigger the dataset returned.

all of this can be done using a single join which is one roundtrip and it's faster.

If you have low latency queries (eg. because most things hit cache rather than db) sequential queries aren't as much of a problem. Being able to join everything into one query, on the other hand, is a luxury which can be hard to maintain at scale.

In the 'join in SQL' case you could identify particular cases which are doing sequential queries and implement a different loader which just does one. It's not automatic but perhaps in most cases it's not necessary to do this step anyway. In the worst case you're back to doing as much work as you would for a bespoke API endpoint, but that's not the typical case. How much of a problem this ends up being in practice very much depends on the type of app you're building and how you intend to scale it.

Saying join is a luxury is a dangerous thing :) (for impressional devs :)) 99% of projects are not "at FB scale" :) so join is exactly the right thing to use, it's been tuned over decades so until your scale/dataset does not outgrow one box (and there are big boxes now), you are not going to do a better job then the query optimiser (cause after all that is what you are trying to do).
Fair enough. What I'm trying to get at is that you need to pay attention to the cost of the real world queries which are actually being executed against your API, and then optimize. I'm not sure that SQL is a magic bullet there either (you can still request too much data at once, for example).

In practice this probably means adding some logging of how long requests take and graphing it (say, 95th percentile request time) from time to time to spot pathological queries. Even better if you can automate it. I think this is stuff that everyone should be doing (after a certain stage), regardless of whether you are using GraphQL or a bespoke JSON API.

I'd say build out your GraphQL server under the assumption that joins aren't always an option, but allow them as an optimization where possible.
why would i "optimise" for the thing that might never happen (join not an option) while taking the hit right now? i've tested this (3 level query) and the throughput is 10 times slower with dataloader, as in i need 10 servers instead of one to do the same job.
Thank you, thank you. A critical piece of information.
You don't have to worry so much about n+1 (once you're using dataloader) as much as you do deeply nested queries, or queries that run against a large number of datatypes.
i am not sure i follow, the deeper the query or bigger the returned dataset, the worse the performance of a dataloader type solution (see my other comment).
We're agreeing here, I think. I'm saying that after taking care of the n+1 problem by using dataloader, you still have to worry about deeply nested queries.
This is true, but most complex UIs tends to result in queries that spread wide rather than deep. It takes some deliberate contrivance or fairly unusual real-world cases to get to more than 3-4 levels of nested relationships, and this is often the point at which you'd be thinking to deferred/lazy loading in the client anyway.

I've spent a significant amount of time over the last year or so optimizing GraphQL servers built on domain-driven services (i.e. joins aren't an option) and managed to get to equal (or very marginally worse) performance to existing handcrafted endpoints that returned equivalent data (it was possible to build the same UI, even though the payloads weren't identical).

There are areas where GraphQL is inherently inefficient (trying to work on ways to mitigate these issues), but the reality is that deeply-nested UI appears to be less of a problem than I originally thought it would be.