Hacker News
by KirinDave 3238 days ago
I still can't quite figure out the value of any of these schemes.

Yes, APIs seldom elegantly encode into the set of HTTP verbs and responses that we associate with a "RESTful" design, I grant. And so maybe we can come up with better.

But the notion of JrGQL and GQL as query languages means that the servers handling these calls must be query resolvers. Unlike most restful interfaces which tend to devote a single uniform interface per endpoint with only minor modifications, a full query model of your domain means an explosive quantity of potential strategies piped through a single endpoint.

I've used Python, Ruby, Node (in ts and js) and Haskell to service GQL queries and in all cases it's not trivial.

The popular NodeJS bindings tend to cause huge overfetching because each field tends to have a unique resolver but there is no rule about combining them. The popular Python bindings (graphene) let you merge this, but the programming model to handle the arguments and sub-arguments is very frustrating, as in different places, different sources of logic will govern what gets fetched (sub-objects use SQLAlchemy, but outer objects with ANY sort of query logic need to be custom). Ruby's bindings are the same.

Haskell's popular solution lets you cobble a responder from a proof of concept, and it leads to optimal query scheduling. Still not the best: it's by no means complete and requires quite a lot of work to set up.

These GQL systems push a huge burden onto every api endpoint with the proposed trade-off: "Well now the client has an easier time." Even if that's true, now the backend needs to be much, much smarter than before to give a marginally better interface for clients.

I'm still very skeptical of this whole concept.

12 comments

As I understand it, the problem GraphQL and related query languages are trying to solve is that in single page web apps and native apps, there's a trade-off between long round trip time on one hand and creating one-off RESTless endpoints that mirror the visual structure of your app on the other. GraphQL allows you to put work into defining resolvers for everything in the hope that it eliminates maybe 90% of your RESTless routes. Instead of complicating the server to make things a little easier for the client, it simplifies the server by removing knowledge of the client's ui. I'm not sure what you mean by every api endpoint, because graphql should pretty much only have one.

Regarding nodejs and overfetching, the canonical solution is Facebook's DataLoader library. I think it might even be in the docs somewhere.

Regarding jrGQL, I'm not sure the use case is the same. They look like different tools for different problems, but I'd be interested in hearing more about what problem it solves.

As a GraphQL user, I can hardly say it is perfect, but it does solve many current challenges arising from the surge of mobile use, namely the waste of bandwidth on headers and unused content, as well as latency between requests.

From the server's point of view, using GraphQL may not sound great, since any single tiny update to the schema also implies an update to the single endpoint, rather than to just one of many. Until there is an implementation of dynamic modularization, maintaining the integrity of a service is a constant pain.

In fairness, I think the tradeoff is balanced. GraphQL solves problems on the client side and yet it creates problems on the server side.

Meanwhile, I am more concerned about the future of GraphQL now that http2 has arrived. With http2, the biggest selling point of reducing latency by using one single endpoint no longer holds. Then for the flexibility of limiting the scope of return, you can always specify it on the conventional RESTful API. So there is not much practical benefit. (Oh yes, there are some more goodies like subscription, but let's face it, not many use it.) So to make GraphQL more useful, I think it needs a bigger evolution.

> Meanwhile, I am more concerned about the future of GraphQL now that http2 has arrived. With http2, the biggest selling point of reducing latency by using one single endpoint no longer holds. Then for the flexibility of limiting the scope of return, you can always specify it on the conventional RESTful API. So there is not much practical benefit. (Oh yes, there are some more goodies like subscription, but let's face it, not many use it.) So to make GraphQL more useful, I think it needs a bigger evolution.

GraphQL's most significant latency gain isn't achieved through the use of a single endpoint to minimize parallel requests. As you mentioned, HTTP2 makes this a non-factor in terms of latency. The single endpoint part is only an implementation detail, and most follow it simply by convention.

Rather, GraphQL's major breakthrough in latency reduction is achieved through the use of the graph query language to completely bypass the multiple client-server roundtrip problem for interdependent data (i.e. having to wait for a user request to return the organization id, before being able to send a request to find some property off the organization the user belongs to). HTTP2 isn't able to help at all for this use case.
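
To make the user/organization example concrete, here is a rough Python sketch that only counts client-server round trips. Everything in it (the `Network` class, the in-memory `USERS`/`ORGS` data, the paths) is made up for illustration and is not any real GraphQL or REST library:

```python
# Hypothetical in-memory "backend"; names and data are illustrative only.
USERS = {"u1": {"name": "Ada", "org_id": "o1"}}
ORGS = {"o1": {"name": "Acme", "plan": "pro"}}


class Network:
    """Counts client-server round trips."""

    def __init__(self):
        self.round_trips = 0

    def rest_get(self, path):
        # Each REST call costs one round trip.
        self.round_trips += 1
        kind, key = path.strip("/").split("/")
        return {"users": USERS, "orgs": ORGS}[kind][key]

    def graph_query(self, user_id):
        # One round trip; the server follows user -> organization itself.
        self.round_trips += 1
        user = USERS[user_id]
        org = ORGS[user["org_id"]]
        return {"user": {"name": user["name"],
                         "organization": {"plan": org["plan"]}}}


def plan_via_rest(net, user_id):
    user = net.rest_get(f"/users/{user_id}")       # must finish first...
    org = net.rest_get(f"/orgs/{user['org_id']}")  # ...before this can be sent
    return org["plan"]


def plan_via_graph(net, user_id):
    return net.graph_query(user_id)["user"]["organization"]["plan"]
```

The second REST call cannot even be issued until the first one returns, which is why multiplexing requests over one connection does not help in this case.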

> Instead of complicating the server to make things a little easier for the client, it simplifies the server by removing knowledge of the client's ui. I'm not sure what you mean by every api endpoint, because graphql should pretty much only have one.

Well I've yet to work with a set of server-side tools that actually deliver on this promise. I really don't like the tools I have.

I'm now attempting to synthesize and optimize SQL queries in python. Messy business. Way more defect-prone.

> DataLoader

I'm a haskell programmer and even I think that programming model is pretty brutal.

At Facebook, GraphQL is used in such a way that all queries are predeclared -- developers write a collection of queries that represent their client's interface to the backend, and only these queries are used in the live app. This has two positive effects:

* It's fairly easy to tell "how the app is using the database" -- queries are not dynamically generated or hidden in code.

* The query performance is predictable and controllable in the same way as a stored procedure approach to database access -- but without actually requiring a DBA to load/unload procedure definitions.
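
A minimal sketch of the "predeclared queries" idea, assuming a simple registry keyed by query name. The registry contents and the `lookup_query` helper are hypothetical and say nothing about how Facebook actually implements this:

```python
# Hypothetical registry of predeclared queries; names and query text are
# illustrative only.
PERSISTED_QUERIES = {
    "ArticlePage": (
        "query ArticlePage($id: ID!) "
        "{ article(id: $id) { title comments { body author { name } } } }"
    ),
    "UserBadge": "query UserBadge($id: ID!) { user(id: $id) { name } }",
}


def lookup_query(query_id):
    """Return the registered query text, refusing anything not declared ahead of time."""
    try:
        return PERSISTED_QUERIES[query_id]
    except KeyError:
        raise PermissionError(f"unknown query id: {query_id!r}") from None
```

With something like this, the live app sends only a query name plus variables, so the full set of queries the backend has to serve is known in advance.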

One might argue that these benefits could be obtained without codifying a new query language...but when you start to think about nested objects you start to warm up to the idea.

> when you start to think about nested objects you start to warm up to the idea.

You do, but any time you try to solve a problem, and your solution ends up being a new language, I think it's worth asking yourself, "Am I overcomplicating this? Can I do this in a simpler way?"

And I think you can. I've been toying with an approach I like to call http-etc, which is a simple idea that lets you flag the http (rest) api to say, "Give me this data, but also give me the graph associated with this data."

For instance, if I want an article, I would do:

  /api/0.1/article/z098d0s8dga

That would give me information about the article with the slug or internal id of z098d0s8dga.

But if I want an article, plus the comments associated with it, and information about the users that posted those comments, I can do:

  /api/0.1/article/z098d0s8dga+

You have two routes that are very similar but work in completely different ways. One gets specific information, the other returns the graph from a specific starting point.

And this is handled completely with your server-side api design. You have your singular route to one function, and you have your "etc" route to another function. Synchronizing the shape of the data requested/returned between server and client is still handled manually, but it seems to me this pattern solves 90% of GraphQL use cases with 1% of the complexity of GraphQL. It isn't tied to any specific language or codebase on the server-side. It doesn't require a big client-side library be added to your project. It's just a simple pattern.
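
A rough Python sketch of the two-route pattern described above, assuming a trailing `+` selects the "etc" handler. The data, the handler names and the `route` function are invented for illustration and are not tied to any framework:

```python
# Hypothetical data and routing; not a real web framework.
ARTICLES = {"z098d0s8dga": {"title": "Hello", "comment_ids": ["c1"]}}
COMMENTS = {"c1": {"body": "Nice post", "user_id": "u1"}}
USERS = {"u1": {"name": "Ada"}}


def get_article(slug):
    """Plain route: just the article."""
    article = ARTICLES[slug]
    return {"title": article["title"]}


def get_article_etc(slug):
    """'etc' route: the article plus the graph hanging off it."""
    article = ARTICLES[slug]
    comments = []
    for comment_id in article["comment_ids"]:
        comment = COMMENTS[comment_id]
        comments.append({"body": comment["body"],
                         "user": USERS[comment["user_id"]]})
    return {"title": article["title"], "comments": comments}


def route(path):
    prefix = "/api/0.1/article/"
    if not path.startswith(prefix):
        raise LookupError(path)
    slug = path[len(prefix):]
    if slug.endswith("+"):
        return get_article_etc(slug[:-1])
    return get_article(slug)
```

The shape of the "etc" response is fixed by the server here, which is what keeps the pattern simple and also what limits it.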

With this approach, how do you filter stuff out below the sub graph?

I cannot agree more that having a simpler and clearer pattern is perhaps what we need. Solving 90% of GraphQL use cases with 1% of the complexity is a killer point to me.

This makes a LOT more sense. The compiler part is what keeps hanging me up.

Client UX is king for a large majority of user facing applications. And that's where client-centric query mechanisms like GraphQL/Falcor/Om.next make the most sense.

The point is exactly to push data fetching complexity from the Client to the API server. And that's generally a worthwhile tradeoff for these applications because you can deal with server complexity by throwing more money at your API server cluster, but you can't force your users to upgrade to more performant clients.

Of course, server-side overfetching is a problem that can drastically limit the scalability of your system by overloading the components you can't as easily scale by spinning up more machines (i.e. most databases). Naive implementations of GraphQL on the server can be ridiculously chatty and require multiple server-db roundtrips to resolve even the simplest of queries, which can arguably be an even nastier problem than dealing with multiple client-server roundtrips with a RESTful API.

This is why non-trivial GraphQL servers generally use some kind of resolver batching/caching layer (Facebook provides a library called DataLoader to facilitate this: https://github.com/facebook/dataloader) or a query planner at your root resolver (like Join Monster: https://github.com/stems/join-monster).
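
To illustrate the batching idea in isolation, here is a small Python sketch with a fake database that counts queries. It is only a sketch of the general technique; `FakeDB` and both functions are hypothetical and do not reflect DataLoader's or Join Monster's actual APIs:

```python
# Hypothetical "database" that counts queries; a sketch of the batching idea,
# not any library's real API.
class FakeDB:
    def __init__(self, users):
        self.users = users
        self.queries = 0

    def get_user(self, user_id):
        self.queries += 1  # one query per call
        return self.users[user_id]

    def get_users(self, user_ids):
        self.queries += 1  # one query for the whole batch
        return {uid: self.users[uid] for uid in user_ids}


def authors_naive(db, comments):
    # One query per comment: the chatty pattern a per-field resolver produces.
    return [db.get_user(c["user_id"])["name"] for c in comments]


def authors_batched(db, comments):
    # Collect the distinct keys first, fetch once, then fan the results back out.
    wanted = sorted({c["user_id"] for c in comments})
    users = db.get_users(wanted)
    return [users[c["user_id"]]["name"] for c in comments]
```

Both functions return the same list of names; the difference is that the naive version issues one query per comment while the batched one issues a single query however many comments there are.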

Neither of these approaches is trivial to implement, but then again, doing efficient data fetching on the client against a RESTful API is just as difficult, if not more so. Efficient data-fetching is just a very difficult problem, and involves essential complexity that needs to live somewhere. At the end of the day it's up to you if you want to deal with that complexity in your clients (using a traditional RESTful API) or on your API servers (using a query mechanism like GraphQL).

You can run the JavaScript client on the server; that is what http://hyperfiddle.net/ does.

For example, server side rendering. Though Hyperfiddle is a bit more sophisticated than that.

I use Elixir and Absinthe, and was very skeptical as well until the past month about GQL.

I'm now completely sold. For a JS-powered app, using Preact and Apollo-client makes working with the backend incredibly easy, and realtime via subscriptions works very well.

I've used it to build a couple of large apps, and I am seeing very good results; it is much easier to build something in the frontend. The backend handles these queries very well via pattern matching, and I also know the queries ahead of time. In future I would want to lock these down to certain queries.

I'm very impressed by GQL, after having to deal with crappy APIs in the past. JSONAPI was almost perfect to me (working with Ember), but the fact there was no bulk submit was very shitty to me.

I suspect I would like it more if I wasn't stitching microservices that might also need client interfaces together.

GQL buys me absolutely nothing at all when it comes to having microservices connect.

The most valuable thing IMO is being able to perform all data fetches in one database transaction; this means that you get consistent data inside your application (rather than different divs being out of sync if the page loads while the server is updating content).

This is likely untrue. Unless you fetch data only from one single database, so that you can perform an atomic transaction, GraphQL cannot guarantee that you receive the data you want. Technically, a GraphQL server may combine data into a single return from different sources where locking is not possible.

> This is likely untrue. Unless you fetch data only from one single database, so that you can perform an atomic transaction

I do! I find it odd that anyone would do anything else...

It is a lovely world when you can do this.

In microservice-driven architectures you often have to appeal to multiple services to service a request.

Why does "a single endpoint" matter? It seems like the endpoint essentially becomes an implementation detail. True, resolvers can be complicated. But would you bet against improvements to tooling/languages/frameworks making it palatable? IMO GQL's value is compelling enough for something analogous to Rails (something that abstracts away much of the mechanics through convention) to emerge.

So let's take the most trivial case. Pretend you can query a Person object with both familyName and personalName.

A single GQL query resolver must be able to recognize which field you want and ideally construct the minimum query that meets the requirements. This is much harder than simply recognizing the correct parameters out of the URL, body and queryParams and executing the request.

While the stakes are low for overfetching in this case (odds are the DB doesn't notice and maybe it's not a problem for your network link), as a general problem (in particular when fields have subfields) this gets frustrating very quickly.
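
For the trivial Person case, "construct the minimum query" might look roughly like the Python sketch below. The `person` table, the column names and the `build_person_query` helper are all assumptions made up for illustration:

```python
# Hypothetical mapping from API field names to SQL columns; the table and
# column names are assumptions for illustration.
PERSON_COLUMNS = {"familyName": "family_name", "personalName": "personal_name"}


def build_person_query(requested_fields):
    """Build a SELECT that fetches only the columns the query asked for."""
    if not requested_fields:
        raise ValueError("at least one field is required")
    unknown = [f for f in requested_fields if f not in PERSON_COLUMNS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    columns = ", ".join(PERSON_COLUMNS[f] for f in requested_fields)
    # The id is left as a bound parameter, never interpolated into the SQL.
    return f"SELECT {columns} FROM person WHERE id = ?"
```

This flat case is easy. The difficulty described above starts when fields have subfields with their own arguments, because the resolver then has to decide how to combine joins, filters and batches rather than just pick columns.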

My concern is not that this is bad for clients. It makes a ton of sense for clients. It's just much harder for servers to schedule optimal queries. Suddenly you're not only managing unreliable resources and minimizing latencies you don't own, you're also a compiler or interpreter for a language with more than one valid query.

For some types of backing stores it may not matter (ECKRV's such as Dynamo, for example). For others it may matter a lot (Postgres). And I think a lot of server owners ship backends that maybe do 2x more querying than they have to, given the "best practices" and examples I see online for the NodeJs bindings.

I've heard the Erlang bindings are very good. I haven't tried them yet, as Erlang isn't very high on my list of tools to use these days (not because it's bad). Maybe they have a silver bullet.

Very much true. My point though is that I'd expect that complexity to get "built in" at a lower level over time, analogous to how we can submit SQL to a DB server, which has a query planner built-in to figure out how to actually execute the query. App developers don't have to write a SQL query planner.

I've been reading the stuff folks have recommended here. I can explain comonads to a teenager but even I think that these programming models are pretty intense.

I'm not 100% sold on GraphQL but the teams I work with are moving towards it. It does help a bit with micro-service architectures since you'd be writing some API interface to abstract potentially dozens of services anyway. After the pain of writing all of the necessary bindings/schemas/models or whatever they all are, it is convenient at least on the calling side. The web GUI for testing out schemas comes in very handy.

The pain I've felt is with API versioning. Someone set a type for a parameter wrong, it was a union between int and string but the original implementation was only typed for int. Since it was sent out to production that becomes too risky to change since the call will fail if the param type is wrong, even though the underlying JS would have handled ints and strings just fine.

Overall it is a pretty neutral technology for me. The benefits and detriments seem to balance just enough that I'm not going to pick that battle to fight against it. So while I agree it doesn't add as much value as the hype might suggest, it also doesn't detract enough to worry about.

The versioning problem can usually be resolved fairly elegantly by versioning on a field-by-field level, i.e. adding a new field called x-v2 and serve it alongside x without removing the original. You can add some kind of a deprecation warning to x to dissuade new clients from using it.

With this approach you can end up with a somewhat noisy schema, but for public APIs it's usually worthwhile to just keep the old versions forever to avoid breakage. For private APIs, with good tracking, you can find out when the original field stops getting requested, and clean up your schema then with relatively little risk.
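
A minimal Python sketch of field-by-field versioning with usage tracking, assuming a hand-rolled schema dictionary. The field is spelled `x_v2` here because GraphQL names cannot contain hyphens; `FIELDS`, `field_usage` and `resolve` are hypothetical names, not part of any GraphQL library:

```python
# Hypothetical schema description and usage counter; field names are made up.
from collections import Counter

FIELDS = {
    # Original field: typed as int only, kept for old clients.
    "x": {"resolve": lambda obj: obj["x"], "deprecated": "use x_v2"},
    # Replacement field served alongside the original.
    "x_v2": {"resolve": lambda obj: str(obj["x"]), "deprecated": None},
}
field_usage = Counter()


def resolve(obj, requested):
    """Resolve the requested fields, recording usage and collecting deprecation warnings."""
    data, warnings = {}, []
    for name in requested:
        spec = FIELDS[name]
        field_usage[name] += 1
        data[name] = spec["resolve"](obj)
        if spec["deprecated"]:
            warnings.append(f"{name} is deprecated: {spec['deprecated']}")
    return data, warnings
```

Once the counter for the old field stays flat for long enough, that is the signal that it can be dropped from a private API.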

I guess deprecation is not unique to GraphQL; it happens with RESTful APIs as well.

> But the notion of JrGQL and GQL as query languages means that the servers handling these calls must be query resolvers. Unlike most restful interfaces which tend to devote a single uniform interface per endpoint with only minor modifications, a full query model of your domain means an explosive quantity of potential strategies piped through a single endpoint.

GQL does seem to encourage, if not assume, a type-theoretical graph datastore as the backend. (Who couldn't whip such a thing together, if it were lacking?)

But it does not require one. This point may be lost in the tutorial and examples shown on graphql.org. There is no reason one could not map a basic GQL service onto an existing set of REST endpoints, with each one a type, and the return of their fields controlled by the request.
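
As a rough sketch of that mapping, assume each existing REST endpoint is wrapped as a function and registered under a type name, with the request deciding which fields come back. The `fetch_*` functions stand in for real HTTP calls and, like `ENDPOINTS` and `query`, are invented for illustration:

```python
# Hypothetical stand-ins for existing REST endpoints; each type maps to one endpoint.
def fetch_user(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com", "bio": "..."}


def fetch_article(article_id):
    return {"id": article_id, "title": "Hello", "body": "long text", "author_id": "u1"}


ENDPOINTS = {"User": fetch_user, "Article": fetch_article}


def query(type_name, key, fields):
    """Call the endpoint behind a type and return only the requested fields."""
    full = ENDPOINTS[type_name](key)
    missing = [f for f in fields if f not in full]
    if missing:
        raise KeyError(f"{type_name} has no fields {missing}")
    return {f: full[f] for f in fields}
```

Note that the wrapped endpoint still fetches the whole resource; the trimming only reduces what travels from this layer to the client.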

> There is no reason one could not map a basic GQL service onto an existing set of REST endpoints, with each one a type, and the return of their fields controlled by the request.

I know you meant this to be rhetorical but unless this added some fundamental features (e.g., abstracting a distributed transaction negotiation) besides a new awkward interface I think I'd lose my mind writing GQL proxies over all my existing services.

Why just add a layer of waste and latency, along with a whole new intermediate serialization? Seems like make work. Just to adopt a technology for no real reason.

It depends on what your UI is. If your page displays information that can be loaded with a few concurrent calls to a single RESTful endpoint each, then by all means stick with that.

On the other hand, if your page loads information from ten different endpoints across three different services, and some calls depend on the results of previous calls, then you have four options: take a massive performance hit, go back to rendering everything on the server, create an endpoint for each page, or implement some way of querying embedded resources.

GraphQL acting as an intermediary for multiple REST apis is one way of doing the latter.

Or move to http2 for services?

Assuming you have control over the api servers. And you have clients that support http2.

Don't (E/A)LBs support http2 now? It's at over 80% client coverage globally, right? Seems pretty safe to IE6 the stragglers and stop worrying about their optimized experience, imo.

Also, http2 push is VERY interesting for application servers.

I think you answered your own question... GraphQL and similar are meant to push complexity to the endpoint partly in order to not tax resource-constrained clients. If that's part of your requirements then it's a great fit, otherwise your points are valid.

In a more perfect world, it'd make sense. Mutations in particular make lots of sense because it narrows the scope of any required transactional semantics.

But the fact that I've only seen one implementation correctly implement the query semantics worries me deeply. At first I assumed I was using bad libraries! But soon I realized that lots of devs were simply shipping significantly less efficient servers to production.

There are still many gaps the developers need to fill in order to use GraphQL in production. But when implemented correctly, one of the practical benefits is that multiple queries from client to server can be merged into one, thus saving bandwidth and improving performance.

  a full query model of your domain means an explosive quantity of potential strategies piped through a single endpoint.

I agree with this fact. But like you said, it's just trade-offs. I'm sure certain types of projects will benefit more from the pros than cons of GraphQL. But certainly not all projects will do so.

> one of the practical benefits is that multiple queries from client to server can be merged into one, thus saving bandwidth and improving performance.

Sure, but if we're doing that by dragging the API implementer over every possible permutation of queries, optimizing whatever you could possibly ask for, I'm not convinced this is LESS work than me shipping BTO API requests.

> I'm sure certain types of projects will benefit more from the pros than cons of GraphQL.

My sneaking suspicion is that it's the kind of projects where their data model is small and isolated and doesn't offer much sharing between queries. Which is great when you can get it.

>> The popular NodeJS bindings tend to cause huge overfetching because each field tends to have a unique resolver but there is no rule about combining them.

I don't have any experience with NodeJS+GQL but used Golang+GQL a lot. Typically a resolver of a field simply returns the property of a structure that is returned by the parent field, that's it. The only thing is the parent field should be unboxed from the interface{} type.
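
The "resolver just returns a property of the parent" idea can be sketched in a few lines. This is in Python rather than Go, and the classes plus `default_resolver` and `resolve_selection` are hypothetical names used only to show the shape of it:

```python
# Hypothetical types; any object with matching attributes would do.
class Author:
    def __init__(self, name):
        self.name = name


class Article:
    def __init__(self, title, author):
        self.title = title
        self.author = author


def default_resolver(parent, field_name):
    """Return the attribute of the parent object that matches the field name."""
    return getattr(parent, field_name)


def resolve_selection(parent, selection):
    """selection maps a field name to a nested selection dict, or None for a leaf."""
    result = {}
    for field_name, sub in selection.items():
        value = default_resolver(parent, field_name)
        result[field_name] = value if sub is None else resolve_selection(value, sub)
    return result
```

In this style only the root resolver touches the data store; whether that avoids overfetching depends on how much the root resolver loads up front.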

I like to think of it like this: If you over fetch, you throw it away at the server and not after sending it to the client. You can wrap a standard REST API and it'll still be more optimal.

We've been using REST and slowly adding hacks to stop sending so much data. We have two API calls that used to be 1 MB each and now they're something like 5 KB with GQL.