by madewulf 4069 days ago
It really pleases me to finally see a big credible player tackling the REST orthodoxy. They state very well why REST APIs are not working well for mobile.

Notably: "Fetching complicated object graphs require multiple round trips between the client and server to render single views. For mobile applications operating in variable network conditions, these multiple roundtrips are highly undesirable."

Now, I'm wondering how they manage to keep the computation of responses on the server side from being too expensive. It seems clear that such a system carries the risk of defining queries that pull far too much data at once. Also, the question of pagination comes to mind. How can you handle that efficiently?


Re: the risk of overfetching, this is certainly a risk. Like any tool, it can be misused. One of the motivations of Relay is in fact this very issue. By coupling the data-fetching with the view more tightly, we can more accurately track and fix overfetching earlier in the development cycle.

In terms of being not too expensive, an important attribute of the system is that the server publishes capabilities that clients selectively use. For example, we explicitly do not allow clients to send up arbitrary strings for filtering, query predicates, and whatnot; servers have to explicitly expose those via arguments to fields, e.g. eventMembers(isViewerFriend: true) { ... } or similar formulations that are encoded in the type system. This prevents people from ordering or filtering inefficiently (e.g. over large data sets without indexes).
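
To make the "server publishes capabilities" idea concrete, here is a minimal sketch in Python. It is not how GraphQL is actually implemented; `FieldSpec`, `resolve_field`, and the sample data are hypothetical names invented for illustration. The point is only that a field declares the arguments it accepts, and anything else is rejected before it reaches the data store.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """A field plus the only arguments the server agrees to accept (hypothetical)."""
    name: str
    allowed_args: Dict[str, type]
    resolver: Callable[..., List[Dict[str, Any]]]


# Hypothetical sample data standing in for a real data store.
MEMBERS = [
    {"name": "Ann", "is_viewer_friend": True},
    {"name": "Bob", "is_viewer_friend": False},
    {"name": "Cy", "is_viewer_friend": True},
]


def resolve_event_members(isViewerFriend: bool = False) -> List[Dict[str, Any]]:
    # The server decides how this filter is executed (e.g. via an index).
    if isViewerFriend:
        return [m for m in MEMBERS if m["is_viewer_friend"]]
    return list(MEMBERS)


EVENT_MEMBERS = FieldSpec(
    name="eventMembers",
    allowed_args={"isViewerFriend": bool},
    resolver=resolve_event_members,
)


def resolve_field(spec: FieldSpec, args: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reject any argument the field did not explicitly expose."""
    for key, value in args.items():
        if key not in spec.allowed_args:
            raise ValueError(f"{spec.name} does not accept argument {key!r}")
        if not isinstance(value, spec.allowed_args[key]):
            raise TypeError(f"{key} must be {spec.allowed_args[key].__name__}")
    return spec.resolver(**args)
```

Calling `resolve_field(EVENT_MEMBERS, {"isViewerFriend": True})` returns only the friends, while an undeclared argument such as `{"where": "age > 1"}` raises `ValueError` instead of being passed through as a free-form predicate.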

Re: pagination. This is absolutely a core pattern that we are excited to talk about as we explain the system more. Broadly we handle pagination through call arguments, e.g. friends(after: $someCursor, first: 10) { ... } There's a lot of subtlety there which I won't go into until we dive into that topic deeply.
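
As a rough illustration of pagination through call arguments, the sketch below mimics a `friends(after: ..., first: ...)` field in Python. It glosses over the subtlety mentioned above. The assumptions here are mine, not the post's: the cursor is simply the last item seen (real cursors are usually opaque tokens), and the `items`, `end_cursor`, and `has_next_page` keys are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical friend list, already in a stable order.
FRIENDS = [f"friend-{i}" for i in range(1, 26)]


def friends(first: int, after: Optional[str] = None) -> Dict[str, object]:
    """Return up to `first` items that come after the `after` cursor.

    Assumption: the cursor is the last item the client saw.
    """
    start = 0
    if after is not None:
        # Raises ValueError if the cursor is unknown.
        start = FRIENDS.index(after) + 1
    page = FRIENDS[start:start + first]
    return {
        "items": page,
        "end_cursor": page[-1] if page else None,
        "has_next_page": start + first < len(FRIENDS),
    }


# Usage: walk the list ten friends at a time.
page1 = friends(first=10)
page2 = friends(first=10, after=page1["end_cursor"])
```

Because the client passes back a cursor rather than an offset, the server is free to choose how that position is represented internally.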

Thanks a lot for these insights. I'm definitely looking forward to discovering more about all this.

I like how JSON-LD lets one embed many resources inside a document. And there are plenty of good preludes to this recent emergence: Soundcloud wrote 3 years ago about how their client-side code extracts all relevant models and pushes them into an "instance store" where anyone else might find the data when they need it. https://developers.soundcloud.com/blog/building-the-next-sou...

What's really exciting is an expressive model that allows code to state what dependencies it has. The transport and fulfillment of needed data (which intermediary stores the data gets to, and how pending views are signaled that it is available) is a more mechanistic, rote, already-faced task, albeit one that each company seems to tackle on its own. The step further, declaring and modeling dependencies, is what makes GraphQL an interesting capability.

Soundcloud's 3-year-old blog post is a good reference to show that "instance stores" (these client-side object database services) have been around for a good while, and can be done fine with REST as easily as without.

It has recently become popular to be anti-REST. It makes you seem smart and more knowledgeable than average.

In practice, I think that a GraphQL API is still a form of REST.

People hate on REST when they should be hating on the bad instances of things that others made and said 'But it's REST!', regardless of whether it was or wasn't or that person had RTFM https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding...

The post specifically calls out that distinction:

> We are interested in the typical attributes of systems that self-identify as REST, rather than systems which are formally REST.

It seems to me that GraphQL could be awesome when you exclusively control the back-end and the front-end, but do you think it will work as well if you're building an API that also needs to support third-party clients? Would REST still be better in that scenario?

Edit: I see that they partially address this: "Many of these attributes are linked to the fact that “REST is intended for long-lived network-based applications that span multiple organizations” according to its inventor. This is not a requirement for APIs that serve a client app built within the same organization."

That's a great point. We have definitely designed it with first-party clients in mind. This doesn't preclude other use cases in the future, but for the short term, this is a non-goal.

In OData, these problems are solved by:

* Clients communicate their desired projection and page size (via $select and $top query string parameters), which can then easily be mapped by the service into efficient calls to the underlying data store.

* OData client page sizes are polite requests, not demands. The server is free to apply its own paging limits, which are then communicated back to the client along with the total results count and a URL that can be followed to get the next page of results. Clients are required to accept and process the page of entities they are given, even if that number differs from the count which was requested due to server limits.

I'd assume GraphQL will adopt similar functionality, if it hasn't already.
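
The OData behaviour described above (client-chosen projection, page size as a polite request, server-applied limit, total count and a next link) can be sketched roughly as follows. This is a simulation in plain Python, not an OData implementation: `SERVER_MAX_PAGE_SIZE`, `get_page`, `fetch_all`, the `/items` path, and the `total_count` / `next_link` keys are all hypothetical, and using `$skip` in the next link is an assumption made for simplicity.

```python
from typing import Any, Dict, List, Optional

SERVER_MAX_PAGE_SIZE = 3  # hypothetical server-side limit

# Hypothetical entities standing in for the underlying data store.
ENTITIES = [{"id": i, "name": f"item-{i}", "secret": "x"} for i in range(1, 8)]


def get_page(top: int, select: List[str], skip: int = 0) -> Dict[str, Any]:
    """Serve one page; `top` is treated as a request, not a demand."""
    size = max(1, min(top, SERVER_MAX_PAGE_SIZE))
    rows = ENTITIES[skip:skip + size]
    # Apply the client's projection so only the requested fields are returned.
    projected = [{key: row[key] for key in select} for row in rows]
    next_skip = skip + size
    next_link: Optional[str] = None
    if next_skip < len(ENTITIES):
        next_link = f"/items?$select={','.join(select)}&$top={top}&$skip={next_skip}"
    return {"value": projected, "total_count": len(ENTITIES), "next_link": next_link}


def fetch_all(top: int, select: List[str]) -> List[Dict[str, Any]]:
    """Client loop: accept whatever page size the server hands back."""
    results: List[Dict[str, Any]] = []
    skip = 0
    while True:
        page = get_page(top, select, skip)
        results.extend(page["value"])
        if page["next_link"] is None:
            return results
        # Stand-in for following the next link: advance by what was received.
        skip += len(page["value"])
```

Here a client asking for `top=5` still receives pages of at most three entities and keeps going until no next link is returned.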

I think HTTP/2 solves some of the issues regarding round trips. One real problem with mobile vs. web is versioning: the inability to ensure that you can keep your client & server in sync.

In some ways, one of the biggest advances I see from GraphQL/Relay is that it should avoid most of the versioning hell for mobile: there's effectively an agreed interop language for communicating data needs, and thus backwards compatibility during API evolution should be far less complicated.

HTTP/2 removes some of the overhead of requests, but there is still the problem of multiple round trips.

For example, if you request your top three friends and their most recent posts using REST, you'll probably need to do four requests. And you can't parallelize all of them, because you need to know your friends' IDs before you can construct the post requests.
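
A toy model of that example, with a counter standing in for the network, makes the difference visible. Everything here (`CountingServer`, its methods, and the sample data) is hypothetical and only meant to count requests, not to represent a real REST or GraphQL server.

```python
from typing import Dict, List

# Hypothetical in-memory data.
TOP_FRIENDS = {"me": [1, 2, 3]}
LATEST_POST = {1: "post-a", 2: "post-b", 3: "post-c"}


class CountingServer:
    """Counts how many requests (round trips) a client makes."""

    def __init__(self) -> None:
        self.round_trips = 0

    # REST-style: one resource per request.
    def get_top_friends(self, user: str) -> List[int]:
        self.round_trips += 1
        return TOP_FRIENDS[user]

    def get_latest_post(self, friend_id: int) -> str:
        self.round_trips += 1
        return LATEST_POST[friend_id]

    # Graph-style: the whole nested shape is resolved in one request.
    def query_top_friends_with_posts(self, user: str) -> Dict[int, str]:
        self.round_trips += 1
        return {fid: LATEST_POST[fid] for fid in TOP_FRIENDS[user]}


def rest_client(server: CountingServer) -> Dict[int, str]:
    friend_ids = server.get_top_friends("me")  # must finish before the rest
    return {fid: server.get_latest_post(fid) for fid in friend_ids}


rest_server = CountingServer()
rest_result = rest_client(rest_server)

graph_server = CountingServer()
graph_result = graph_server.query_top_friends_with_posts("me")
```

Both clients end up with the same data, but the REST-style client issues four requests in at least two sequential stages, while the nested query issues one.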

Actually this can be addressed with HTTP/2 although I think the solution may be just as complicated. If the yet-to-be-known parameters are encoded in the query string then the requests can be pushed to the client before the client knows it will be requesting them. This could be done with a middleware that used the Referer header (and maybe some fancy isomorphism) to determine what should be pushed.

True, but then you have the server duplicating logic from the client. This is very similar to the custom endpoint solution, which breaks down when you have multiple clients needing different data. You end up either under or over fetching.

To prevent the over/under fetching you're describing you could partition your endpoints and make multiple requests. Although, that's definitely a code-maintenance win for GraphQL.

It seems like if you were co-executing the client on the server you could trivially achieve perfect fetching. GraphQL may actually over fetch in many situations. Here's an example: the client fetches a list of objects, filters it, and then fetches more data referenced by the results. With GraphQL, if you don't automagically parse the filter out of the client code, you over-fetch. However, the HTTP/2 solution could just push the 2nd fetch as it was made by the co-executed client.

All that being said, GraphQL certainly alleviates the server-side load co-execution would imply and that's likely more suitable to the scale Facebook operates at.

Yes, that's a tricky problem. Generally GraphQL solves it by filtering on the server. You'd request something like `friends(age: 25, order: dob, first: 10) { name, profilePicture }` and pass that straight through to the UI.

There are some situations where this doesn't work too well. For search suggestions, for example, you might not want to request `searchSuggestion(query: "Tom Smi") { <lots of data> }` on every keystroke because sequential queries will have a lot of duplication. In this case we can just fetch the IDs of the results and do a separate query to fetch the data for the people that we don't know about yet.
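
The "fetch IDs first, then only the people we don't know about yet" approach could look roughly like this on the client. It is a sketch under assumptions: `search_suggestion_ids`, `fetch_people`, `SuggestionClient`, and the sample data are hypothetical stand-ins for two separate queries and a client-side cache.

```python
from typing import Dict, List

# Hypothetical server-side data.
PEOPLE = {
    1: {"name": "Tom Smith"},
    2: {"name": "Tom Smiley"},
    3: {"name": "Tom Small"},
}


def search_suggestion_ids(query: str) -> List[int]:
    """Cheap query: return only the IDs of matching people."""
    q = query.lower()
    return [pid for pid, person in PEOPLE.items() if person["name"].lower().startswith(q)]


def fetch_people(ids: List[int]) -> Dict[int, dict]:
    """Expensive query: full data, but only for the requested IDs."""
    return {pid: PEOPLE[pid] for pid in ids}


class SuggestionClient:
    def __init__(self) -> None:
        self.cache: Dict[int, dict] = {}
        self.fetched_log: List[List[int]] = []  # IDs fetched on each keystroke

    def suggest(self, query: str) -> List[str]:
        ids = search_suggestion_ids(query)
        # Only ask for people that are not already in the local cache.
        missing = [pid for pid in ids if pid not in self.cache]
        if missing:
            self.cache.update(fetch_people(missing))
        self.fetched_log.append(missing)
        return [self.cache[pid]["name"] for pid in ids]


client = SuggestionClient()
first_names = client.suggest("Tom Sm")
second_names = client.suggest("Tom Smi")
```

On the second keystroke the ID query still runs, but no full records are re-fetched because every match is already cached.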

Having the server know about client queries (and therefore preemptively running queries) is something we specifically avoided with GraphQL. If the server knows what the client wants then sending any query across the wire doesn't make sense at all. It also falls down if the client's data requirements change across client versions. You quickly end up in a place where the server has to know about all possible client queries, which is complex and difficult to maintain.