by YZF 232 days ago
In my day job the question of SQL and its role keeps coming up. Some people want to propagate SQL all the way to clients like web browsers. Perhaps operating over some virtual/abstract data and not the real physical underlying data (that's a whole other layer of complexity). This seems like a bad idea/API in general.

I'm not too familiar with GraphQL but on the surface it seems like another bad idea. Shouldn't you always have some proper API abstraction between your components? My sense for this has been like GraphQL was invented out of the frustration of the frontend team needing to rely on backend teams for adding/changing APIs. But the answer can't be have no APIs?

All that said there might be some situations where your goal is to query raw/tabular data from the client. If that's your application then APIs that enable that can make sense. But most applications are not that.

EDIT: FWIW I do think SQL is pretty good at the job it is designed to do. Trying to replace it seems hard and with unclear value.

6 comments

> All that said there might be some situations where your goal is to query raw/tabular data from the client. If that's your application then APIs that enable that can make sense. But most applications are not that.

IME, the majority of responses sent to the client is tabular data hammered into a JSON tree.

If you generalise all your responses to tabular data, that lets you return scalar values (a table of exactly one row and one column), arrays (a table of exactly one row with multiple columns) or actual tables (a table of multiple rows with multiple columns).

The problem comes in when some of the values within those cells are trees themselves, but I suspect that can be solved by having a response contain multiple tables, with pointer-chasing on the client side reconstructing the trees within cells using the other tables in the response.

That would still leave the 1% of responses that actually are trees, though.
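As a rough sketch of the multi-table idea above: assume a hypothetical response shape in which every table is a flat list of rows and a nested value is replaced by a reference cell of the form `{"ref": [table, id]}`. The table names, the `ref` convention and the `resolve` helper are all invented for illustration, and the sketch assumes references never form a cycle.

```python
from typing import Any

# Hypothetical response: flat tables only. Cells that would hold a tree hold
# a reference {"ref": [table_name, row_id]} or a list of such references.
response = {
    "posts": [
        {"id": 1, "title": "SQL in the browser",
         "author": {"ref": ["users", 10]},
         "comments": [{"ref": ["comments", 100]}, {"ref": ["comments", 101]}]},
    ],
    "users": [
        {"id": 10, "name": "alice"},
        {"id": 11, "name": "bob"},
    ],
    "comments": [
        {"id": 100, "text": "first", "author": {"ref": ["users", 11]}},
        {"id": 101, "text": "second", "author": {"ref": ["users", 10]}},
    ],
}


def index_tables(tables: dict) -> dict:
    """Build a (table, id) -> row lookup so each reference resolves directly."""
    return {(name, row["id"]): row for name, rows in tables.items() for row in rows}


def resolve(value: Any, index: dict) -> Any:
    """Recursively replace reference cells with the rows they point to.

    Assumes the references are acyclic; a cycle would recurse forever.
    """
    if isinstance(value, dict) and set(value) == {"ref"}:
        table, row_id = value["ref"]
        return resolve(index[(table, row_id)], index)
    if isinstance(value, dict):
        return {key: resolve(cell, index) for key, cell in value.items()}
    if isinstance(value, list):
        return [resolve(item, index) for item in value]
    return value


index = index_tables(response)
post_tree = resolve(response["posts"][0], index)
print(post_tree["author"]["name"])                            # alice
print([c["author"]["name"] for c in post_tree["comments"]])   # ['bob', 'alice']
```

The client pays for one index build and then a dictionary lookup per reference, while the server only ever emits flat tables.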

Instead of a client dealing with a server that only presents unopinionated, overly-broad CRUD endpoints for core entities/resources, GraphQL is a tool through which the client tricks the server into creating a bespoke viewmodel for it.

But those endpoints are abstractions. Don't we want control over the surface of the API and our abstractions? If you let the client tell the server what the abstractions are at run time, haven't you just lost control over that interface?

As I was saying, there might be some situations where that's the right thing, but in general it seems you want to have a well controlled layer there that specifies the contract between these pieces.

GraphQL still has schema constraints, the surface of the API you mentioned.
My post was only intended as a commentary regarding how I approach GraphQL after a few forays into it (current stance: would not default to GraphQL, but not against it either).

I was not intending to dodge your questions, but nor was I trying to comprehensively answer them, because they felt a bit unclear. I will make an attempt, combining snippets within your two posts that seem to be related:

>Shouldn't you always have some proper API abstraction between your components?

>But those endpoints are abstractions. Don't we want control over the surface of the API and our abstractions?

I can't answer this unless I know what concepts/layers you are referring to when you say "abstraction between components". If you mean "between the client and server", then yes, and GraphQL does this by way of the schema, types, and resolvers that the server supports, along with the query language itself. The execution is still occurring on the server, and the server still chooses what to implement and support.

If by "abstraction between components" you mean "URL endpoints and HTTP methods" then no, GraphQL chose to not have the abstraction be defined by the URL endpoint. If you use GraphQL, you do so having accepted that the decision point where resources are named is not at the URL or routing level. That doesn't make it not an abstraction, or not "proper" in some way.

>But the answer can't be have no APIs?

I don't understand what you mean by "No APIs"? You also mention "control over the surface"...

Is your concern that, because the client can ask the server "Please only respond with this subset of nodes, edges and properties: _______", the server has "no API"? Or it doesn't have "control"? I assure you that you can implement a server with whatever controls you desire. That doesn't mean it will always be easy, or be organized the way you are used to, or have the same performance profile you are used to, but the server can still implement whatever behavior it wants.
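To make "the client picks a subset, the server still decides what exists" concrete, here is a toy sketch. It is not real GraphQL and uses no GraphQL library: `SCHEMA`, `STORAGE` and `execute` are hypothetical names, and a "selection" is just a nested dict standing in for a query.

```python
# Toy model, not real GraphQL: SCHEMA lists the fields the server chooses to
# expose, and a "selection" is a nested dict describing what the client wants.
SCHEMA = {"Employee": {"id", "name", "manager"}}

# Hypothetical storage. "salary" is stored but deliberately not in SCHEMA.
STORAGE = {
    1: {"id": 1, "name": "Ada", "salary": 100, "manager_id": 2},
    2: {"id": 2, "name": "Grace", "salary": 200, "manager_id": None},
}


def execute(type_name, row, selection):
    """Return only the requested fields, rejecting anything outside SCHEMA."""
    result = {}
    for field, sub_selection in selection.items():
        if field not in SCHEMA[type_name]:
            raise KeyError(f"{type_name}.{field} is not part of the schema")
        if field == "manager":
            # The server decides how this relationship is looked up.
            manager = STORAGE.get(row["manager_id"])
            result[field] = (
                None if manager is None
                else execute("Employee", manager, sub_selection or {})
            )
        else:
            result[field] = row[field]
    return result


query = {"name": None, "manager": {"name": None}}
print(execute("Employee", STORAGE[1], query))
# {'name': 'Ada', 'manager': {'name': 'Grace'}}
```

Asking for `salary` raises a `KeyError` here even though the value is stored: the client shapes the response, but only within what the server has chosen to expose.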

>...in general it seems you want to have a well controlled layer there that specifies the contract between these pieces.

I think this wording brings me closer to understanding your main concern.

First, let me repeat: I am not a big GraphQL fan, and am only explaining my understanding after implementing it on both clients and servers. I am not attempting to convince you this is good, only to explain a GraphQL approach to these matters.

The "well-controlled layer" is the edge between nodes, implemented as resolvers. This was the "aha" moment for me in implementing GraphQL the first time: edges are a first-class concept, not just the nodes/entities. If you try using GraphQL in a small project whose domain model has lots of "ifs" and "buts", you will be forced to reach for that layer of control, and get a sense of it. It is simply located in a different place than you are used to.
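A minimal sketch of what "the control lives on the edge" can look like, again as a toy rather than any real GraphQL server API. The `resolver` decorator, the `Viewer` type, the `ptoBalance` field and the access rule are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewer:
    """Hypothetical caller identity passed to every resolver."""
    employee_id: int
    is_hr: bool = False


# Hypothetical data: employee 2 manages employee 1; employee 3 is unrelated.
EMPLOYEES = {
    1: {"id": 1, "name": "Ada", "manager_id": 2},
    2: {"id": 2, "name": "Grace", "manager_id": None},
    3: {"id": 3, "name": "Linus", "manager_id": 2},
}
PTO_DAYS = {1: 12.5, 2: 3.0, 3: 7.0}

RESOLVERS = {}


def resolver(type_name, field):
    """Register a function as the resolver for one (type, field) edge."""
    def register(fn):
        RESOLVERS[(type_name, field)] = fn
        return fn
    return register


@resolver("Employee", "name")
def employee_name(employee, viewer):
    return employee["name"]


@resolver("Employee", "ptoBalance")
def employee_pto_balance(employee, viewer):
    # The "ifs" and "buts" sit on the edge itself: only the employee,
    # their manager, or HR may follow it.
    allowed = viewer.is_hr or viewer.employee_id in (
        employee["id"], employee["manager_id"])
    if not allowed:
        raise PermissionError("not allowed to read this employee's PTO balance")
    return PTO_DAYS[employee["id"]]


def resolve_field(type_name, field, parent, viewer):
    """Follow one edge from a parent object on behalf of a viewer."""
    return RESOLVERS[(type_name, field)](parent, viewer)


print(resolve_field("Employee", "ptoBalance", EMPLOYEES[1], Viewer(2)))  # 12.5
```

The same request made as `Viewer(3)` raises `PermissionError`, so the rule is enforced no matter which query reached that edge.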

This "edges are first-class concepts" has an analogue in proper hypermedia REST APIs, but most organizations don't implement REST that way, so except for the five people who fully implement true HATEOAS, it is mostly beside the point.

> My sense for this has been like GraphQL was invented out of the frustration of the frontend team needing to rely on backend teams for adding/changing APIs.

GraphQL was born out of the frustration of backend teams not DOCUMENTING their API changes.

It's no different ideologically from gRPC, OpenAPI, or OData -- except for the ability to select subsets of fields, which not all of those provide.

Just a type-documented API that the server allows clients to introspect and ask for a listing of operations + schema types.

GQL resolvers are the same code that you'd find behind endpoint handlers for REST "POST /users/1", etc

Re: GQL - Explain to me what abstraction layer should exist between the data model and what data is loaded into the client? I’ve never understood why injecting arbitrary complexity on top of the data model is wise.

Perhaps unfettered write access has its problems, and GQL has permissions that handle this issue plenty gracefully, but I don’t see why your data model should be obfuscated from your clients which rely on that data.

In my view the abstraction layer should be in the domain of the application.

Let's say your software is HR software and you can add and remove employees. The abstraction is "Add an employee with these details". The data model should be completely independent of the abstraction. I.e. nobody should care how the model is implemented (even if in practice it's maybe some relational model that's more or less standard). Similarly for querying employees. Queries should not be generic, they should be driven by your application use cases, and presumably the underlying implementation and data model is optimized for those as well.
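Roughly what I mean, as a sketch. `EmployeeDirectory`, `EmployeeDetails` and `InMemoryDirectory` are hypothetical names invented for this example; the point is only that callers see use-case operations and never the storage layout.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EmployeeDetails:
    name: str
    department: str


class EmployeeDirectory(Protocol):
    """The contract: operations named after application use cases."""

    def add_employee(self, details: EmployeeDetails) -> int: ...
    def remove_employee(self, employee_id: int) -> None: ...
    def employees_in_department(self, department: str) -> list[str]: ...


class InMemoryDirectory:
    """One possible implementation; a relational one could replace it
    without any caller noticing."""

    def __init__(self) -> None:
        self._rows: dict[int, EmployeeDetails] = {}
        self._next_id = 1

    def add_employee(self, details: EmployeeDetails) -> int:
        employee_id = self._next_id
        self._next_id += 1
        self._rows[employee_id] = details
        return employee_id

    def remove_employee(self, employee_id: int) -> None:
        del self._rows[employee_id]

    def employees_in_department(self, department: str) -> list[str]:
        # A specific query driven by a use case, not a generic query language.
        return sorted(d.name for d in self._rows.values()
                      if d.department == department)


directory: EmployeeDirectory = InMemoryDirectory()
ada = directory.add_employee(EmployeeDetails("Ada", "Engineering"))
directory.add_employee(EmployeeDetails("Grace", "Engineering"))
directory.remove_employee(ada)
print(directory.employees_in_department("Engineering"))  # ['Grace']
```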

But I get that GQL can be that layer, in a more generic, schema-driven form. It still feels like a layer where you can inadvertently create the wrong contract, especially if, as I think is the case, different teams control the schema and the underlying models/implementation. So what it seems to save teams/developers is the need to spell out the exact requirements/implementation details of the API. But don't you want to do that?

How do people end up using GQL in practice? What is the layer below GQL? Is it actually a SQL database?

For instance, while I work in small teams, I’ve relied on Hasura GraphQL-Engine a lot for my api. This is a full GQL API automatically generated from your SQL schema, Postgres being the best supported DB. GQL relations are available across foreign keys (or manual joins which I never use), so a well defined normalized schema can have deeply nested queries executed easily with full type safety for the consumer.

Taking an HR example, you could query for an employee, their PTO status and accrual history, their manager, and their reports all in one nice easy query that no one has to write any business logic for, just a schema set up with employees, manager, reports, and PTO tables joined on ID keys.
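To show the shape of that read, here is a sketch using Python's built-in `sqlite3` rather than Hasura or Postgres. It is not how Hasura works internally; it only hand-writes the kind of foreign-key lookups such a nested query implies. The table names, columns and the `employee_view` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id INTEGER REFERENCES employees(id)
);
CREATE TABLE pto_accruals (
    id INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employees(id),
    accrued_on TEXT NOT NULL,
    days REAL NOT NULL
);
INSERT INTO employees VALUES (1, 'Grace', NULL), (2, 'Ada', 1), (3, 'Linus', 2);
INSERT INTO pto_accruals VALUES (1, 2, '2024-01-31', 1.5), (2, 2, '2024-02-29', 1.5);
""")


def employee_view(conn, employee_id):
    """Assemble one nested result by following foreign keys.

    Assumes the employee exists; a real handler would check for a missing row.
    """
    name, manager_id = conn.execute(
        "SELECT name, manager_id FROM employees WHERE id = ?", (employee_id,)
    ).fetchone()
    manager = conn.execute(
        "SELECT name FROM employees WHERE id = ?", (manager_id,)
    ).fetchone()
    reports = conn.execute(
        "SELECT name FROM employees WHERE manager_id = ? ORDER BY name",
        (employee_id,),
    ).fetchall()
    accruals = conn.execute(
        "SELECT accrued_on, days FROM pto_accruals "
        "WHERE employee_id = ? ORDER BY accrued_on",
        (employee_id,),
    ).fetchall()
    return {
        "name": name,
        "manager": manager[0] if manager else None,
        "reports": [row[0] for row in reports],
        "pto": {
            "balance_days": sum(days for _, days in accruals),
            "history": accruals,
        },
    }


print(employee_view(conn, 2)["manager"])   # Grace
print(employee_view(conn, 2)["reports"])   # ['Linus']
```

With a schema-generated GQL layer the client would express this as one nested query and none of the lookups above would be hand-written.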

And in such a case, what abstraction does the backend team need to put in front of the schema? I can’t motivate what this means myself. A well designed DB schema is truly a beautiful contract, and with table and column comments you can even get intellisense docs in the IDE for the front end team building the client.

On the flip side, I agree the write operation should be done thru an API when there is complexity and requirements beyond just writing one row to one table, but read operations are much more graceful and speedier to define in GQL than REST.

To use SQL effectively a certain amount of training is needed. But people are trained to read and write and do arithmetic. How to understand and write simple relational database queries is a broadly useful skill that should be widely taught in schools.

When it comes to written English, perhaps that could do with some reforms just as with SQL. Yet the way we write remains mostly unchanged.

I hold a very unpopular opinion of GraphQL. I think it’s a great internal querying API. Every web backend project I’ve worked on tries to implement an API for querying data, and it’s usually either fast and inflexible or flexible but slow. GraphQL lets you strike a balance: flexible and reasonably fast, with ways to optimise further.

I love GraphQL, it's great. It takes away the ambiguity in how to organize REST APIs (don't we all love the endless discussion about which HTTP status code to use...), and at the top level separates operations into query/mutation/subscription instead of trying to segment everything into HTTP keywords. It takes a bunch of decision layers away and that means faster development.

The question is: do you need that flexibility if you have a backend for frontend? Can you design an API flexible enough that it makes it possible to iterate faster? If not, you just pay, in the best case, a constant overhead or, in the worst case, an exponential overhead for each request! If you need to spend time optimizing because you have monitoring for slow queries, or downtime caused by never-terminating queries, then most likely you’ve already eaten the implementation speed advantage, if it exists at all in the first place.

I always thought it was about developer velocity, in this particular case front-end. With a traditional REST API the front-end team needed to coordinate with the back-end team on specific UX features to determine what needed to be done, which was further exacerbated when APIs needed to be specialized for iPhone vs. Android vs. Web UI.

GraphQL was supposed to help front-end and back-end meet in the middle by letting front-end write specific queries to satisfy specific UX while back-end could still constrain and optimize performance. Front-end could do their work without having to coordinate with back-end, and back-end could focus on more important things than adding fields to some JSON output.

I think it's important to keep this context in mind to appreciate what problem GraphQL is solving.

I think I understand this; possibly nice for a huge client x feature matrix. I don’t have experience with a setup where there is a big separate backend team. In my head there is an alternative implementation: have a separate routing layer (coauthored by backend and frontend). Backend responsibility ends with the service layer. There has to be some domain contract implemented somewhere; the question is whether it is simpler to cut down from a tree or to build something on top of components.

This is my read of the history as well.

This is also the motivation that would lead me to advocate for adopting GraphQL for a product. More than a technical decision, it is an organizational decision regarding resource trade-offs, and where the highest iteration or code churn is expected to be located.