Hacker News new | ask | show | jobs
by onlydnaq 2422 days ago
When it comes to modern web development I’m more of an observer than a contributor, so my opinion might not carry a lot of weight. However for me all these abstractions seems to get closer and closer to querying a database using SQL directly.

A carefully designed database schema would be able to support all of these use cases in a way that (at least for me) seems a lot simpler than wrapping it in new abstractions. Inserting multiple objects in the same transaction?, already implemented. Updating several fields at the same time as well? Getting only the entities you are interested in together with their subentities?, well that’s what a relational database does.

As I said, I’m not a web developer, and I haven’t touched on use cases where a lot of different things need to happen on server operations, but all the examples in the article would be easily solvable with SQL.

10 comments

GraphQL came from Facebook, where it's used to wrap other services and aggregate the results into a tree of data. I've also used it this way.

As I understand, most of the constraints (writes are done in isolation, you don't have transactions, etc.) are there because they're consistent with an architecture where you can't actually implement those things. Transactions are something that happens further down the chain of services, and a single mutation may or may not be implemented using transactions that may or may not be distributed across multiple services. A mutation could cause information to be sent to message queues, stored in various data stores with different technologies, cached in multiple places, persisted to a data warehouse, or other side effects, and transactions are an implementation detail.

You could use foreign data wrappers in postgres, or some other equivalent, but most of the good bits of SQL would take a lot of work or wouldn't even be possible.

PostgREST is a project that does that automatically for PostgreSQL. It takes your schema of your db and creates an api based on table relationships and provides security through a database design pattern of enforcing row level security. Side effects are challenging. For instance sending an email on account creation requires creating a listener on the postgres notify event but it is interesting to think about even if not 100% practical.

https://postgrest.org/en/v6.0/

I wonder if the people at edgedb (https://edgedb.com/) have the right approach then. They used Postgres as a foundation, and used their experience in SQL to make a DSL that is similarly expressive, but but with less caveats and some helpers for very common use cases.

It's a very young project, and as a heavy ORM using I'm not the best person to judge, but I like the way they think.

I personally think fundamentally what a lot Of people miss in the API standards discussions (debates?) is that GraphQL, JSONApi et al are not just about getting around the single operation per request/call, they are about abstracting your API interfaces always from a tightly coupled schema. I think this because most APIs are built in these circumstances (and this assumes that your endpoints are actually RESTful):

* the schema/Endpoint already exists and it’s being updated to a newer standard (like GraphQL or OpenAPI) and they simply use the schema of the existing data source (usually a database) and these are just coupled to the data source. This gains more efficient operations and possibly documentation and validation (all good things!) but misses another goal (will explain momentarily)

* That an api is being built from scratch and one of these schema based standards is being used as a stand in for the database store. In this case the tight coupling is just backward (your data source will likely be built to match the API)

What I seem to not see is anyone talking serious about how these standards and associated technologies or implementations aren’t being leveraged to create an abstract interface to data points regardless of data source structure or schema. The strength of GraphQL, OpenAPI (formerly Swagger), JSONApi and others isn’t just in pure description and validation, it’s in that it allows you to build descriptive and normalized APIs that can in effect act as middleware over any data source and keeps your data sources independent from API concerns.

The end result being you shouldn’t have to change your API schemas because you changed your data store or how you want to store your data. So you never have to version your APIs. It’s suppose to make your APIs inclusive of change.

You'd have to keep users of your application in sync with database users though. And you wouldn't get complex backend logic out of the box without something like PL/(pg)SQL or database extensions (all of which are a bit unwieldy compared to an HTTP server).
Doesn't work that well when your databases are distributed and you're using microservices. You don't want to be doing join's across microservices and most setups wouldn't allow you to anyways. GraphQL provides that orchestration layer for you.
Let's say it provides more like a standardized common interface for you. You still have to code the glue to make all those data aggregations, cache handling and permission checks.

In fact, because it's such a new tech, and since it's embedded into very few things, you have a lot to write by hands and perf on single servers will suffer compared to regular SQL + Restish.

The concept is interesting for big distributed heterogeneous systems though, especially when you have many clients.

This is exactly what Hive is. https://mapr.com/products/apache-hive/

It provides a full SQL query engine on top of unstructured data so you can assemble results.

Personally I would never expose the raw DB in any capacity to the front end systems for security and sanity reasons. DB schemas need to be free to change all the time and you wouldn't want to break entire services from a column type change.

I think the primary reason not to do this is security—and I don't just mean that letting attackers make Postgres queries directly is a bad idea (it is, of course, but you could imagine other ways to implement this). You want any interface you expose to the public to be simple enough that you can be reasonably confident that none of the messages you accept can cause bad things to happen. SQL is so computationally powerful that you can't really get that assurance.
> You want any interface you expose to the public to be simple enough that you can be reasonably confident that none of the messages you accept can cause bad things to happen.

Yes, that's why you expose a simple set of views as your public interface for each role, and apply the dirt simple RBAC that every multiuser RDBMS supports to limit users to the appropriate set of views.

> SQL is so computationally powerful that you can't really get that assurance.

The computational power of SQL is mostly irrelevant, because the dirt simple security model let you limit database effects, so that all the computational power means is that people can tie up resources with poorly (or maliciously) crafted queries, if you don't apply the simple mitigations every multiuser RDBMS has against those.

People not bothering to learn about the database engines they are using leads to putting (often poor) ad hoc security in the web layer in front of a database that isn't properly secured because the app in front is excessively trusted, making this less, rather than more, secure than if people just learned to understand heir DB and exposed it directly instead of trying to acheive security through cargo cult complexity.

It's possible to enhance database security with defense in depth and and additional layers, but none of that starts with misunderstanding RDBMSs so badly as to think that they are, in an app stack, one of the hard parts to secure.

I am like you as well except the database might be mongo,elasticsearch,etc... Plus processing of the data might need to be done before the client gets it or the response might be based on db data but the fields might not be db mappable because they're dynamic (e.g.: 'baseItemCost' vs 'itemCostAdjusted' adjusted to item availability,demand,delivery cost,etc...)
I completely agree!

I really don't understand why we don't simply ship SQL queries around, maybe a very simple subset of SQL so that it either works on most vendors or that it can be easily manipulated and adapt to different storage backend.

Can somebody enlight me?

it is unlikely that all the data sources you will be querying now are directly available in an SQL database.
For sure they are not available in a /rest or GraphQL engine neither.

SQL should be only the syntax used to communicate with the backend, the real implementation would be in whatever make sense.

That would require you to map the SQL to something else which would then map back to SQL or mongo or w/e. For example if you're trying to join data from multiple microservices you can't execute the join because most microservices setups are not going to allow joins. The joins would have to be mapped to service to service calls and at this point you're just rebuilding an orchestration layer which has already been solved by graphql.
So we invented a query language (GraphQL) to solve an orchestration problem?

Sorry but I don't buy this reason. Especially not to the question: "Why we don't use SQL as a query language?" especially because REST was around way earlier than GraphQL.

Your reasoning could be great to answer the question: "Why we like GraphQL where there are a lot of microservices?"

BTW, there is nothing inherent in the language in itself, if each microservices would be speaking SQL, you would simply need to optmize your SQL query, decompose your optimized SQL query creating simpler queries one for each microservice, send each of those queries to the relative microservices, compute back the final result.

If you replace "microservices" with "table" here you get the standard way an SQL engine works :)

Please, it is not that I am against GraphQL, but it honestly seems like some engineer at facebook was annoyed by SQL, and decide to re-invent it. And that we are all following him or she just because they worked at facebook, which seems unwise.

Maybe the use of types? But what is the difference between a type and something that belong to a table?

Anyway, I am looking now at [GraphQL: The Documentary (Official Release)][1]

[1]: https://www.youtube.com/watch?v=783ccP__No8&feature=youtu.be

> So we invented a query language (GraphQL) to solve an orchestration problem? Sorry but I don't buy this reason.

Exactly. I came here to say the same thing. This seems like an attempt to justify the existence of a thing after-the-fact. And it's a reason which isn't even the stated reason of the project itself!

There is nothing about GraphQL that makes me think someone invented it because they were annoyed by SQL and decided to reinvent it, generally reinvention means a lot of similarity only badly done. You might argue GraphQl is badly done in comparison to SQL, but surely not that it is similar.
Exposing that much of your internal implementation details could be very problematic.
Sorry, I realize my first message wasn't as clear as I wanted.

You would not expose the internal implementation, you would simply accept SQL queries, maybe against a separate view of your SQL database, or maybe against a custom backend.

SQL shuould be the query languange, not the implementation.

That’s exactly what I meant as well. SQL may be the implementation, but if so with a well designed exposed schema, not with the application internals.