by andrewingram 2690 days ago
The main argument I have against "One Graph" is that it's not that uncommon to have two (or more) quite distinct views of the world.

At my last job, we were building a social shopping app. Behind the scenes, products were versioned so that we could deal with disputes related to attempts to defraud customers. This (along with several other things) meant that the logical internal abstraction of the data model for things like dispute dashboards was considerably more complicated compared to an abstraction that made sense for the clients (apps and website).

If we only had one graph, all the client developers would have to develop around a data model that was far more unwieldy than they needed. But with two graphs, the world was a lot simpler (at the cost of having to maintain two graphs).

8 comments

The RDF world has conclusively proven that there is more than "One Graph". (e.g. people try to make "One Graph" and their projects die; try to make as many graphs as there are points of view and the sailing is smooth)
> The RDF world

I'll never forgive them for taking a great concept and absolutely beating it to death with intellectual (for lack of a better word) wankery. We could have ubiquitous Datomic-style triple stores today if not for the Semantic Web researchers' need to generate pseudo-academic journal articles.

In the case of GraphQL, I'd be interested in seeing strategies for multiple graphs within a single codebase. Essentially being able to produce different schemas based on config. I know that superficially it's as simple as some "if" statements, but I'm curious about the maintenance/scalability side of it.
Many GraphQL libraries take some sort of schema definition and then serve it at a route (e.g. /graphql). To support multiple schemas, you'd just write a different definition and serve it at a different route. How you resolve the fields is up to you, but both can use shared underlying business logic in these resolvers.

In terms of maintainability, you have to take care that your changes to the underlying business logic don't break the assumptions of each schema. And if you want to evolve one schema (e.g. deprecate a mutation argument, or rename a field and deprecate the old name), you have to ensure that your underlying business logic stays backwards compatible for any other schemas (and their clients) relying on it.
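To make the "one definition per route, shared business logic" idea concrete, here is a minimal sketch in plain Python. It is not a real GraphQL server: the product data, the field names, the `/internal/graphql` route and the `execute` helper are all made up for illustration, and a "schema" is reduced to a mapping from exposed field name to resolver.

```python
# Hypothetical shared business logic; both schemas call into this.
PRODUCTS = {
    "p1": {"id": "p1", "title": "Lamp", "version": 3, "dispute_count": 2},
}


def get_product(product_id):
    """Shared business logic: fetch the product record."""
    return PRODUCTS[product_id]


# Each "schema" here is just a mapping of exposed field name -> resolver.
CLIENT_SCHEMA = {
    "id": lambda p: p["id"],
    "title": lambda p: p["title"],
}

INTERNAL_SCHEMA = {
    "id": lambda p: p["id"],
    "title": lambda p: p["title"],
    "version": lambda p: p["version"],
    "disputeCount": lambda p: p["dispute_count"],
}

# One route per schema (route names are assumptions).
ROUTES = {
    "/graphql": CLIENT_SCHEMA,
    "/internal/graphql": INTERNAL_SCHEMA,
}


def execute(route, product_id, fields):
    """Resolve the requested fields against the schema served at `route`."""
    schema = ROUTES[route]
    unknown = [f for f in fields if f not in schema]
    if unknown:
        raise KeyError(f"fields not in schema for {route}: {unknown}")
    product = get_product(product_id)
    return {f: schema[f](product) for f in fields}
```

The maintenance concern shows up directly: renaming `dispute_count` inside `get_product` would break the internal schema's resolver even if the client schema never touches it.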

How about having N base graphs and the ability to make unions of the graphs, as well as other kinds of algebra?

It is relatively easy to do this in the RDF world since the graph is composed of individual facts which may or may not be in a particular graph.
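A rough sketch of what that algebra looks like if you model a graph as a set of (subject, predicate, object) facts. The graph names and the facts are invented for illustration, and a real triplestore would do this with named graphs rather than Python sets.

```python
# A graph is a set of facts; each fact is a (subject, predicate, object) triple.
catalog = {
    ("product:1", "title", "Lamp"),
    ("product:1", "soldBy", "seller:9"),
}
disputes = {
    ("product:1", "soldBy", "seller:9"),
    ("dispute:7", "about", "product:1"),
}

# Union: everything the internal tools can see.
internal_view = catalog | disputes

# Intersection: facts that both base graphs assert.
shared_facts = catalog & disputes

# Difference: hide the facts that only the disputes graph contributes.
client_view = internal_view - (disputes - catalog)
```

Because a fact either is or is not in a given graph, each view is just a set expression over the base graphs.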

Which triplestore are you using that gives great front-end performance at load?
None of them are really "great". I get acceptable results with OpenLink Virtuoso if I give it a lot of RAM, tweak the configuration, and baby it when needed.
Is there any reason you couldn't have multiple graphs essentially overlaid on top of each other? With proper tooling you could expose different subsets of the same schema in different scenarios, and still have one unified graph underneath.
See my reply to another comment. But yeah, there’s no fundamental reason. It’s conceivable, though, that the difference goes beyond simple subsets of fields, with entire relationships being hidden.

Let’s say in your internal model there’s a relationship A - B - C. But for client apps it makes no sense to expose B, so instead you choose to represent the model as A - C. This is more than just a simple subset. Someone who understands graph theory better than I may be able to explain if there are elegant approaches to this.

Since GraphQL lets you select a subset of attributes, there is no reason you can't expose C on A directly, as well as exposing the indirection through B. Redundant attributes are A-OK.

You'd have something like this:

a { b { c { x } } }

as well as

a { c { x } }

or even:

a { cX }
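On the resolver side, the redundant fields can be thin wrappers over the same traversal. A minimal sketch, with `A`, `B`, `C` and `x` taken from the queries above and `c_x` standing in for `cX`; the dataclasses are only an assumption about how the underlying model might look.

```python
from dataclasses import dataclass


@dataclass
class C:
    x: int


@dataclass
class B:
    c: C


@dataclass
class A:
    b: B  # the real relationship: A - B - C

    @property
    def c(self) -> C:
        """Shortcut field: expose C on A directly, skipping B."""
        return self.b.c

    @property
    def c_x(self) -> int:
        """Flattened field, i.e. `cX` in the query above."""
        return self.b.c.x
```

All three paths read the same value, so the client-facing shape can skip B without the underlying model changing.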

Agree. And congrats on the round.
Thanks!
I agree. I think their recommendation is a bit overzealous.

I can see the argument if you have a web frontend that consumes data from multiple backend services – have one GraphQL service that manages them all instead of a GraphQL layer on each service.

But this breaks down greatly when you have different "Viewers". In a web app, the "Viewer" can be a logged in user. In an admin dashboard, the "Viewer" is very different – an employee acting on behalf of users. Service to service communication likely doesn't have a concept of a "Viewer".

I would propose that you have different schemas when you have these different views of the world or different permission boundaries. The business logic can be shared – you may just enforce different authorization checks at the GraphQL layer. You could also share GraphQL types that are common between schemas.
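Roughly what I mean, as a sketch in plain Python rather than any particular GraphQL library. The `Viewer` kinds, the order data and the resolver names are all hypothetical; the point is only that the loader is shared and the checks live at the schema layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewer:
    kind: str      # "user" or "employee" (hypothetical viewer kinds)
    user_id: str


ORDERS = {"o1": {"id": "o1", "owner": "u1", "total": 30}}


def load_order(order_id):
    """Shared business logic, unaware of who is asking."""
    return ORDERS[order_id]


def resolve_order_for_app(viewer, order_id):
    """Customer-facing schema: a user may only see their own orders."""
    order = load_order(order_id)
    if viewer.kind != "user" or order["owner"] != viewer.user_id:
        raise PermissionError("not your order")
    return order


def resolve_order_for_admin(viewer, order_id):
    """Admin schema: any employee may look up an order on a user's behalf."""
    if viewer.kind != "employee":
        raise PermissionError("employees only")
    return load_order(order_id)
```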

From my (naive) understanding, it seems that one graph would not exactly be right; one graph per bounded context might be more likely. I think one graph might fail when one entity means something different for different clients. Imagine a SaaS system where customer means any currently paying user for the main SaaS, any prospective or current user for marketing, and any enterprise user that has signed a deal or wants to sign one for enterprise sales. There's just no way to map those three into a single customer type. Similarly, in a school management system, a student, with all his personal details, might be one entity for a school nurse or the dean, but n different entities for a teacher who teaches that one student n things. How would you map that to one graph?
Agreed. We decided to go with 2 views, an internal and an external. Currently the only GraphQL clients we have are customer facing so we just don’t model any internal details in the API. I expect we’ll introduce an internal one at some point.

Not only do these end up with very different data models, but they are also likely to have different access control (customer facing is basically all open, scoped to user, internal has many different layers and permissions), it’s likely to have different performance concerns, reliability, etc. That’s a lot of complexity you don’t need slowing you down when you’re building the other graphs.

I think they mean it technically only.

You have one root, but multiple nodes after that, every one being essentially another graph.

That's the general argument for key value storage versus normalized relational databases as well.
Can you elaborate on that?
On second reflection I may have oversimplified your point.

There's this trade off between having logically structured databases and having data stores that are faster to access. An all-too-superficial scan saw your point as just an iteration of that, which it may not be.

The word "graph" itself might have a different meaning than in mathematics. Of course something like wikidata is a directed multigraph (or a tuple of incidence relations on the same nodes). Still I was under the impression that you were talking about optimizing data stores for access at the cost of having to explicitly maintain consistency in your code and not being able to rely on the database properties themselves -- like what happens when you move from SQL to Mongo.

Which graph technology were you using, and were the graphs maintained separately or via some kind of syncing?