by inkyoto 1288 days ago
I don't understand where this comes from. If the problem warrants the use of a graph data model, property graph databases provide an efficient solution for it. Graph databases also excel at discovering distant relationships between loosely coupled data entities and deriving previously unknown facts about the data that would otherwise be too cumbersome to unravel using document or relational database queries. Graph databases also make it easy for someone who knows the answer to a question to work back to the set of one or more original questions that yield that answer; this is somewhat niche, but incredibly useful in knowledge graph scenarios.

Just like a document database is not a good fit for a data model with inherent relations between data entities (simple or complex; the reverse is also true), a graph database is not a butt plug for every butt. Every problem requires an appropriately fitting solution for it.

4 comments

> If the problem warrants the use of a graph data model

I think this is the crux of the problem. I once worked on MegaBank's peer to peer payment app, where somebody had figured that the people sending money to each other was a directed graph, so they should use a graph DB to store it. And when Azure's sales team convinced them that CosmosDB could handle relational data and graphs and documents, they bought it hook, line and sinker.

Needless to say, this was a terrible idea: an RDBMS could have handled it just fine, and because everything else was stored in an RDBMS (which despite the marketing fluff is quite different internally in CosmosDB), now doing any kind of join was a huge pain in the ass. As a cherry on top, they were now locked into CosmosDB, which has completely incomprehensible ("request units per second") but very, very high pricing particularly for graphs. Whee!

Oh. Payments, wanks (and Megawanks) and CosmosDB – an unholy trinity, bless them all. I think I know what the Megawank was up to.

Since Apple (not entirely sure about Google/Android) denies anyone direct access to the NFC hardware in an iDevice, and presents a (mostly) anonymised unique payment token to the wank instead, reconstituting a people connection graph via tracking the fund movement across card accounts poses a challenge. Tracking the fund movement across conventional wank accounts is easier.

But, if the graph data model is devised correctly, it is still possible to incrementally build it out into a rich graph outlining social and material world connections for a given customer either for product placement or nefarious purposes (wanks do sell the transaction history and more to external parties such as Equifax without obtaining the customer's consent). Akin to a Facebook social graph. A gradual graph build-out is, in fact, a great feature of graph databases – a fluid «schema» (for the lack of a better term) that can evolve incrementally in place as new facts about the data become known, without causing a disruption to a production system. If the overall design is sound.

The problem is that traditional wanks are not well poised when it comes to technology-related matters due to technology… not being their core competency. Quite the opposite: they see tech as a liability, as such projects are driven by financially competent, somewhat business competent but entirely technically incompetent folks. Therefore such projects nearly always fail, technology is blamed in the end, and the CEO/CFO draws appropriate (typically, inappropriate) conclusions. So the Megawank in question likely tried to shoehorn a poorly designed graph data model (more likely, an existing relational model) into an A.M.A.Z.I.N.G! multimodel! graph! CosmosDB database whilst being clueless about what they were trying to do. Of course, Microsoft sales people, unwittingly slouching nearby, were singing mellifluous songs of joy and delight, reciting telltale stories of CosmosDB. Profit.

Neobanks, on the other hand, are driven entirely by technologists, and they can pull off such a feat easily or more easily.

I have worked at a maps company whose core business model was, literally, building maps and adding traffic services. The sort of thing that's given as the Graph 101 example.

I can assure you, nobody used any Graph database to achieve any of it.

> […] nobody used any […]

With all due respect, I don't know what to make of absolutist, generalised statements such as this one.

GP made a statement about the entirety of the team that they were familiar with. It's the same way that I can tell you with decent certainty that nobody's using MongoDB at my place since `kubectl get pods -A | grep -c mongo` prints 0.

EDIT: Okay, joke's on me there. It turns out the automated frontend tests use a Mongo for some reason. :)

It was more of an observation by looking at most production systems, reading tons of docs, talking to a LOT of people who built the legacy - the stuff that's on your phone and in your car right now - and building a few modern systems over the course of ~2 years.

So a bit more than `kubectl get pods -A` :)

> Graph databases also excel at discovering distant relationships between loosely coupled data entities and deriving previously unknown facts about the data that would be otherwise too cumbersome to unravel using document or relational database queries.

You can do all of this in the relational model, with the new support for recursive CTEs, which enables arbitrary queries to be performed. Even seamless inference of "additional" data points (often given as a unique selling point of "semantic" solutions!) is just a view, plus indexes on the underlying query if you want it to be fast.
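
To make that concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `payment` table, the `reachable` view and the 10-hop cap are all made up for illustration; the point is only that a recursive CTE wrapped in a view gives you "who is transitively connected to whom" without a graph DB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical edge table: who sent money to whom.
    CREATE TABLE payment (sender TEXT NOT NULL, receiver TEXT NOT NULL);
    CREATE INDEX payment_sender ON payment (sender);

    INSERT INTO payment VALUES
        ('alice', 'bob'), ('bob', 'carol'), ('carol', 'dave'),
        ('dave', 'alice'),   -- closes a cycle on purpose
        ('erin', 'frank');

    -- The "inferred" facts are just a view over a recursive CTE.
    CREATE VIEW reachable AS
    WITH RECURSIVE walk (origin, person, hops) AS (
        SELECT sender, receiver, 1 FROM payment
        UNION
        SELECT w.origin, p.receiver, w.hops + 1
        FROM walk AS w
        JOIN payment AS p ON p.sender = w.person
        WHERE w.hops < 10    -- arbitrary cap so cycles terminate
    )
    SELECT origin, person, MIN(hops) AS hops
    FROM walk
    GROUP BY origin, person;
""")


def reachable_from(connection, origin):
    """Return (person, shortest hop count) pairs reachable from `origin`."""
    rows = connection.execute(
        "SELECT person, hops FROM reachable WHERE origin = ? ORDER BY hops, person",
        (origin,),
    )
    return rows.fetchall()


print(reachable_from(conn, "alice"))
```

Whether this stays fast on a real dataset is a separate question; the hop cap is a blunt way to handle cycles and a production query would likely track visited nodes instead.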

> You can do all of this in the relational model […]

That… depends. Right now I am dealing with the customer's 6NF relational model, which I had up until now thought was purely theoretical and not a naturally occurring phenomenon (my previous record was a 4NF some years back). ERDs for core data entities span several screens across and are, in fact, a thing of beauty, but… It is difficult to reason about data that has grown historically over the last 20 years. The incoming data model is a document data model, therefore dependencies between data entities in such a highly normalised model need to be analysed first. A graph database turns out to be a good fit for data relationship analysis in highly normalised relational data models as well, since normalised entities effortlessly map onto graph nodes and relations onto graph edges. The document data model can be thrown into the mix on top of the relational model, with new relationships being incrementally added, linking the relational and document data model entities together. Ad-hoc queries are also much simpler in the graph DB, as less interesting relations can simply be ignored for a moment. In the end, graph nodes and/or edges can be enriched with extra useful properties, giving one nearly complete data model migration mappings.
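
A rough sketch of that mapping, in plain Python rather than an actual graph DB: tables become nodes and foreign keys become directed edges. The table names in `FOREIGN_KEYS` are invented for illustration and have nothing to do with the customer's real schema.

```python
from collections import deque

# Hypothetical schema fragment: table -> tables it references via foreign keys.
FOREIGN_KEYS = {
    "customer": [],
    "customer_name": ["customer"],
    "customer_address": ["customer", "address"],
    "address": ["country"],
    "country": [],
    "account": ["customer"],
}


def dependencies(table, edges):
    """Breadth-first walk: every table reachable from `table`, with its distance."""
    seen = {table: 0}
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, []):
            if target not in seen:
                seen[target] = seen[current] + 1
                queue.append(target)
    del seen[table]  # the starting table is not its own dependency
    return seen


def dependents(table, edges):
    """Same walk over reversed edges: which tables depend on `table`?"""
    reverse = {}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return dependencies(table, reverse)


print(dependencies("customer_address", FOREIGN_KEYS))
print(dependents("country", FOREIGN_KEYS))
```

Ignoring the "less interesting" relations for a moment then amounts to dropping those edges from the dictionary before walking it.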

There is nothing wrong with the relational model, and it is an appropriate fit for many use cases. However, if one has never gone beyond joining 3 tables in a single query for a web app, or if one has never encountered an extremely normalised data model, it is difficult to see where the relational model falls short. Relational models are also rigid and do not accommodate changes easily, whereas graph models allow for new relationships to be added incrementally as the data model evolves. Not to mention that most relational models do not venture past 2NF, and the dataset is typically an entangled mess of organic or historical growth.

Modern databases give you tools to evolve and refactor a relational schema over time. Views can be such a tool; Postgres also has transactional DDL changes.

Tooling has nothing to do with schema evolution; normal forms (4NF, 5NF, …) do. The trouble is that almost no-one does that. Relying on the tooling alone is either self-delusion or a lack of experience. Usually both.

It adds complexity to a narrow use case.

If it weren't narrow, neo4j wouldn't need to lay off staff.

Your examples do not refute this.

> It adds complexity to a narrow use case.

It also simplifies the unnecessary complexity in many cases, and I have witnessed both. Just like one should not use an expensive Zeiss microscope to hammer nails into a concrete wall as a hammer substitute, one perhaps ought not to stick a graph database everywhere where it does not belong. Engineering (including software) is about selecting the appropriate tooling for each job.

> If it weren't narrow, neo4j wouldn't need to lay off staff.

I fail to see how the two are related. If a company struggles with the execution of their incumbent business model, perhaps it is not necessarily related to the product (may or may not be though)?