by softwaredoug 1832 days ago
> “By 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the enterprise.”

What is behind the thought that graph databases are going to grow so much in the next few years? To me they've always had a niche use... Are they really going to be ubiquitous (as this funding seems to assume)?

4 comments

Historically, graph databases did a passable job of supporting data models and queries that were not really possible in SQL (absent proprietary, vendor-specific extensions). That's all over now, because recent versions of SQL support recursive queries that can handle general graphs quite easily. There's no real need for a specialized solution; even plain vanilla Postgres is good to go.
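For anyone who hasn't seen one, here is a minimal sketch of the kind of recursive query being described. It runs against SQLite through Python's standard library rather than Postgres, purely so it is self-contained; the `edges` table, its `src`/`dst` columns, and the `reachable` helper are made up for illustration. The `WITH RECURSIVE` form is standard SQL, and the use of `UNION` (rather than `UNION ALL`) is what keeps the traversal from looping forever on a cycle.

```python
import sqlite3


def reachable(edges, start):
    """Return the set of nodes reachable from `start` along directed edges."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical edge-list table: one row per directed edge.
    conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                       -- seed: the start node
            UNION                          -- UNION drops duplicates, so cycles terminate
            SELECT e.dst
            FROM edges AS e
            JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    conn.close()
    return {node for (node,) in rows}


if __name__ == "__main__":
    # a -> b -> c -> a is a cycle; x -> y is a separate component.
    sample = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")]
    print(sorted(reachable(sample, "a")))
```

The result includes the start node itself, since it is the seed row of the recursion.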
SQL syntax for such queries is super awkward, with tons of gotchas. Not sure how performance compares.
There's admittedly some ongoing work on extending the SQL standard syntax with some extra sugar for "Property Graph Query". But there's nothing technically wrong with just using basic SQL syntax; it's just a matter of getting it to work. Performance will vary depending on the query optimizer, any INDEX definitions, etc., and is quite a separate concern.

Overall, graph databases are so general as a model that writing slow queries will always be a possibility, so one should be mindful of these concerns. But that's just as true in NoSQL graph DBs.
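A rough way to see how much the INDEX definitions matter is to ask the database for its plan. This is only a sketch: it uses SQLite via Python's standard library (Postgres has its own `EXPLAIN`), and the `edges` table, the `idx_edges_src` index, and the `neighbor_lookup_plan` helper are hypothetical names chosen for the example. The exact wording of the plan text varies between SQLite versions, so the code just returns it rather than assuming a particular string.

```python
import sqlite3


def neighbor_lookup_plan(with_index):
    """Return SQLite's plan text for a one-hop neighbor lookup on an edge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
    if with_index:
        # Index on the column used to find outgoing edges of a node.
        conn.execute("CREATE INDEX idx_edges_src ON edges (src)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT dst FROM edges WHERE src = ?", (1,)
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("without index:", neighbor_lookup_plan(False))
    print("with index:   ", neighbor_lookup_plan(True))
```

Without the index the plan is a full scan of `edges`; with it, the lookup goes through `idx_edges_src`. A recursive traversal repeats this lookup at every step, which is why the two cases can diverge so much.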

Something I just found out, after looking into status updates on the Property Graph Query (PGQ) work being done in SQL, is that it will exactly mirror the work going into GQL (Graph Query Language, a newish standard in its early stages of development, based mostly on Neo4j's Cypher).

To summarize this post[0] by someone involved with the standards:

- GQL (ISO/IEC 39075) is a full database language to create and manage property graphs and create, read, update, and delete nodes and edges (or vertices and relationships)

- SQL/PGQ (ISO/IEC 9075-16) is a new add-on part of the SQL standards which introduces the capabilities to create property graph views on top of existing tables in an SQL database, as well as the ability to query property graphs using a GRAPH_TABLE function in an SQL FROM clause

- The input to the SQL/PGQ GRAPH_TABLE function is a property graph query, sometimes referred to as Graph Pattern Matching or GPM. Graph Pattern Matching is common between SQL/PGQ and GQL. That is, the syntax accepted in a GRAPH_TABLE function in an SQL FROM clause is identical to the syntax in a GQL graph query. Because GPM is the same in both draft standards, changes to GPM for SQL/PGQ also apply to the GPM portions of the GQL specification.

---

I also just came across the Apache AGE project[1] which basically allows you right now to extend PostgreSQL DBs with property graph capabilities and enables full(?) use of Cypher/GQL.

[0] http://www.jcc.com/resources/jcc-blogs-menu/blog-database-dr...

[1] https://age.incubator.apache.org/

Also, nested sets and materialized paths have been around forever to do graphs inside SQL.
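As a minimal sketch of the materialized-path idea, assuming a hypothetical `nodes` table whose `path` column stores each node's full ancestry as a slash-separated string: finding all descendants becomes a plain prefix match, with no recursion at all. Worth noting that this encoding (like nested sets) describes a hierarchy where each node has a single parent, which is narrower than a general graph. The example uses SQLite through Python's standard library only to stay self-contained.

```python
import sqlite3


def descendants(rows, ancestor_path):
    """Return names of all nodes strictly below `ancestor_path`, ordered by path."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: `path` holds the node's full ancestry, e.g. '/root/a/b'.
    conn.execute("CREATE TABLE nodes (name TEXT NOT NULL, path TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", rows)
    # The trailing '/' keeps '/root/a' from matching a sibling such as '/root/ab'.
    # Assumes path segments contain no LIKE wildcards ('%' or '_').
    found = conn.execute(
        "SELECT name FROM nodes WHERE path LIKE ? ORDER BY path",
        (ancestor_path + "/%",),
    ).fetchall()
    conn.close()
    return [name for (name,) in found]


if __name__ == "__main__":
    tree = [
        ("root", "/root"),
        ("a", "/root/a"),
        ("b", "/root/a/b"),
        ("ab", "/root/ab"),
        ("c", "/root/c"),
    ]
    print(descendants(tree, "/root/a"))
```

The trade-off is on the write side: moving a subtree means rewriting the `path` of every node under it.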
At the end of the day, Neo4j needs to operate a query planner on top of a relatively standard index structure to present the graph abstraction. There is limited difference between Neo4j's planner and what could be planned from SQL.

GraphDBs make more sense when there is strong evidence that either the natural description of the program is a graph or that the underlying storage engine can efficiently model the graph.

So far no GraphDB has demonstrated either statement to be true for the majority of problems.

Yeah, but it's free and open source.
Neo4j is all-in on, "almost everything looks like (or can be made to look like) a graph, so almost everyone should be using a graph database".

As for those specific figures, I'm guessing there's enough wiggle room in "data and analytics innovations" (emphasis mine) to find or project almost any trend one wishes. What are data analytics innovations? Why, it's the set of things that will see 80% use of graph technologies! "Graph technologies" is also so potentially vague that it could plausibly be 100% of almost anything related to software.

"Everything looks like a graph" is more damning of the idea of a graph as storage than it is praise. The whole point of a database is to impose _additional_ constraints on the data to ease subsequent application development or data analysis.

Relational data may be a hassle, but it's a hassle you end up having to deal with anyway at some point.

I can see a graph database as being a useful place to stash a ton of shitty data as an initial place to start an ETL but I can't imagine using it as a system of record except in very limited situations.

The additional constraints are also what enable performance optimizations. And not the small ones, the ones that give orders of magnitude improvements. Whereas right now Neo4j is slower for graphs than Postgres, just with a nicer UI.
Oh, I agree that, barring some actual honest-to-god innovation, the whole product category's niche-by-nature. Just relating the way Neo4j's been positioning themselves.
The point is that "to ease subsequent application development or data analysis" can be done just as well, or better, by a graph DB. You don't have to end up with the hassle of relational data as in an RDBMS.
If a good enough engine comes along, I would agree with those speculations. Many times I've wished my SQL or Mongo databases had graph functionality.
Those investors will never see that money again