No-SQL databases are glorified caches

	No-SQL databases are glorified caches (hernantz.github.io)
	45 points by hernantz 1878 days ago

5 comments

vaughan 1878 days ago

It's surprising that graph dbs aren't more popular.

Just as document dbs can be derived/denormalized from SQL dbs, relational dbs can be derived from a graph.

Conceptually, data is a graph.

I always find the decision between 1-M and M-M is so sticky with RDBMS, and with a graph, it can be whatever you want it to be.

tuatoru 1877 days ago

They were tried in the past, culminating in the CODASYL data model [1]. The relational model was a vast improvement in reliability and maintainability then, and it still is now.

1. https://en.wikipedia.org/wiki/CODASYL

tluyben2 1878 days ago

I have used 'graph dbs' (or stuff bolted onto something else exposing a graph and/or being called a graph-db), mostly commercial ones, for the past 20+ years because I have the same feeling as you have; every one of them was too slow (and not scalable, but we didn't even get to that point). They ranged from absolutely unusable to usable as a toy; one of them was $50k/server and it was a toy. But that was a long time ago; things have moved on and I hear good things about Dgraph.

So I will try it again, see if it works this time around.

daemonk 1878 days ago

The flexibility of a graphdb translates to flexibility in writing your queries. And that's my main problem with graphdb. Query optimizations can be difficult. It is very easy to write a query that logically does what you want, but takes hours to run. And if you take a bit of time thinking about how the query runs, you can optimize to run in milliseconds.

mrjn 1877 days ago

[Author of Dgraph]

> Query optimizations can be difficult.

I don't think they're any more difficult than SQL really. In fact, with Dgraph we can avoid scans, where SQL has to scan for most of the queries.

In fact, we're aiming to work on query optimization in depth starting mid-May. So, perhaps in a few months, this would be a topic worth writing a blog post about.

daemonk 1877 days ago

Thanks for the response. I look forward to it. I've used both Dgraph and RedisGraph for some toy projects previously. I really think that graph dbs are the way to go for storing biological/health data which I am involved in. They naturally model this type of data very well.

The query optimization issue seems to have more to do with how the data is structured than with the query engine, perhaps? For example, if I wanted to query for all nodes of type X, starting from a node of type A, it can potentially take a very, very long time. But if I add a few known constraints, like querying all nodes of type X starting from a node of type A along paths that go through nodes B, C, and D, it quickly gets much faster. I guess it's really more up to the user to optimize the queries in these cases.

And the flexibility that graphdbs afford to the user makes it hard to realize that perhaps there are these natural constraints that can be used for this query. So it seems like there is definitely a graph analysis component to this that gives you the insight to write the queries faster.

I guess this problem also only applies to natural world problems that you are trying to force into a graph structure rather than a pre-designed database.
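
A rough sketch of the effect described above, in plain Python rather than any graph database's query language. The graph, the node names, and the type labels are all made up for illustration. The unconstrained search touches everything reachable from the start node, while the constrained one only follows edges that match the known A, B, C, X path.

```python
from collections import deque

# Hypothetical toy graph: node -> neighbours, plus a type label per node.
edges = {
    "a1": ["b1", "n1", "n2", "n3"],
    "b1": ["c1"],
    "c1": ["x1"],
    "n1": ["n4", "n5"],
    "n2": ["n5", "n6"],
    "n3": ["n6"],
    "n4": [], "n5": [], "n6": [], "x1": [],
}
node_type = {"a1": "A", "b1": "B", "c1": "C", "x1": "X",
             **{f"n{i}": "N" for i in range(1, 7)}}


def find_unconstrained(start, target_type):
    """Breadth-first search over everything reachable from start.

    Returns the matching nodes and how many nodes were discovered.
    """
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        if node_type[node] == target_type:
            found.append(node)
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found, len(seen)


def find_constrained(start, type_path):
    """Follow only edges whose endpoint has the next type in type_path.

    Returns the final frontier and how many nodes were kept along the way.
    """
    frontier, kept = {start}, 1
    for wanted in type_path:
        frontier = {n for cur in frontier for n in edges[cur]
                    if node_type[n] == wanted}
        kept += len(frontier)
    return sorted(frontier), kept


print(find_unconstrained("a1", "X"))            # (['x1'], 10)
print(find_constrained("a1", ["B", "C", "X"]))  # (['x1'], 4)
```

Both calls find the same node, but the constrained version keeps 4 nodes in play instead of 10. On a real dataset with high fan-out the gap would presumably be far larger.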

mrjn 1877 days ago

The thing about Dgraph (and potentially other graph DBs) is that it knows the "cardinality" of the relationship, because of the way it stores data. So, it can make judgement calls on the best path to execute a query.

I might have a better write-up about this in a quarter or so.

TechBro8615 1878 days ago

What do you think of GraphQL? I think GQL + multiple backend relational databases can be a nice middle ground. It means that you can optimize each relational DB separately, and still benefit from querying the graph at a higher level. (If needed, some of the backends can even be graph databases.)

It’s worth noting that GQL comes from Facebook, where they spend quite a bit of time operating with graph data.

fizx 1878 days ago

GQL isn't really much like a graphdb (e.g. SPARQL).

The goal of GQL is to gather a useful subset of your microservice data into a single tree for your frontend to render. In a Facebook context, a typical query is "give me all the info I need to be able to show my home feed".

The goal of graph dbs is to allow arbitrary, complex, often cyclic and aggregating queries to be run against a data source expressed as a graph. A typical query is something like "how many of my friends-of-friends like basketball, displayed by location."
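
To make the friends-of-friends example concrete, here is a minimal in-memory sketch in Python. The people, interests, and locations are invented, and a real graph database would express this in its own query language instead of hand-written set operations.

```python
from collections import Counter

# Hypothetical data: an undirected friendship graph plus two attribute maps.
friends = {
    "me": {"ann", "bo"},
    "ann": {"me", "cy", "di"},
    "bo": {"me", "di", "ed"},
    "cy": {"ann"},
    "di": {"ann", "bo"},
    "ed": {"bo"},
}
likes = {"cy": {"basketball"}, "di": {"basketball", "chess"}, "ed": {"chess"}}
location = {"cy": "Lima", "di": "Oslo", "ed": "Oslo"}


def fof_interest_by_location(person, interest):
    """Count friends-of-friends who like `interest`, grouped by location."""
    direct = friends[person]
    fof = set()
    for friend in direct:
        fof |= friends[friend]
    # Drop the person themselves and anyone who is already a direct friend.
    fof -= direct | {person}
    return Counter(location[p] for p in fof if interest in likes.get(p, set()))


print(dict(fof_interest_by_location("me", "basketball")))
```

The query is two hops of traversal, a filter, and an aggregation, which is the shape of question a GraphQL tree fetch is not designed to answer.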

zozbot234 1877 days ago

General-purpose graph databases have been kinda obsoleted, now that SQL can express recursive queries in a standard syntax. There's still a case for very specialized datastores that are optimized for running relatively complex algorithms on stored network/graph data, but for anything simpler you're going to get quite good performance from a standard RDBMS.
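
As an illustration of the recursive-query point, the sketch below runs a `WITH RECURSIVE` common table expression through Python's built-in `sqlite3` module. The `follows` table and its rows are assumptions made up for the example. It says nothing about how this performs against a dedicated graph store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO follows VALUES
        ('alice', 'bob'),
        ('bob',   'carol'),
        ('carol', 'dave'),
        ('dave',  'bob'),    -- cycle back to bob
        ('erin',  'alice');
""")

REACHABLE = """
WITH RECURSIVE reachable(name) AS (
    SELECT dst FROM follows WHERE src = ?
    UNION  -- UNION (not UNION ALL) drops duplicates, so the cycle terminates
    SELECT f.dst FROM follows AS f JOIN reachable AS r ON f.src = r.name
)
SELECT name FROM reachable ORDER BY name
"""


def reachable_from(start):
    """Return every account reachable from `start` by following edges."""
    return [row[0] for row in conn.execute(REACHABLE, (start,))]


print(reachable_from("alice"))  # ['bob', 'carol', 'dave']
```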

mrjn 1877 days ago

[Author of Dgraph] Do try it out. I don't consider Dgraph to be a "NoSQL" DB. Typical NoSQL DBs are document DBs. Dgraph is unique in that it is able to use graph to deconstruct a JSON document, and re-construct it at query time. Your document could be anything you'd want at query time -- and that's really powerful. So, give it a shot.
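
One way to picture the "deconstruct a JSON document, re-construct it at query time" idea is the toy sketch below. It is not Dgraph's storage format or query API. The `Ref` class, `to_triples`, `build`, and the `shape` argument are all hypothetical names invented here. The document is flattened into (subject, predicate, object) triples, and a caller-supplied shape decides which predicates appear in the rebuilt document.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Marks a triple object that points at another node, not a literal."""
    id: str


def to_triples(doc, triples, ids):
    """Flatten a nested dict into triples; returns the id given to `doc`."""
    subject = f"n{next(ids)}"
    for key, value in doc.items():
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                triples.append((subject, key, Ref(to_triples(item, triples, ids))))
            else:
                triples.append((subject, key, item))
    return subject


def build(subject, shape, triples):
    """Rebuild a document for `subject` using only predicates named in `shape`.

    Every value comes back as a list, since a triple store does not record
    whether a predicate was single-valued in the original document.
    """
    out = {}
    for s, p, o in triples:
        if s != subject or p not in shape:
            continue
        value = build(o.id, shape[p], triples) if isinstance(o, Ref) else o
        out.setdefault(p, []).append(value)
    return out


doc = {"name": "Ann", "city": "Oslo",
       "friend": [{"name": "Bo", "city": "Lima"}, {"name": "Cy", "city": "Oslo"}]}
triples = []
root = to_triples(doc, triples, itertools.count())

# Two different documents from the same stored triples:
names = build(root, {"name": {}, "friend": {"name": {}}}, triples)
cities = build(root, {"city": {}, "friend": {"city": {}}}, triples)
```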

slifin 1877 days ago

These databases are really prevalent in the Clojure community:

- Datomic

- Crux

- Datahike

- DataScript

- Datalevin

Some of them run in the browser and power Roam Research and its clones:

- Athens

- Logseq

- Obsidian

noofen 1878 days ago

RAM is a cache for the disk. Disk is a cache for the network.

rwoerz 1877 days ago

So, in principle you could get rid of all RAM and disks and keep everything in wires?

spurdoman77 1878 days ago

CPU registers are a cache for...

hernantz 1877 days ago

RAM

nine_k 1878 days ago

Caches are fleeting. Databases are durable. This is one distinction. Caches return a value by association. Databases usually allow for range and aggregate operations on many values. This is another distinction.

Also, "no-SQL databases" is like "non-green colors"; it encompasses a much larger spectrum than it excludes. Putting graph databases, local KV stores, distributed KV stores, document stores, time-series stores, etc in the same basket just because they are not RDBMSes is not very productive.
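
The second distinction can be sketched in a few lines of Python. The daily counts below are invented. A plain dict answers "what is the value for this key", while a range or aggregate question needs some ordered index over the keys, here approximated with a sorted list and `bisect`.

```python
import bisect

# Hypothetical per-day counts keyed by ISO date string.
cache = {"2021-03-01": 12, "2021-03-02": 7, "2021-03-05": 30}

# Cache-style access: one key in, one value out.
value = cache.get("2021-03-02")

# Database-style access needs an ordered index over the keys.
sorted_keys = sorted(cache)


def range_sum(lo, hi):
    """Sum the values whose key falls in the closed range [lo, hi]."""
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    return sum(cache[k] for k in sorted_keys[i:j])


print(value, range_sum("2021-03-01", "2021-03-03"))  # 7 19
```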

tshaddox 1877 days ago

Is there a term for derived denormalized data that is stored separately for very fast retrieval (so far that’s basically the definition of “cache”) and must exist for the software to function? That last part makes it distinct from (or at least a special case of) a cache. This comes up all the time in application design. A basic example would be an activity feed in a social networking app. You probably want to show all recent events which are stored in many different tables, e.g. posts, comments, likes, friend requests, etc. but you probably also need to denormalize that data because a big SQL union or join across every table that represents an event is probably not possible to do on demand.
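
A minimal sketch of the activity-feed case, using Python's `sqlite3` module. The schema, table names, and helper functions are assumptions for illustration. Each write inserts the source row and a denormalized feed row in the same transaction, so reading the feed is a single indexed query and never a union across every event table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, author TEXT, body TEXT,
                           created_at INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, post_id INTEGER,
                           body TEXT, created_at INTEGER);
    -- Denormalized copy: one row per event, whatever its source table.
    CREATE TABLE activity_feed (kind TEXT, source_id INTEGER, actor TEXT,
                                summary TEXT, created_at INTEGER);
    CREATE INDEX feed_by_time ON activity_feed (created_at DESC);
""")


def add_post(author, body, ts):
    with conn:  # one transaction: source row and feed row commit together
        cur = conn.execute(
            "INSERT INTO posts (author, body, created_at) VALUES (?, ?, ?)",
            (author, body, ts))
        conn.execute("INSERT INTO activity_feed VALUES ('post', ?, ?, ?, ?)",
                     (cur.lastrowid, author, body[:40], ts))


def add_comment(author, post_id, body, ts):
    with conn:
        cur = conn.execute(
            "INSERT INTO comments (author, post_id, body, created_at) "
            "VALUES (?, ?, ?, ?)", (author, post_id, body, ts))
        conn.execute("INSERT INTO activity_feed VALUES ('comment', ?, ?, ?, ?)",
                     (cur.lastrowid, author, body[:40], ts))


def recent_activity(limit=10):
    """Newest events first, read from the denormalized table only."""
    return conn.execute(
        "SELECT kind, actor, summary FROM activity_feed "
        "ORDER BY created_at DESC LIMIT ?", (limit,)).fetchall()
```

Unlike a cache, the feed table here cannot simply be dropped and lazily refilled on a miss. The application reads from it directly, which is the "must exist for the software to function" property the question is about.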

solipsism 1877 days ago

It's called a materialized view.

tshaddox 1877 days ago

That’s one implementation of a similar idea in RDBMSs, although they generally require manually refreshing them when desired. I think I’ve heard that some RDBMSs also allow you to apply normal inserts and updates to materialized views if you want to manually keep them up to date as well, although I’ve never tried that approach.

vaughan 1877 days ago

It’s called incremental view maintenance.

Check out: https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...
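
For a feel of what incremental maintenance means, here is a small stand-in using SQLite triggers from Python. SQLite has no materialized views, so the `like_counts` table, the trigger names, and the helpers are all assumptions made up for this sketch. It is not the PostgreSQL proposal linked above. The point is only that each insert or delete adjusts the stored aggregate, so the aggregate is never recomputed from scratch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes (post_id INTEGER NOT NULL, liker TEXT NOT NULL);

    -- Stand-in for a materialized view over:
    --   SELECT post_id, COUNT(*) FROM likes GROUP BY post_id
    CREATE TABLE like_counts (post_id INTEGER PRIMARY KEY, n INTEGER NOT NULL);

    CREATE TRIGGER likes_ins AFTER INSERT ON likes BEGIN
        INSERT OR IGNORE INTO like_counts VALUES (NEW.post_id, 0);
        UPDATE like_counts SET n = n + 1 WHERE post_id = NEW.post_id;
    END;

    CREATE TRIGGER likes_del AFTER DELETE ON likes BEGIN
        UPDATE like_counts SET n = n - 1 WHERE post_id = OLD.post_id;
        DELETE FROM like_counts WHERE post_id = OLD.post_id AND n = 0;
    END;
""")


def like(post_id, liker):
    conn.execute("INSERT INTO likes VALUES (?, ?)", (post_id, liker))


def unlike(post_id, liker):
    conn.execute("DELETE FROM likes WHERE post_id = ? AND liker = ?",
                 (post_id, liker))


def counts():
    """Read the incrementally maintained aggregate."""
    return dict(conn.execute("SELECT post_id, n FROM like_counts"))


def recomputed():
    """Full recomputation, used only to check the maintained copy."""
    return dict(conn.execute(
        "SELECT post_id, COUNT(*) FROM likes GROUP BY post_id"))
```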

solipsism 1877 days ago

Manually refreshing them? How could such a process possibly be manual but not automatable? Does it involve turning a crank?

j16sdiz 1878 days ago

I think the no-sql vs sql war has ended already. Most of us now know what they can or cannot do.

Nothing new or interesting in this article.

FractalHQ 1878 days ago

What about the guy that decided to learn about databases 5 minutes ago? He doesn’t know what they can or cannot do.

myrryr 1878 days ago

I think this article won't tell him though, and that is a problem.

This person thinks they are glorified caches, but they miss what they are good at: REALLY fast aggregation across many servers. That isn't a glorified cache; that is something else.

hernantz 1878 days ago

You would be surprised to find out that MongoDB is still popular for the wrong reasons.

myrryr 1878 days ago

It is popular for the right reasons too. Sometimes you want to aggregate large datasets incredibly fast.

Sometimes you want user defined queries which are easy to restrict to parts of a dataset by rules.

If you don't have a use case for something, it doesn't mean no one else does.

MapleWalnut 1877 days ago

Which NoSQL store are you talking about regarding fast aggregations? I don’t think that’s a property of all NoSQL dbs.

stevefan1999 1877 days ago

And caches are glorified LRU-based key-value data structures with persistent storage to track data state.
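
For reference, the LRU key-value part of that description fits in a few lines of Python. This is a minimal in-memory sketch built on `collections.OrderedDict`. The class name and methods are made up here, and it has no persistence at all.

```python
from collections import OrderedDict


class LRUCache:
    """Key-value store that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touching "a" makes "b" the eviction candidate
cache.put("c", 3)  # evicts "b"
```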
