by tormeh 3700 days ago
How does PostgreSQL compare to VoltDB?

I'm trying to get a handle on the different databases, and VoltDB sounds exciting, but everyone's talking about PostgreSQL. Then there's Mnesia which I hear is, as all things Erlang, excellent, though it's kinda tied to Erlang.

I know it's hard to say what's best, but what would you say is the best DB for a completely new multilingual project that needs throughput but prioritizes low latency, for example?

Also, VoltDB is licensed under AGPL. Does this mean that it can't be used in commercial projects? Or is it OK as long as the other components are on different servers or similar?

5 comments

You almost certainly want Postgres, unless you have a compelling reason to use a database optimised for a specific workload, or for some reason Postgres isn't usable in your environment. It's a reliable, well-designed general-purpose RDBMS which will scale up pretty well to cope with fairly large workloads; performance and latency are unlikely to be problems with a sensible schema.

VoltDB, as an example, is very different: it seems to be designed for simple OLTP workloads. It's an in-memory database, which offers opportunities for impressive performance, but if you have a large amount of data, you'll need a large amount of memory. And horizontal scaling is cool, but cross-partition operations will incur significant overhead.

(IANAL, but the AGPL requires network users of software be able to download the source. Since a presumably proprietary application is the client in this case, this isn't likely to be an issue.)

> How does PostgreSQL compare to VoltDB?

If you don't know the difference, you probably want Postgres.

VoltDB is a specialty database for things like high frequency trading. It wouldn't make sense to use for, say, a consumer app or web startup.

Yeah, if I start a new project, I default to Postgres: https://journal.dedasys.com/2015/02/21/i-default-to-postgres...

I'd consider something else if I'm really, really sure that it's better suited to whatever niche problem than Postgres, but it'd take a lot of thinking and convincing myself.

Specifically, it is a column store, which is advantageous for things like real-time analytics over millions of data points from streaming market data. This has uses for HFT, but also for anyone who wants to do their own day trading.

VoltDB isn't a column store. It's designed for serializable OLTP workloads with really fast index updates, neither of which characterizes column stores. You may be thinking of one of Stonebraker's other projects, Vertica.

Gah, you nailed it. Sorry about that. Right guy, wrong db project that starts with a V.

I hear WhatsApp uses Mnesia. If you use Erlang and can fit everything in memory, that does look pretty nice. It integrates right into the language.

It also came about before CAP was postulated, and does nothing to automatically resolve partitions. It doesn't -die-, but it was clearly built with a "we'll run this in our own data center" mentality. In the event of a partition, it goes immediately to split brain mode, with each side of the partition running separately, allowing both writes and reads, and doing nothing to try and automatically heal the partition, even if your Erlang nodes reconnect.

This isn't necessarily a bad thing; it's a very easy model to reason about, it rarely has an issue if your cluster doesn't span more than one data center, and it lets you know when it happens via events you can subscribe to. But doing anything different, up to and including healing the partition automatically, is left to the user.

There are also a few warts due to its history: disc-only tables have a rather small max size (so if you're not expecting everything to be stored in RAM as well, you don't want to use Mnesia), indexes are shockingly inefficient for writes, and there are a few other odds and ends that I don't really remember. For persisted, but fast, storage in Erlang, where partitions aren't common, it's great, but outside of those sorts of use cases there's probably something better.

MongoDB is also under the AGPL, but the drivers probably aren't, so you can use it and be fine. And if you make changes to VoltDB, you have to share them.

11.5 How much data can be stored in Mnesia?

Dets uses 32 bit integers for file offsets, so the largest possible mnesia table (for now) is 4Gb.

> Dets uses 32 bit integers for file offsets, so the largest possible mnesia table (for now) is 4Gb.

Ehhhhhhhh, kinda, but not really. A couple of things: [0]

* If you use a disc_only_copies table, your max table size is ~2GB because that's apparently the largest file that DETS will work with. What's more, if your table grows to ~2GB, future writes (to that table) will silently fail until the table size is reduced. (!!!) You can kinda get around this limitation by sharding your table [1], but the hash used to determine which key goes to which shard isn't very balanced... sometimes you luck out and your shards are pretty much all the same size. Other times you end up with a few outsized shards.

* However! Everyone on the erlang-questions mailing list says that disc_copies and ram_copies tables are _NOT_ subject to this limit, so they can grow arbitrarily large.

* And, the general consensus seems to be that you really don't want to be using Mnesia in disc_only_copies mode... if you have too much data to fit into RAM, you should probably consider using something else to store your data. This isn't to say that using Mnesia in disc_only mode is bad or hazardous, [2] but that it's substantially slower than disc_copies or ram_copies, and you run the risk of bumping up against the DETS file size limitation.

[0] All observations valid for Erlang 17->18 only. Check Erlang release notes to see if major changes have occurred.

[1] Mnesia handles sharded tables really well, even if the documentation for the feature isn't the best.

[2] I use it in disc_only mode for one of my projects... it's how I learned about that mode's limitations. :p
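
For what it's worth, both the 4GB figure from the FAQ and the ~2GB figure above fall out of simple arithmetic on 32-bit byte offsets, depending on whether the offset is treated as unsigned or signed. The sketch below only shows that arithmetic; which interpretation DETS actually uses is an assumption here, not something confirmed in this thread, and the function name is made up for illustration.

```python
# Largest file reachable with a fixed-width byte offset.
# Assumption: offsets count bytes. Whether DETS treats them as signed
# or unsigned is NOT confirmed here; both cases are shown.
GIB = 2**30  # one gibibyte in bytes


def max_addressable_bytes(offset_bits: int, signed: bool) -> int:
    """Number of distinct byte positions an offset of this width can reach."""
    usable_bits = offset_bits - 1 if signed else offset_bits
    return 2**usable_bits


unsigned_limit = max_addressable_bytes(32, signed=False)
signed_limit = max_addressable_bytes(32, signed=True)

print(unsigned_limit / GIB)  # 4.0
print(signed_limit / GIB)    # 2.0
```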

>32 bit integers for file offsets, so the largest possible mnesia table (for now) is 4Gb

Do they mean 4GB? Surely 0.5GB/4Gb is a bit small even for 32bit?

> Do they mean 4GB [rather than 4Gb]?

"No one" measures sizes that aren't network throughput numbers in bits. "Everyone" uses bytes. :)

And I mean -honestly- if you were shooting for the Pedant badge, you should have also quibbled about GB vs GiB. ;)

I'm always amused that people treat GB vs. GiB as a quibble. A GiB is about 7.4% larger than a GB, and the gap gets worse for the ever more common larger prefixes (12.6% already for a pebibyte vs. a petabyte). Might be my background in physics, though.
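
Those percentages are easy to check. A minimal sketch (the helper name is made up for illustration) that computes how much larger each binary prefix is than its decimal counterpart; the giga figure is 7.37% before rounding:

```python
# Relative gap between binary (IEC) and decimal (SI) prefixes.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def binary_excess_percent(power: int) -> float:
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {binary_excess_percent(power):.1f}%")
# kilo: 2.4%
# mega: 4.9%
# giga: 7.4%
# tera: 10.0%
# peta: 12.6%
```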
It's easier to think of Mnesia as a persistent distributed hash map than as a full-fledged database.