| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by slashdev 4127 days ago

Immutability is hardly a cure-all, see the discussion here for why RethinkDB moved away from it: http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-g...

The reality is shared, mutable state is the most efficient way of working with memory-sized data. People can rant and rave all they want about the benefits of immutability vs mutability, but at the end of the day, if performance is important to you, you'd be best to ignore them.

Actually, to be more honest, reality is more complicated still. MVCC that many databases use to get ACID semantics over a shared mutable dataset is really a combination of mutable and immutable.

4 comments

coffeemug 4127 days ago

Slava @ rethink here.

This is a really interesting subject -- I should do a talk/blog post about this at some point. Here is a quick summary.

RethinkDB's storage engine heavily relies on the notion of immutability/append-only. We never modify blocks of data in place on disk -- all changes are recorded in new blocks. We have a concurrent, incremental compaction algorithm that goes through the old blocks, frees the ones that are outdated, and moves things around when some blocks have mostly garbage.

The system is very fast and rock solid. But...

Getting a storage engine like that to production state is an enormous amount of work and takes a very long time. Rethink's storage engine is really a work of art -- I consider it a marvel of engineering, and I don't mean that as a compliment. If we were starting from scratch, I don't think we'd use this design again. It's great now, but I'm not sure if all the work we put into it was ultimately worth the effort.

boredandroid 4127 days ago

I really think there are a couple of levels of immutability that it is easy to conflate.

Specifically immutability for

1. In memory data structures...this is the contention of the functional programming people.

2. Persistent data stores. This is the lsm style of data structure that substitutes linear writes and compaction for buffered in-place mutation.

3. Distributed system internals--this is a log-centric, "state machine replication" style of data flow between nodes. This is a classic approach in distributed databases, and present in systems like PNUTs.

4. Company-wide data integration and processing around streams of immutable records between systems. This is what I have argued for (http://engineering.linkedin.com/distributed-systems/log-what...) and I think Martin is mostly talking about.

There are a lot of analogies between these but they aren't the same. Success of one of these things doesn't really imply success for any of the others. Functional programming could lose and log-structured data stores could win or vice versa. Pat Helland has made an across the board call for immutability (http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf), but that remains a pretty strong assertion. So it is worth being specific about which level you are thinking about.

For my part I am pretty bullish about stream processing and data flow between systems being built around a log or stream of immutable records as the foundational abstraction. But whether those systems internally are built in functional languages, use lsm style data layout on disk is kind of an implementation detail. From my point of view immutability is a lot more helpful in the large than in the small--I have never found small imperative for loops particularly hard to read, but process-wide mutable state is a big pain, and undisciplined dataflow between disparate systems, caches, and applications at the company level can be a real disaster.

eloff 4127 days ago

Excellent points, yes it's important to clarify what we're talking about here. Samza sounds like an event-sourcing style immutable event log. You could think of it like the transaction or replication log of a traditional database. Having that be immutable is very sensible! But you can't always query that in "real-time".

On the other hand, the data structures you query in real-time, making that immutable is problematic, because then you'll need a LevelDB style compaction step. That doesn't mean to say that it can't be done well, but that it's hard to do well.

hyc_symas 4127 days ago

LMDB does ACID MVCC using copy-on-write with no garbage collection or compaction needed. It delivers consistent, deterministic write performance with no pauses. It is actually now in use in a number of soft realtime systems.

eloff 4127 days ago

I was specifically thinking of LMDB as a counter-example when I wrote that it's not impossible, just hard to do well. A much more sensible set of tradeoffs than LevelDB.

coffeemug 4127 days ago

> I really think there are a couple of levels of immutability that it is easy to conflate.

Complete agree. I was talking about immutability on the storage engine level. Totally different tradeoffs apply at different levels in the stack (that you described).

eloff 4127 days ago

It's that merge (edit: GC is a better term) step that's difficult to get right. Google screwed this up badly with LevelDB which had(still has?) horrible performance issues caused by compaction. Even with concurrent compaction it can be difficult due to needing additional disk space, adding additional read and write pressure to the storage subsystem and the effects that has on latency. I'm not sure what RethinkDB's approach was there, but I'm very curious to know.

coffeemug 4127 days ago

> Even with concurrent compaction it can be difficult due to needing additional disk space, adding additional read and write pressure to the storage subsystem and the effects that has on latency. I'm not sure what RethinkDB's approach was there, but I'm very curious to know.

Yes, we ran into all these issues with the RethinkDB storage engine. Unfortunately I can't summarize the solution, because there are no silver bullets. It took a long time to perfect the engine, and there was enormous amount of tuning work to get everything right.

For example, we have a "young blocks" subsystem that treats recently updated blocks differently (since, empirically, recently written blocks are dramatically more likely to be updated again, so we hold off on trying to collect them). How long should you wait? How many young blocks should you consider?

Working out solutions to these questions takes a lot of trial and error, and that's where the bulk of the work is (and that's just one subsystem!)

I'd love to write about it in depth, I'll try to make it a priority.

pas 4127 days ago

How much similarity is there between the JVM's G1GC, CMS or other collectors and Rethink's compaction? It looks like the heuristics and trade-offs are very much the same. (Latency, hard space constraint, usage patterns.) Okay, you don't have to do pointer/object graph chasing, but queries and consistency and whatnot has to do something similar.

coffeemug 4127 days ago

Remember -- this is an on-disk compactor, so it's not quite the same as collecting garbage in memory. There are other differences -- database dependencies are typically trees (or in the worst case DAGs, as there aren't any circular references). So our compactor can be much simpler (in fact, it's closer to a purely functional collector like Haskell's).

But overall it's very similar to a programming language GC. The devil, as usual, is in the details.

karterk 4127 days ago

In another thread, you had mentioned that "I don't think we'd use this design again". Curious to hear how you would design such a system differently, without using a LSM tree approach that's popular these days?

eloff 4127 days ago

If you do, please ping me at dan.eloff @ populargooglemailservice.com, I don't want to miss reading that one!

phunge 4127 days ago

Please do -- I've been reading the linkedin/confluent/samza writeups and thinking there's a lot of truth to their ideas. It'd be great to hear more on-the-ground experience from a different perspective.

elliptic 4127 days ago

Is this not how most 'modern' (read 90s) relational db's work?

eloff 4127 days ago

It's similar, they use MVCC, which means no in-place updates or deletes. Postgres then has a compaction step called vacuum to clean up old tuples. Redis is one of the few "databases" that truly uses in-place updates, but it has that luxury because it's single-threaded.

jdf 4127 days ago

MVCC doesn't necessarily mean no in-place updates, it just means that you can distinguish between multiple versions. For example, Oracle:

- keep most recent version of all keys in B-tree

- store updates in undo log ("rollback segments")

- queries for older versions dynamically undo recent changes

http://docs.oracle.com/cd/B19306_01/server.102/b14220/consis...

eloff 4127 days ago

Good point, but do they seriously do that? It's stupid.

If you overwrite data in place that's being concurrently read, you get garbled data. So you must guarantee nobody is reading it. One way is to lock the data for both readers and writers using a mutex of some form. Another way is Linux-RCU style[1]. Both make readers always pay a price for what should be an uncommon case.

It makes more sense to me to put your updates in a new place, and if need be copy them over the old data once nobody can see the old data anymore.

[1] http://lwn.net/Articles/262464/

amelius 4127 days ago

Isn't that just called "journaling", as opposed to "immutable"?

coffeemug 4127 days ago

Journaling typically implies that there is a separate log of operations/changes, but the main data file (the BTree) is still updated in-place. You can then use the journal to roll back the changes if necessary.

RethinkDB's storage engine doesn't have a journal -- the main data file is essentially journaled, which is quite different from the traditional meaning of the word.

whatthemick 4127 days ago

Everything obviously has trade offs, no choice is perfect in everyway.

For me the pros of having data as an immutable stream of events (eventsourcing) is that you get migrations and data modeling for free - You don't have to deal with having to design the "perfect" data model in advance (or worry about schema/data migrations later on) and you can get caching as first level data rather as derived from another store.

sitkack 4127 days ago

Actually @eloff and the OP are arguing past each other. Both are wrong in different ways.

> Databases are global, shared, mutable state. That’s the way it has been since the 1960s, and no amount of NoSQL has changed that. However, most self-respecting developers have got rid of mutable global variables in their code long ago. So why do we tolerate databases as they are?

This isn't true. Databases _can_ be those things, but that isn't the definition of a database. Most of the databases I have worked and created do not use update or delete except to archive old data that is no longer in the working set.

And mutability isn't always faster. Most of the time when people are championing mutability, it is because it is the most expedient (esp with their mental model), not because from a whole system standpoint it is actually faster. They trot out a microbenchmark that proves their point while ignoring use cases like retrieving old state or auditing the transaction history.

eloff 4127 days ago

You'll have to go into detail. Mutable data structures mop the floor with immutable ones, synchronization issues aside. What's actually better depends on the specific case in question, but mutable data structures have a superset of the techniques available to immutable ones, and that's never a disadvantage. It, might, however lead you to use a poor design that falls down under contention.

sitkack 4127 days ago

Mutability has to put into context of the whole system, not just a narrowly focused benchmark. We can no longer put synchronization issues aside, that just isn't possible anymore. Correctness (coherency and data races) both on a micro level and macro are more important than raw mutation rates. While there are times when mutation is the answer, it has to thought about carefully.

eloff 4127 days ago

Correctness is very important. But being correct and faster is better. A lot of the techniques I use to accomplish that start to look similar to immutable. Copy on write is the cornerstone of good lock-free algorithms as well as immutable data structures. However, not being religious about mutability has tangible performance benefits - depending on the details, as always.

sitkack 4127 days ago

Yes, but that stuff is hard and error prone.

Immutability is easier (safer, more correct) on the whole. I think both Rust and Clojure take a good stance on mutation.

Only for anecdotal evidence, my ad-hoc immutable (lots of copying) ETLs performed better than the mutation happy ones. The GC was able to throw stuff away faster, the code was cleaner, kept only what it needed. The gaussian circle of confusion was smaller.

eloff 4127 days ago

It has it's advantages to be sure, I really like eventsourcing. For some kinds of projects it's the obvious way to go. Finance seems like a killer application, because of the way the entire audit trail is stored, and the system can often be rebuilt to a valid state upon finding and correcting a production bug.

whatthemick 4127 days ago

For anything that critically needs an audit log it's a pretty obvious choice (atleast compared to when people try to implement audit logs in a RDBMS).

I think it might also have a great deal of merit in prototyping applications quickly - Having the option to do just-in-time/memory projections while later switching to "real" storage seems really ideal to me if i'm just building something to validate it would like to defer those type of tech decisions.

shin_lao 4127 days ago

Agreed.

When designing our product it became quickly apparent that immutability isn't practical, we opted for MVCC ACID transactions, but the most difficult part of MVCC database is getting the purge right.

You can cheat purges with some clever optimization but at some point, you need to clean up old versions and when you do that, you are using precious I/O.

Getting purge/compaction right is hard. Update intensive scenarii are always problematic for databases.

eloff 4127 days ago

As a matter of interest, what product is that?

I used some tricks to reduce the amount of data that must be cleaned up at any given point in time, but it was not possible to evade it completely. I still have to do concurrent compaction, and there's some really nasty corner cases I don't have any good solutions for, it's a hard problem.

boredandroid 4127 days ago

I agree with this comment, but I think this blog post is really addressing data flow at the company-wide or datacenter scale. Surprisingly there immutable event data hasn't really had much of a home at all beyond data warehousing.