Hacker News


	by rdtsc 4607 days ago
	Howard, is LMDB effectively limited to 128TB on 64-bit machines (and 2GB on 32-bit ones, not that anyone should be running large databases on 32-bit machines)? Also, what about concurrent writes? Does it have a database-wide writer lock, or is it per key (per page?)?

hyc_symas 4607 days ago

It is limited to the logical address space. Most current x86-64 machines have only a 48-bit address space (256TB); assuming the kernel keeps half of that for itself, then yes, the current limit is 128TB. But I suspect we'll be seeing 56-bit address spaces fairly soon.
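The 128TB figure is plain arithmetic. A minimal Python sketch, assuming the kernel keeps exactly half of the address space and that "TB"/"GB" here mean binary units (the same even split on a 32-bit machine gives the 2GB from the question; real 32-bit kernels may split differently). `user_space_bytes` is a hypothetical helper:

```python
# Back-of-the-envelope check of the address-space limits.
# Assumptions: the kernel reserves exactly half of the virtual address space,
# and TB/GB in the thread mean TiB/GiB.
GiB = 2**30
TiB = 2**40

def user_space_bytes(address_bits: int) -> int:
    """Bytes left for user space if the kernel reserves half the address space."""
    return 2**address_bits // 2

print(2**48 // TiB)                 # 256   (whole 48-bit space, in TiB)
print(user_space_bytes(48) // TiB)  # 128   (user half: today's practical cap)
print(user_space_bytes(56) // TiB)  # 32768 (user half of a 56-bit space)
print(user_space_bytes(32) // GiB)  # 2     (user half of a 32-bit space)
```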

It is a single-writer DB with one DB-wide writer lock. Fine-grained locking is a tar pit.
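A toy Python sketch of what one DB-wide writer lock means in practice. `SingleWriterStore`, `write_txn`, and `value` are hypothetical names for illustration, not LMDB's API; the point is only that every write transaction runs under the same lock, so writers are fully serialized:

```python
import threading

class SingleWriterStore:
    """Hypothetical toy store with one DB-wide writer lock (not LMDB's API)."""

    def __init__(self) -> None:
        self._writer_lock = threading.Lock()  # the single lock for all writers
        self._data: dict[str, int] = {}

    def write_txn(self, fn) -> None:
        # Every write transaction takes the same lock, so writers run strictly
        # one at a time. With a single lock there is no lock ordering to get
        # wrong, so writers cannot deadlock against each other.
        with self._writer_lock:
            fn(self._data)

    def value(self, key: str) -> int:
        # Simplification: this toy also reads under the lock.
        # LMDB readers take no lock at all.
        with self._writer_lock:
            return self._data.get(key, 0)


def bump(data: dict) -> None:
    data["counter"] = data.get("counter", 0) + 1

store = SingleWriterStore()

def worker() -> None:
    for _ in range(1000):
        store.write_txn(bump)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.value("counter"))  # 8000: no lost updates
```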

leif 4607 days ago

Fine-grained locking is hard, but "tar pit" is unfair and honestly a bad attitude. It's crucial for modern applications, and if you're careful it can be done, and done really well.

We (Tokutek) tried for a long time to get by with a big monolithic lock, and a) competing with InnoDB was really hard, since it handles concurrent writers really, really well, and b) when we did decide to break up the lock, it wasn't as hard as we thought it would be, and it worked really well.

Don't get discouraged, break up that lock!

hyc_symas 4607 days ago

In our own workloads, writers are always going after the same pages in their index updates, which inevitably led to deadlocks in BerkeleyDB. As a result, we get much higher throughput with fully serialized writers than with "concurrent" writers. A microbench might show greater concurrency on simple write tasks, but in a real, live system with an elaborate schema, there's no payoff for us.

As always, you have to profile your workload and see where the delays and bottlenecks really are. Taking a single mutex instead of continuously locking/unlocking all over the place was a win for us.

rossjudson 4606 days ago

Is this the reason for your observation that LMDB is oriented towards read workloads?

I can see how the extra locking/concurrency code would grow the library beyond what fits in the CPU cache, though.

hyc_symas 4606 days ago

Yes. Since readers don't require any locks at all and don't issue any blocking calls of any kind (syscalls, malloc, whatever), they run completely unimpeded. The moment you introduce fine-grained locks of any kind, overall performance (reads and writes) will drop by at least an order of magnitude, because readers will have to deal with lock contention.
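Readers can skip locks because a committed version is never modified in place (copy-on-write, multi-version concurrency). A hedged Python sketch of that general idea: `SnapshotStore`, `begin_read`, and `write_txn` are hypothetical names, copying the whole map stands in for LMDB's copy-on-write B+tree (which copies only the pages on the path from the modified leaf up to the root), and it relies on CPython making a single reference assignment atomic:

```python
import threading
from types import MappingProxyType

class SnapshotStore:
    """Hypothetical sketch of lock-free readers over copy-on-write versions."""

    def __init__(self) -> None:
        self._writer_lock = threading.Lock()   # writers are still serialized
        self._root = MappingProxyType({})      # current committed version, read-only

    def begin_read(self):
        # A reader only grabs the current root reference: no lock, no copy.
        # The version it holds is never modified afterwards, so it stays
        # consistent for as long as the reader keeps it.
        return self._root

    def write_txn(self, updates: dict) -> None:
        with self._writer_lock:
            new_version = dict(self._root)     # copy the committed version
            new_version.update(updates)        # modify only the copy
            # Commit by swapping a single reference (atomic in CPython).
            self._root = MappingProxyType(new_version)


store = SnapshotStore()
store.write_txn({"a": 1})
snap = store.begin_read()             # a reader's snapshot
store.write_txn({"a": 2, "b": 3})     # a later commit
print(snap["a"], "b" in snap)         # 1 False  (old snapshot unchanged)
print(store.begin_read()["a"])        # 2        (new readers see the commit)
```

The writer pays for the copy; readers pay nothing, which is the trade-off the read-oriented design makes.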

rdtsc 4607 days ago

Makes sense.

What impresses me most about LMDB is the zero-copy model for readers: no extra memcpy is needed. Maybe that's obvious to database gurus, but I think it's a pretty clever trick.
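The difference between a copying read and a zero-copy read can be shown with Python's `mmap` and `memoryview`. This is only an analogy, assuming an anonymous mapping stands in for a memory-mapped database file: slicing the mapping copies bytes out, while a `memoryview` slice points straight at the mapped page, much like handing a reader a pointer into the map. The in-place write below exists only to prove the view shares memory; LMDB itself does not overwrite pages a live reader can still see.

```python
import mmap

# An anonymous mapping stands in for a memory-mapped database file.
mm = mmap.mmap(-1, mmap.PAGESIZE)
mm[100:105] = b"hello"

copied = mm[100:105]        # copying read: slicing returns a new bytes object
whole = memoryview(mm)      # zero-copy: a view onto the mapped memory itself
view = whole[100:105]       # slicing a memoryview copies nothing either

mm[100:105] = b"HELLO"      # in-place change, only to prove the view shares memory
seen = bytes(view)          # bytes() copies here only so we can print it

print(copied)               # b'hello'  (a private, now-stale copy)
print(seen)                 # b'HELLO'  (the view reads the map directly)

# Views must be released before the mapping can be closed.
view.release()
whole.release()
mm.close()
```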

hyc_symas 4607 days ago

It's pretty significant, yes. Eliminating multiple copies of everything got us a 4:1 reduction in memory footprint in OpenLDAP slapd (compared to our BerkeleyDB-based backend). This is another reason we don't spend much time worrying about data compression and I/O-bound workloads: when you've effectively expanded your available space by a factor of 4, you get the same benefit as compression without spending any memory or CPU time on it. And when you can fit a 4x larger working set into your space, you find that you need far fewer actual I/Os.
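A rough model of why a larger effective cache cuts I/O. All numbers are hypothetical, and it assumes uniformly random lookups over a dataset bigger than RAM, with the copy-heavy design holding each cached record four times so that a 4:1 footprint difference becomes a 4x difference in resident records. `resident_records` and `miss_percent` are hypothetical helpers:

```python
# All numbers are hypothetical, chosen only to make the arithmetic exact.
GiB = 2**30
RAM = 16 * GiB           # memory available for caching data
RECORD = 1024            # bytes per record
DATASET = 20 * 2**20     # total records in the database (20 Mi)

def resident_records(copies: int) -> int:
    """Records that fit in RAM when each cached record is held `copies` times."""
    return RAM // (RECORD * copies)

def miss_percent(copies: int) -> int:
    """Share of uniformly random lookups that must go to disk, in percent."""
    resident = min(resident_records(copies), DATASET)
    return 100 * (DATASET - resident) // DATASET

print(resident_records(4) // 2**20, miss_percent(4))  # 4 80   (four copies)
print(resident_records(1) // 2**20, miss_percent(1))  # 16 20  (single copy)
```

Under these assumptions, 80 disk reads per 100 lookups become 20. Real access patterns are rarely uniform, so the actual gain depends on the workload.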

kitsune_ 4607 days ago

If I can pick your brain for a bit, do you think LMDB would be a good option as a back end for time-series analysis?

hyc_symas 4606 days ago

I'm sorry, I'm not familiar enough with that workload to answer. If you're primarily doing sequential writes, it seems like LMDB could work well.
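One common way to make time-series inserts sequential, assuming LMDB's default byte-wise (lexicographic) key comparison, is to encode each timestamp as a fixed-width big-endian integer, so newer keys always sort after older ones and every insert lands at the end of the key order. A small Python check of that encoding; `ts_key` and `little_endian_key` are hypothetical helpers, and the microsecond unit is an assumption:

```python
import struct

def ts_key(ts_micros: int) -> bytes:
    # Big-endian, fixed width: comparing bytes left to right compares the most
    # significant part of the timestamp first, so byte order == time order.
    return struct.pack(">Q", ts_micros)

def little_endian_key(ts_micros: int) -> bytes:
    # Shown only for contrast: least significant byte first breaks the ordering.
    return struct.pack("<Q", ts_micros)

stamps = [1, 255, 256, 65_536, 1_370_000_000_000_000]
print(sorted(stamps, key=ts_key) == stamps)             # True
print(sorted(stamps, key=little_endian_key) == stamps)  # False
```

With keys like these, new points are always appended in order; LMDB's `MDB_APPEND` flag for `mdb_put` is intended for loading data whose keys are already sorted.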

link