Hacker News
by evanelias 3557 days ago
The upscaledb FAQ indicates "it is not yet concurrent; it uses a big lock to make sure that only one thread can access the upscaledb environment at a time".

InnoDB is designed for concurrency (using MVCC, granular locking, etc) so I'd expect it to be slower at single-threaded workloads than another engine that skips all that.

Only using single-threaded benchmarking is a bit misleading, imo. This is mentioned in the article but only in a small bullet point towards the bottom.

2 comments

I ran sysbench benchmarks with 30 concurrent connections. The performance gap between InnoDB and upscaledb shrank a bit, but not much.

The reason is that most of the time is spent in MySQL itself and not in the key/value store, so it makes little difference whether the key/value store is concurrent or not.

In my experience the assumption of "concurrent = fast" is a misconception. Right now upscaledb moves certain operations (e.g. flushing dirty buffers) to the background. It is better to have fast single-threaded code than multi-threaded code with a huge locking overhead. A compromise would be to move the lock to the database level (instead of the Environment, which is basically the container for multiple databases), and make sure that there's no shared state between the databases. But that is not a high priority for me because I do not expect it to gain much performance.
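To make the compromise above concrete, here is a minimal Python sketch of the two lock granularities. Everything in it (the `Environment` class, `per_database_locks`, `put`, `get`) is a hypothetical illustration and not the upscaledb API. It only shows the idea that a per-database lock lets writers to different databases proceed independently, which is safe only if the databases really share no state.

```python
import threading


class Environment:
    """Hypothetical container for several databases (not the upscaledb API)."""

    def __init__(self, names, per_database_locks):
        # Each database is modelled as a plain dict with no shared state.
        self.data = {name: {} for name in names}
        if per_database_locks:
            # One lock per database: operations on different databases
            # do not block each other.
            self.locks = {name: threading.Lock() for name in names}
        else:
            # One big lock shared by every database in the environment.
            big_lock = threading.Lock()
            self.locks = {name: big_lock for name in names}

    def put(self, db, key, value):
        with self.locks[db]:
            self.data[db][key] = value

    def get(self, db, key):
        with self.locks[db]:
            return self.data[db].get(key)


coarse = Environment(["users", "orders"], per_database_locks=False)
fine = Environment(["users", "orders"], per_database_locks=True)
fine.put("users", "id:1", "alice")
```

With `per_database_locks=False` every call serializes on the same lock, which mirrors the "big lock" described in the FAQ quote. With `per_database_locks=True` only calls on the same database contend.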

30 concurrent connections is a very low number, compared to uses of MySQL/InnoDB at real scale. What happens at higher numbers?

I am not arguing that "concurrent = fast". My point is that real systems have a higher level of concurrency as a baseline.

InnoDB supports granular concurrent access because real workloads need this. Systems that have poor stories around concurrency -- MyISAM, Redis, pre-WiredTiger MongoDB -- definitely hit real scalability issues under high-volume workloads.

There are literally a dozen ways to structure and design a storage engine, and every one of them has its pros and cons.

MVCC is so popular because it makes ACID compliance much easier to implement, but there is some additional read latency (and management overhead) because storage layout is disconnected from the natural layout.
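A toy Python sketch of the multi-version idea, to show where the extra read work comes from. This is an assumption-laden illustration and not how InnoDB stores data; the class `VersionedStore` and its methods are hypothetical names. Each key keeps a list of versions, and a reader has to search that list for the newest version visible to its snapshot instead of reading a single in-place value.

```python
class VersionedStore:
    """Toy multi-version store (hypothetical; not InnoDB's actual layout)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp

    def write(self, key, value):
        # Writers append a new version instead of overwriting in place.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A reader remembers the timestamp at which it started.
        return self.clock

    def read(self, key, snapshot_ts):
        # Walk from newest to oldest to find the first visible version.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("k", "v1")
snap = store.snapshot()
store.write("k", "v2")
old_view = store.read("k", snap)              # still sees "v1"
new_view = store.read("k", store.snapshot())  # sees "v2"
```

The reader holding `snap` never blocks the second write and never sees it, which is the isolation benefit. The cost is the version walk in `read` plus the cleanup of old versions, which this sketch leaves out.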

A storage engine which sticks to the basics could be very fast, with predictable low-latency.

With regard to threading: I know locks are a dirty word, but you do need a single version of truth somewhere. Either this is done through locking or an allocator. Going single-threaded is a valid way to remove the overhead of locking (plus no race conditions!).

Hopefully a single-writer, multiple-reader mode is available soon.
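For reference, a single-writer, multiple-reader scheme can be sketched in a few lines of Python. This is a generic illustration built on `threading.Condition`, not anything from upscaledb, and the `ReadWriteLock` class is a hypothetical name. Any number of readers may hold the lock together, while a writer waits until it has exclusive access.

```python
import threading


class ReadWriteLock:
    """Minimal reader-writer lock sketch (hypothetical; readers are preferred)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently inside
        self._writing = False  # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, not for each other.
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                # The last reader out wakes any waiting writer.
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # A writer needs exclusive access: no readers and no other writer.
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

Note that this simple version can starve writers under a steady stream of readers. A production design would need a fairness policy, which is part of the locking overhead discussed above.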