| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by leif 4408 days ago

Yes, there are still problems with the election protocol, e.g. [1]. The right kind of network partitions can cause multiple primaries to stay up indefinitely, accepting writes on both sides of the partition, which will eventually be rolled back. There is another problem with the election protocol that allows writes acknowledged by a majority of machines to be rolled back after an election.

Both of these problems can be fixed by using something like Raft[2] or Paxos for elections, rather than the ad hoc mechanisms used today.

In TokuMX[3], we're currently working on replacing the election algorithm with something similar to Raft, that will eliminate these sources of data loss. We've heard that MongoDB is also working on fixing replication, but we don't know what their exact plans are (they have a bigger challenge since they need to stay compatible with their existing replication algorithms, which use timestamps as transaction identifiers) or whether these fixes will end up in 2.8 or in a later version.

[1]: https://jira.mongodb.org/browse/SERVER-9848

[2]: https://ramcloud.stanford.edu/wiki/download/attachments/1137...

[3]: http://docs.tokutek.com/tokumx

1 comments

ericingram 4408 days ago

Once again, a TokuMX engineer steps up to explain issue and offer a potential solution. I can't help but wonder why MongoDB engineers aren't doing this. But no matter, just glad we're using TokuMX.

link

nevi-me 4408 days ago

I think if I was working from someone's codebase, with about 80-90% of what I need already built-in, I would have the time and resources to make improvements on my fork, and shine glorious over the software making the base of my product.

I wanted to try TokuMX months ago, but when I learnt that the version at the time was based on Mongo 2.2 I shied away from it, because I need GeoJSON capabilities. I remember that with 2.6 one of TokuTek's engineers said that they needed to look at Mongo's code and start playing catch-up, I don't know if they've done that so far.

What will Mongo 2.8 mean for TokuMX? We're seeing document-level locking, possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior), possible transactions (although what's on JIRA hasn't convinced me so far) and a few other improvements and Performance Boosting Things. So to what scale with Toku remain relevant if they don't keep up to date with Mongo, because in my case, using their versions based on 2.2, their ideology of being 'a drop-in replacement for MongoDB' doesn't work.

I'll go to their Github page and try see whether they've merged the 2.6 codebase to their latest versions though :) EDIT: from looking at their release changelogs, as of October last year, they were in parity with Mongo 2.4, with the exception of geo-indices and full-text search, and 2.6 is still an open milestone.

It kind of feels like the Joyent vs Strongloop thing on Node.js, but I wonder if TokuTek employees push bug-fixes upstream to Mongo, or whether they just fix them on TokuMX and use that as a selling-point; again with this I'll have to do some digging to inform my opinion, but I'd appreciate if someone who knows could clarify it.

link

leif 4407 days ago

I can't add much to what Chris and Zardosht already said, but let me reiterate a few things regarding our fork:

1. You're a bit out of date. We merged changes to catch up to 2.4 in about a month (once we decided 2.4.x was stable). The current plan is the same for 2.6. We're currently working on it. If you need the latest and greatest Mongo features, stick with basic MongoDB. If you're willing to suffer a bit of lag (on the order of months) to receive our benefits, we're here if we can help.

2. Geo is a known issue. At the moment it doesn't seem like it's that widely used, so it's not a very high priority. However, we know some people want it and we will eventually get to it. Hopefully with a better implementation.

3. MongoDB's full-text search capabilities are, as far as I can tell, far behind what's provided by the state of the art text search systems, and serious users currently use MongoDB/TokuMX in concert with more focused solutions like Solr/Lucene/Elastic Search. I haven't spoken to anyone invested in text search that actually used MongoDB's text indexes, even if they use MongoDB elsewhere in their application. If you do, I'd love to buy you lunch and talk about it, please email me (my username here at tokutek.com).

4. Here's the big takeaway I got from last week's conference: MongoDB has been convinced that many of the problems we solve with TokuMX (performance, compression, concurrency, transactions) are important to their biggest users. Their most hyped announcements and plans for 2.8---document-level locking and the storage engine API---are aimed straight at us. We see this as a resounding validation of our technology, and a wonderful challenge to continue improving TokuMX. While it's tantalizing to implement a fractal tree storage engine according to their API (and there's no doubt that we will implement one), our innovations in TokuMX proper run deeper, into extra collection types, replication and sharding internals, and we have further plans for TokuMX that are beyond the scope of a storage engine API. The availability of the API is an opportunity for us to create a product with some of our improvements (mainly insertion performance and compression) with better compatibility (esp. w.r.t. replication and geo/full-text) and a simpler upgrade path. However, TokuMX as it exists as its own product (with better replication, sharding, and advanced features like clustering indexes and partitioned collections) is not going away, and will continue to see aggressive innovation as it will always lead a product built from MongoDB's storage engine API in terms of advanced features like clustering indexes and shard-aware transactions.

link

leif 4407 days ago

My other reply didn't address some of your specific questions about 2.8:

> We're seeing document-level locking

We've had it from the beginning. Their implementation so far doesn't handle index updates or replication. I assume they'll handle these issues before a GA release, but the interesting question is which workloads will still demonstrate good concurrency after they solve these problems.

> possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior)

I haven't seen any actual improvements they've got planned. Besides, B-trees will never compete with our fractal trees on insertions or compression, or for that matter, with LSM trees either.

> possible transactions (although what's on JIRA hasn't convinced me so far)

They aren't going to do transactions in 2.8. They may provide something like some transactional semantics we provide in TokuMX after 2.8 (I've heard mentions of single-shard atomicity), but by this point we have even bigger and better things than just single-shard transactions planned.

> and a few other improvements and Performance Boosting Things.

Not sure what you mean. The coolest things I've heard are not storage related, e.g. filtered replication. They're definitely exciting, but unrelated enough that we should be able to just merge them wholesale.

> So to what scale with Toku remain relevant if they don't keep up to date with Mongo

We'll keep up, don't worry. Here's hoping we maintain---and gain---relevance. ;-)

link

zardosht 4408 days ago

Another engineer at Tokutek here. As you see, we are up to 2.4, and have been investigating 2.6 and Geo. With all possible features, whether they be from MongoDB 2.6 or things we innovate on our own like partitioned collections, we prioritize and address them based on customer and user feedback.

Also, 2.6 is not an all or nothing proposition that needs to be done in one release. Features with the most demand (whether it be the new write commands or aggregation framework improvements) will be done before others. We've done this before. When we released 1.0 that was based on 2.2, we also released hash based sharding with it which was a 2.4 feature. We did so because users demanded it.

As for pushing bug fixes upstream, we file bugs when we see them. Our VP of engineering was a winner in the MongoDB 2.6 bug hunt with SERVER-12878. SERVER-9848 and SERVER-14382 are among the bugs I've filed.

link

nevi-me 4408 days ago

Thanks for the response, I read a post on the mongo-user group , and that's what I noticed, that a number of features are ported as and when necessary. Don't read what I say in a very negative sense, because I'm mostly curious, and it's my opinion that sometimes the little that we (I) get exposed to regarding TokuMX specifically is that it's superior to Mongo, that it's a "choose us or lose out" thing, but that happens when one doesn't follow a certain topic, but only sees it being mentioned here and there (understandable since Mongo has been the subject of "my start-up failed, and I blame it on Mongo; so burn Mongo" kind of discussions).

One more question if you don't mind: since MongoDB will support various storage engines from 2.8, including Tokutek's storage engine (can't remember its name); notwithstanding other innovations on TokuMX, would switching from mmap to Tokutek's storage engine mean that one ends up with Mongo having geo-indices and other bells, while having TokuMX's main feature?

link

zardosht 4408 days ago

Your last question is a bit loaded with a bunch of "ifs", so let's unwind it. I don't know what MongoDB will "support" as far as other engines go. But assuming we, Tokutek, release something that we support that is our engine plugged into 2.8 using MongoDB's storage engine plugin, then according to the design we heard about at MongoDBWorld, that product will be what you think it is: Mongo with geo and "other bells", and TokuMX's compression + write performance.

But 2.8 is a bit away and the storage engine API is a very fresh development. I don't think anyone is in a position to be able to really guarantee what it would look like and how TokuFT (https://github.com/Tokutek/ft-index/) will plug into it. I definitely cannot make any promises.

If you are interested in TokuMX + some missing features from MongoDB (sounds like geo), and don't mind discussing your needs and use cases with our sales guys, please give us feedback at http://www.tokutek.com/contact/. As I mentioned previously, user feedback drives what we do, so at the very least, you can provide some additional data points.

link

ericingram 4408 days ago

We didn't need GEO indexing but what Toku does offer is pretty exciting. Primary wins for us include multi-query transactions, compression, fractal tree indexes (thus overall insert and query performance), and clustering indexes.

link

cheald 4407 days ago

> What will Mongo 2.8 mean for TokuMX? We're seeing document-level locking, possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior), possible transactions (although what's on JIRA hasn't convinced me so far) and a few other improvements and Performance Boosting Things.

TokuMX is quite a bit ahead of MongoDB in those respects - it already has document-level locking, transactions, MVCC guarantees, and partitioned collections (a recent addition, and awesome for time series data). It also offers (configurable) data compression and a non-fragmenting storage mechanism, which means tremendous disk usage savings in update-heavy workloads. It's quite a lot faster both reading and writing in many cases (particularly when you can take advantage of clustering keys). It's not without its faults, but the things it has over vanilla MongoDB far outweigh its faults, IMO.

It would be nice to have geo-indices and full-text search (we use ElasticSearch for the latter in concert with TokuMX, and it supports geoindices too), but "server doesn't fall over under high load" is a lot more important to us (as users, I don't work for Tokutek). 2.8 promises to bring MongoDB up to speed in a lot of respects, but it's definitely not accurate to think of TokuMX as "MongoDB 2.4 except with faster writes".

link