|
I think if I were working from someone else's codebase, with about 80-90% of what I need already built in, I would have the time and resources to make improvements on my fork and outshine the software that forms the base of my product. I wanted to try TokuMX months ago, but when I learnt that the version at the time was based on Mongo 2.2 I shied away from it, because I need GeoJSON capabilities. I remember that with 2.6 one of Tokutek's engineers said that they needed to look at Mongo's code and start playing catch-up; I don't know if they've done that so far. What will Mongo 2.8 mean for TokuMX? We're seeing document-level locking, possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior), possible transactions (although what's on JIRA hasn't convinced me so far) and a few other improvements and Performance Boosting Things. So to what extent will Toku remain relevant if they don't keep up to date with Mongo? In my case, with their versions based on 2.2, their claim of being 'a drop-in replacement for MongoDB' doesn't hold. I'll go to their GitHub page and try to see whether they've merged the 2.6 codebase into their latest versions though :)
EDIT: from looking at their release changelogs, as of October last year they were at parity with Mongo 2.4, with the exception of geo-indices and full-text search, and 2.6 is still an open milestone. It kind of feels like the Joyent vs StrongLoop thing on Node.js, but I wonder whether Tokutek employees push bug-fixes upstream to Mongo, or whether they just fix them in TokuMX and use that as a selling point. Again, I'll have to do some digging to inform my opinion, but I'd appreciate it if someone who knows could clarify. |
1. You're a bit out of date. We merged changes to catch up to 2.4 in about a month (once we decided 2.4.x was stable). The current plan is the same for 2.6. We're currently working on it. If you need the latest and greatest Mongo features, stick with basic MongoDB. If you're willing to suffer a bit of lag (on the order of months) to receive our benefits, we're here if we can help.
2. Geo is a known issue. At the moment it doesn't seem like it's that widely used, so it's not a very high priority. However, we know some people want it and we will eventually get to it. Hopefully with a better implementation.
3. MongoDB's full-text search capabilities are, as far as I can tell, far behind what's provided by state-of-the-art text search systems, and serious users currently use MongoDB/TokuMX in concert with more focused solutions like Solr/Lucene/Elasticsearch. I haven't spoken to anyone invested in text search who actually used MongoDB's text indexes, even if they use MongoDB elsewhere in their application. If you do, I'd love to buy you lunch and talk about it; please email me (my username here at tokutek.com).
4. Here's the big takeaway I got from last week's conference: MongoDB has been convinced that many of the problems we solve with TokuMX (performance, compression, concurrency, transactions) are important to their biggest users. Their most hyped announcements and plans for 2.8 (document-level locking and the storage engine API) are aimed straight at us. We see this as a resounding validation of our technology, and a wonderful challenge to continue improving TokuMX. While it's tantalizing to implement a fractal tree storage engine according to their API (and there's no doubt that we will implement one), our innovations in TokuMX proper run deeper, into extra collection types, replication and sharding internals, and we have further plans for TokuMX that are beyond the scope of a storage engine API. The availability of the API is an opportunity for us to create a product with some of our improvements (mainly insertion performance and compression) with better compatibility (esp. w.r.t. replication and geo/full-text) and a simpler upgrade path. However, TokuMX as its own product (with better replication, sharding, and advanced features like clustering indexes and partitioned collections) is not going away. It will continue to see aggressive innovation, and it will always lead a product built on MongoDB's storage engine API in advanced features such as shard-aware transactions.