I’m very excited to hear about the new storage engine and Apple’s record layer! Additionally, the lightning talk about backwards compatibility for rolling upgrades would be a great addition. Those three would make FoundationDB a much more obvious fit for the average application.
My talk is at 10:40! If anyone is attending and would be interested in meeting up, my email is in my profile and my Twitter handle is the same as my HN username.
Apple internally uses Cassandra, HBase, Riak, Hadoop/Impala, Oracle, the Siri Search KV store, memcache, Redis, MySQL, Postgres, etc. Each of these handles more than 100 TB in aggregate.
Considering these applications won’t be ported to FDB, why not develop a translation layer? That would also drive adoption of FDB.
Having smart people work on cool things is not sufficient; you also need them working on high-impact but boring problems.
>Considering these applications won’t be ported to FDB, why not develop a translation layer. This will also drive adoption of FDB.
Writing a translation layer would be nice, but a "drop-in" replacement may be overkill just to drive growth.
That said, the three biggest factors for adoption, in my opinion, are developer experience, tooling, and hosting.
If some FoundationDB enthusiasts built an elastic hosting service and some dedicated tooling, it would help massively in competing with other NoSQL vendors.
I've been having enormous success using FDB for my POC. Its ability to do atomic mutations is honestly game-changing for our use case. Mandatory transactions are also a lifesaver, as our previous implementation required careful optimistic concurrency control (OCC).
Real-time aggregations over a stream of data, where multiple servers may write partial aggregations to the same row. With FDB I can safely read the data, merge it with my in-memory copy, and write the final result back. That's only for our complex aggregations, such as HyperLogLog and T-Digests. For simpler things like COUNT, I can just use the ADD mutation. For SUM of doubles, I can use APPEND_IF_FITS to keep each partial aggregation as a "running log" of partial sums in a single row.
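A rough sketch of the two atomic mutations mentioned above. It does not talk to FoundationDB: a plain Python dict stands in for the database, and `atomic_add`, `append_if_fits` and `read_sum` are hypothetical helpers that model the documented semantics (ADD is wrapping little-endian integer addition; APPEND_IF_FITS appends only if the result stays within FDB's 100,000-byte value limit). In real FDB these are issued as mutations inside a transaction and the client never reads the value.

```python
import struct

# FoundationDB's documented maximum value size, in bytes.
MAX_VALUE_SIZE = 100_000


def atomic_add(store: dict, key: bytes, param: bytes) -> None:
    """Model of the ADD mutation (hypothetical helper, not the FDB API).

    The stored value is zero-extended or truncated to len(param), then the two
    little-endian integers are added, wrapping around on overflow.
    """
    width = len(param)
    existing = store.get(key, b"")[:width].ljust(width, b"\x00")
    total = int.from_bytes(existing, "little") + int.from_bytes(param, "little")
    store[key] = (total % (1 << (8 * width))).to_bytes(width, "little")


def append_if_fits(store: dict, key: bytes, param: bytes) -> bool:
    """Model of APPEND_IF_FITS (hypothetical helper, not the FDB API).

    Real FDB drops an oversized append silently at commit time; the bool
    return value exists only so this sketch can be checked.
    """
    new_value = store.get(key, b"") + param
    if len(new_value) > MAX_VALUE_SIZE:
        return False
    store[key] = new_value
    return True


def read_sum(store: dict, key: bytes) -> float:
    """Reader side: decode the running log of little-endian doubles and sum it."""
    return sum(x for (x,) in struct.iter_unpack("<d", store.get(key, b"")))


# COUNT: each writer adds 1 as an 8-byte little-endian integer.
store = {}
for _ in range(3):
    atomic_add(store, b"count", struct.pack("<q", 1))

# SUM of doubles: each writer appends its partial sum as 8 bytes.
append_if_fits(store, b"sum", struct.pack("<d", 1.5))
append_if_fits(store, b"sum", struct.pack("<d", 2.25))
```

Because neither mutation reads the key, concurrent writers don't conflict on it. The price of APPEND_IF_FITS is that the reader has to decode and sum the whole log, and the row stops accepting appends once it nears the size limit.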
That's pretty cool! You implemented HLL and TDigest on top of FDB? Or are you storing some kind of blob and computing server-side? I did something similar for Hadoop and MonetDB a long time ago.
Both the HLL (Algebird) and TDigest implementations we're using have a simple way to serialize a compressed representation. So it's basically just reading the row, merging my value with the one currently stored, and writing the merged value back.
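The read-merge-write loop can be sketched roughly like this, under loud assumptions: `VersionedStore` is a hypothetical in-memory stand-in that rejects a commit if the key changed after it was read (loosely how FDB aborts a transaction whose reads were invalidated), and `merge_registers` is a toy element-wise max over HLL-style registers, not Algebird's or any TDigest library's real serialization. In the Python binding, the `@fdb.transactional` decorator supplies the retry loop that `merge_into` writes out by hand.

```python
from typing import Callable, Dict, Optional, Tuple


class VersionedStore:
    """Hypothetical in-memory stand-in for a transactional KV store.

    Every write bumps a per-key version; a commit is rejected if the key's
    version changed after it was read, loosely mimicking FDB conflict detection.
    """

    def __init__(self) -> None:
        self.data: Dict[bytes, bytes] = {}
        self.versions: Dict[bytes, int] = {}

    def read(self, key: bytes) -> Tuple[Optional[bytes], int]:
        return self.data.get(key), self.versions.get(key, 0)

    def commit(self, key: bytes, value: bytes, read_version: int) -> bool:
        if self.versions.get(key, 0) != read_version:
            return False  # someone else wrote the key since we read it
        self.data[key] = value
        self.versions[key] = read_version + 1
        return True


def merge_registers(a: bytes, b: bytes) -> bytes:
    """Toy HLL-style merge: element-wise max of two equal-length register arrays."""
    return bytes(max(x, y) for x, y in zip(a, b))


def merge_into(
    store: VersionedStore,
    key: bytes,
    local: bytes,
    max_retries: int = 10,
    before_commit: Optional[Callable[[], None]] = None,
) -> int:
    """Read the stored sketch, merge the in-memory one into it, write it back.

    Retries from scratch on conflict and returns the number of attempts used.
    `before_commit` is a test hook for simulating a concurrent writer.
    """
    for attempt in range(1, max_retries + 1):
        stored, version = store.read(key)
        merged = local if stored is None else merge_registers(stored, local)
        if before_commit is not None:
            before_commit()
        if store.commit(key, merged, version):
            return attempt
    raise RuntimeError("gave up after too many conflicts")
```

The important property is that a losing writer re-reads and re-merges instead of overwriting, so no partial aggregation is lost even when several servers hit the same row.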
Depending on how many times you will write to the row, you could avoid having to do a merge on write by using APPEND_IF_FITS and just merging the byte arrays when you read.
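A sketch of the merge-on-read alternative, assuming each write appends one serialized sketch prefixed with a 4-byte length (that framing is an assumption; FDB itself just concatenates bytes). As before, a dict stands in for the database, and `frame`, `unframe`, `append_if_fits`, `merge_registers` and `read_merged` are hypothetical helpers; the register merge is a toy, not a real HLL or TDigest.

```python
import struct
from functools import reduce
from typing import List, Optional

# FoundationDB's documented maximum value size, in bytes.
MAX_VALUE_SIZE = 100_000


def frame(blob: bytes) -> bytes:
    """Prefix a serialized sketch with a 4-byte little-endian length (assumed framing)."""
    return struct.pack("<I", len(blob)) + blob


def append_if_fits(store: dict, key: bytes, param: bytes) -> bool:
    """Model of APPEND_IF_FITS; real FDB drops an oversized append silently."""
    new_value = store.get(key, b"") + param
    if len(new_value) > MAX_VALUE_SIZE:
        return False
    store[key] = new_value
    return True


def unframe(value: bytes) -> List[bytes]:
    """Split a row of concatenated length-prefixed blobs back into the blobs."""
    blobs, offset = [], 0
    while offset < len(value):
        (length,) = struct.unpack_from("<I", value, offset)
        offset += 4
        blobs.append(value[offset:offset + length])
        offset += length
    return blobs


def merge_registers(a: bytes, b: bytes) -> bytes:
    """Toy HLL-style merge: element-wise max of equal-length register arrays."""
    return bytes(max(x, y) for x, y in zip(a, b))


def read_merged(store: dict, key: bytes) -> Optional[bytes]:
    """Merge on read: combine every appended sketch into one."""
    blobs = unframe(store.get(key, b""))
    return reduce(merge_registers, blobs) if blobs else None
```

Every write is blind, so writers never conflict, but the row grows with each write and stops accepting appends at the value size limit. That is why the choice depends on write frequency; an occasional compaction that rewrites the merged value would presumably keep the row bounded.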
It's nice that FDB gives you so much low-level flexibility: you can do whatever you feel fits your use case.
Hey man, that's pretty cool; we do exactly the same thing using Cassandra instead of FDB. Since Cassandra doesn't support transactions at high volume (100K tps), we do a shuffle so that all read/modify/write operations for the same key happen on the same machine. It seems like with FDB you can get away without that, since it supports transactions? My question to you: what volume is your system operating at? Also, how does it handle skew? Let's say you need to update the HLL for a heavily skewed key; does your FDB transaction unwind fast enough not to slow down the whole system?
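The shuffle described above is essentially hash partitioning by key. A minimal sketch, assuming events arrive as (key, value) pairs and workers are numbered `0..n-1`; `route` and `shuffle` are hypothetical helpers. `zlib.crc32` is used because Python's built-in `hash()` on strings is randomized per process and would route the same key differently on different machines.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def route(key: str, num_workers: int) -> int:
    """Deterministic key -> worker mapping, identical on every machine."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def shuffle(
    events: Iterable[Tuple[str, int]], num_workers: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Group events so all read/modify/writes for a key run on one worker."""
    partitions: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in events:
        partitions[route(key, num_workers)].append((key, value))
    return partitions
```

It also shows where skew bites: every event for a hot key maps to the same worker, so that single worker absorbs the whole load for the key.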
The program includes talks about other layers that Apple is developing. Does anyone know if they are planning on open sourcing any of those layers in the future?
From the description, the "record layer" talk seems like it's about an example POC and not a real project:
> This talk will provide a developer’s perspective building a new FoundationDB layer by describing the design and development of a record store that can provide semantics similar to a relational database. This example layer will provide the core functionality of a structured data store such as metadata management, indexing, and query planning.
FoundationDB is a key-value store that supports transactions and scans over key ranges. On top of that layer, you can add higher-level abstractions, such as relational, document, or graph databases. So it plays a role similar to the storage engine that many database servers are built on.
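To make the layering concrete, here is a toy "record layer" over an ordered key-value store. It is not Apple's Record Layer or FDB's API: `OrderedKV` is a hypothetical in-memory sorted map with prefix scans, Python tuples stand in for FDB's tuple-encoded keys, and records are kept as dicts rather than serialized bytes. Records and secondary-index entries share one keyspace, so an index lookup becomes a range scan.

```python
import bisect
from typing import Any, Dict, Iterator, List, Optional, Tuple

Key = Tuple[Any, ...]


class OrderedKV:
    """Hypothetical in-memory sorted map with prefix scans (no transactions)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []  # kept sorted, like FDB's keyspace
        self._vals: Dict[Key, Any] = {}

    def get(self, key: Key) -> Optional[Any]:
        return self._vals.get(key)

    def set(self, key: Key, value: Any) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def clear(self, key: Key) -> None:
        if key in self._vals:
            del self._vals[key]
            self._keys.remove(key)  # O(n), acceptable for a toy

    def scan_prefix(self, prefix: Key) -> Iterator[Tuple[Key, Any]]:
        # Tuples sort lexicographically, so all keys sharing a prefix are contiguous.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


class RecordLayer:
    """Toy layer: records by primary key plus one secondary index on `field`."""

    def __init__(self, kv: OrderedKV, field: str) -> None:
        self.kv, self.field = kv, field

    def save(self, pk: int, record: Dict[str, Any]) -> None:
        old = self.kv.get(("record", pk))
        if old is not None:  # drop the stale index entry before rewriting
            self.kv.clear(("index", self.field, old[self.field], pk))
        self.kv.set(("record", pk), record)
        self.kv.set(("index", self.field, record[self.field], pk), b"")

    def find_by(self, value: Any) -> List[Dict[str, Any]]:
        # Index lookup = prefix range scan over index keys, then point reads.
        hits = self.kv.scan_prefix(("index", self.field, value))
        return [self.kv.get(("record", key[-1])) for key, _ in hits]
```

In FDB, the record write and its index update would go into one transaction, which is what keeps the index consistent with the records when several clients write at once.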