by charrington
5609 days ago
I think that at the rates Twitter is writing counter data (Many TBs per day denormalized, ~0.5TB normalized), a RAM-based solution like VoltDB would be prohibitively expensive. Rainbird allows Twitter to use cheap disk-based storage but still get acceptable (sub-second) latency.
- VoltDB isn't log-structured, so you really only have to store the current state. How fast you can mutate that state isn't limited by how much RAM you have. We see use cases with utter firehoses of data that update just tens or hundreds of gigabytes of state.
- Beyond normalization, you can probably reduce the number of redundant counters, e.g. by using SQL to count the URLs that start with "amazon". This would be painful in many systems, but depending on the query, it can often be done at scale in VoltDB.
- The byte overhead per counter is also likely much lower in an ACID/relational store.
Finally, VoltDB is designed to migrate data to disk-based stores (such as Hadoop or an OLAP store) as memory fills up. This is a feature we're working very hard on and see as a big differentiator. It adds complexity if you need to query across stores, but you get a best-of-both-worlds feature set.
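
To make the point about redundant counters a bit more concrete, here is a rough sketch of the idea: keep one counter per URL and derive the "starts with amazon" total with a query instead of maintaining a separate prefix counter. It uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; it is not VoltDB code, and the table name `url_counts`, its columns, and the sample rows are all made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with one counter row per URL (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url_counts (url TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO url_counts (url, hits) VALUES (?, ?)",
        [
            ("amazon.com/books", 3),   # sample data, not real measurements
            ("amazon.com/music", 2),
            ("example.com/home", 7),
        ],
    )
    return conn


def hits_for_prefix(conn, prefix):
    """Sum the per-URL counters whose URL starts with `prefix`.

    substr() is used instead of LIKE so the match is case-sensitive and
    '%' or '_' in the prefix are treated as ordinary characters.
    """
    row = conn.execute(
        "SELECT COALESCE(SUM(hits), 0) FROM url_counts WHERE substr(url, 1, ?) = ?",
        (len(prefix), prefix),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = make_demo_db()
    print(hits_for_prefix(db, "amazon"))
```

Whether a query like this holds up at Twitter's volumes depends on the schema, partitioning, and the store itself; the sketch only shows the shape of trading a stored counter for a query.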