by BWStearns
4281 days ago
Roughly 36-100MB per person per day, ~250 days/year, and I'm expecting ~20,000 people (an educated stupid wild-ass guess) initially when the system actually goes into production. So ~100-400TB per year(?). Most of the data would only be of interest for a month or so, but we do want to preserve it in some usable form for testing and some research.
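
As a quick sanity check on that estimate, here is a minimal back-of-the-envelope sketch. It assumes the ~20,000 figure is a head count, uses decimal units (1 TB = 1,000,000 MB), and ignores replication, compression, and indexes, all of which would change the real footprint. The function name `yearly_tb` is made up for illustration.

```python
# Back-of-the-envelope yearly volume from the figures quoted above.
MB_PER_TB = 1_000_000  # decimal units (an assumption, not from the comment)


def yearly_tb(mb_per_person_day, days_per_year=250, people=20_000):
    """Return the raw data volume per year in terabytes."""
    return mb_per_person_day * days_per_year * people / MB_PER_TB


low = yearly_tb(36)    # lower bound: 36 MB/person/day
high = yearly_tb(100)  # upper bound: 100 MB/person/day
print(f"{low:.0f}-{high:.0f} TB/year")  # prints "180-500 TB/year"
```

With these inputs the raw range comes out somewhat above the ~100-400TB guess, so the guess is in the right order of magnitude but may be a little low at the top end.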
Cassandra has a nice, simple architecture (every node is identical, no ZooKeeper roles, etc.), high write performance and scalability [1], and is fairly robust. My main piece of advice is to get the tables set up correctly. You need to know exactly which queries you want to run and design a table around each one (Cassandra only allows performant queries unless you go out of your way to opt in to slow ones, e.g. with ALLOW FILTERING). Whether a query is possible or performant depends on the table's primary key, which may be a composite key. Take a look at the Cassandra documentation for more details.
[1] http://techblog.netflix.com/2011/11/benchmarking-cassandra-s...
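
To make the "design the table around the query" advice concrete, here is a toy in-memory model in Python. It is not Cassandra and not the Cassandra driver; it only mimics the idea. The schema, the class `ToyTable`, and all column names are hypothetical, chosen to fit the per-person, per-day data described earlier in the thread. Rows are grouped by a partition key `(person_id, day)` and sorted by a clustering column `ts`, so a lookup by partition key is cheap, while a lookup on any other column has to scan everything and is refused unless you explicitly opt in.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical CQL schema this toy imitates:
#   CREATE TABLE readings (person_id text, day text, ts int, payload text,
#                          PRIMARY KEY ((person_id, day), ts));


class ToyTable:
    """Toy model of a table with PRIMARY KEY ((person_id, day), ts)."""

    def __init__(self):
        # partition key -> list of (clustering key, payload), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, person_id, day, ts, payload):
        insort(self.partitions[(person_id, day)], (ts, payload))

    def by_partition(self, person_id, day, ts_from=None, ts_to=None):
        """Cheap: one partition lookup, rows come back in clustering order."""
        rows = self.partitions.get((person_id, day), [])
        return [(ts, p) for ts, p in rows
                if (ts_from is None or ts >= ts_from)
                and (ts_to is None or ts < ts_to)]

    def by_payload(self, payload, allow_filtering=False):
        """Expensive: no key restriction, so every partition is scanned."""
        if not allow_filtering:
            raise ValueError("query needs a full scan; pass allow_filtering=True")
        return [(pk, ts) for pk, rows in self.partitions.items()
                for ts, p in rows if p == payload]


table = ToyTable()
table.insert("alice", "2014-03-01", 10, "login")
table.insert("alice", "2014-03-01", 5, "boot")
table.insert("bob", "2014-03-01", 7, "login")

print(table.by_partition("alice", "2014-03-01"))        # key lookup
print(table.by_payload("login", allow_filtering=True))  # full scan, opt-in
```

If "find every row with a given payload" were a query you actually needed often, the point of the advice is that you would create a second table keyed by payload rather than rely on the scan.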