by stephenjudkins 6121 days ago
When a Memcached instance goes under, all Diggs stored on that instance in the last N days disappear. Not the end of the world for this application, but very undesirable for most things. You could potentially "fall back" to MySQL for this but a workable strategy for that probably isn't simple.

Further, step two is irrelevant. There's no way of knowing if a (article_id, friend_id) pair in memcached is from the most recent N days or whether it's been stuffed in due to a user being active. Therefore, searching the DB for older diggs is still necessary, and should take the exact same amount of load as if they weren't in memcached at all.

Memcached + MySQL makes great sense when the data set is small and simple. If all the content on the site fits within 1 GB you could probably easily push a hundred million unique visitors a day. For an application like this, the relatively poor performance of MySQL and the inflexibility of Memcached cause problems.

It seems to me that using Cassandra, even in its current immature state, makes much more sense than the solution you're proposing.


Ahh, you make a great point, that if an instance goes down you lose those keys. Cassandra doesn't necessarily solve this, however, unless you have enough failover that it is unthinkable that you could have cassandra lose data. If you can rebuild the cache in cassandra you can rebuild it in the identical way using my scheme, but my scheme only requires N days of data to be run through.

Step 2 is fine: memcached allows setting timeouts on keys. If you always set a timeout of (dig_time + five_days) - now(), you are set. Be careful: expiration values greater than 30 days are interpreted as an absolute Unix timestamp, not a relative timeout.
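
A rough sketch of that timeout calculation, in case it helps. The function name `digg_expiry`, the parameter `digg_time`, and the five-day `RETENTION` constant are made up for illustration; the only memcached behavior assumed is that an expiration of 0 means "never expire" and that values above 30 days (in seconds) are read as absolute Unix timestamps.

```python
import time

THIRTY_DAYS = 60 * 60 * 24 * 30   # above this, memcached reads the value as a Unix timestamp
RETENTION = 5 * 24 * 60 * 60      # hypothetical "five_days" window, in seconds


def digg_expiry(digg_time, now=None):
    """Return the expiration value to pass to memcached for a digg made at
    digg_time (Unix seconds), or None if the digg is already outside the window."""
    if now is None:
        now = time.time()
    ttl = int(digg_time + RETENTION - now)
    if ttl <= 0:
        # Already past the window. Do not store it: an expiration of 0
        # would mean "never expire", which is the opposite of what we want.
        return None
    if ttl > THIRTY_DAYS:
        # Too large to be a relative timeout, so hand memcached the
        # absolute expiry timestamp instead.
        return int(digg_time + RETENTION)
    return ttl
```

With a client library you would pass the returned value as the expiration argument of its set call, and skip the set entirely when the function returns None.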

MySQL kind of sucks. The clustering is easy to set up, at least.

Cassandra and other big on-disk hash tables are pretty cool. I think once they have datastore-like indexing capabilities they will be totally usable. My qualm with using them is that there are 80 of them right now, and they are all pretty immature. The ones that are mature, like bsddb, are complicated to use.

My point was that cassandra is just a big memcached.

> unless you have enough failover that it is unthinkable that you could have cassandra lose data

That's pretty much the idea. Cassandra makes replication + failover totally seamless, so there are no excuses in that respect. :) Cassandra also supports replicating across multiple data centers.