by deepu_256 5552 days ago
This is from my experience of trying to build a news feed (we are about to launch one for our site which gets about a million unique visitors a month).

The approach you suggested is the first one I tried. Frankly, it worked great while I was testing the feature on my dev MacBook with just a couple of thousand users. But when I started testing it on our server with the entire legacy data set loaded, Redis ended up taking more than 6GB of RAM for about half a million users.

I told myself to just plan for 2 times our current traffic and think of a better solution later (the best thing would be to plan for 5-10 times your current traffic), and told my CEO that we might have to bump up the RAM on the Redis machine to 32 or 64 GB in the future (throwing more resources at the problem is the easiest way to solve it, though not an elegant one). Also keep in mind that you need at least 2 such machines to provide failover. I just hoped Salvatore (a great guy, BTW) would release a cluster solution for Redis and save me.
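To put rough numbers on that, here is a back-of-envelope sketch. It assumes memory grows linearly with the number of users, which is my simplification and not something I measured, and it treats 6GB as a lower bound since the real figure was higher.

```python
# Back-of-envelope projection. Linear scaling is an assumption, not a measurement.
OBSERVED_BYTES = 6 * 10**9   # "more than 6GB", so treat this as a lower bound
OBSERVED_USERS = 500_000     # "about half a million users"

bytes_per_user = OBSERVED_BYTES / OBSERVED_USERS


def projected_gb(growth_factor):
    """Estimated Redis RAM in GB if the user count grows by growth_factor."""
    return bytes_per_user * OBSERVED_USERS * growth_factor / 10**9


if __name__ == "__main__":
    print(f"per user: at least ~{bytes_per_user / 1000:.0f} KB")
    for factor in (1, 2, 5, 10):
        print(f"{factor:>2}x users: at least ~{projected_gb(factor):.0f} GB")
```

At 5-10 times the user count this lands in the same range as the 32 or 64 GB machines mentioned above, before counting the second machine for failover.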

While testing the feed we realized we needed to add more types of user actions, and every time you add a new type of action for every user you increase the memory of the Redis process by a substantial amount. You will run out of memory faster than you planned. I thought VM (Redis virtual memory) was the solution to all of this, as user feeds typically use only the latest data and the entire feed doesn't need to be in memory. But quite frankly VM has its own set of problems -- http://groups.google.com/group/redis-db/browse_thread/thread....

I'm not saying that your solution isn't good. I just wanted to warn you about some potential problems, from my experience of trying it out. In the end I am using Cassandra for the news feed. It's working great as of now, at least much better than my experience implementing the same thing with Redis. The major problems with Cassandra as of now are pagination and counting. Distributed counters are coming in 0.8. Pagination is still a major problem though. But for a news feed you might not need pagination: providing a "load more" button as Facebook and Twitter do should be good enough.
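A rough sketch of what "load more" needs, compared with real pagination: no total count and no page numbers, only "give me the next few entries older than the last one I showed". The feed here is a plain in-memory list standing in for whatever store you use, and `load_more`, the tuple layout and the unique timestamps are all assumptions made for illustration, not any Cassandra API.

```python
from typing import List, Optional, Tuple

# Hypothetical feed entry: (timestamp, event text). Feeds are kept newest first,
# and timestamps are assumed to be unique within a feed.
Entry = Tuple[int, str]


def load_more(feed: List[Entry], before: Optional[int],
              limit: int = 3) -> Tuple[List[Entry], Optional[int]]:
    """Return up to `limit` entries older than `before`, plus the next cursor.

    Pass before=None for the first batch. The returned cursor is None when
    there is nothing older left to load.
    """
    older = [entry for entry in feed if before is None or entry[0] < before]
    batch = older[:limit]
    next_cursor = batch[-1][0] if len(older) > limit else None
    return batch, next_cursor


if __name__ == "__main__":
    feed = [(50, "e"), (40, "d"), (30, "c"), (20, "b"), (10, "a")]
    cursor = None
    while True:
        batch, cursor = load_more(feed, cursor, limit=2)
        print(batch)
        if cursor is None:
            break
```

The client only has to remember the cursor from the previous call, so the server never counts the feed or jumps to an arbitrary page.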

Hope it helps.

2 comments

I'm not sure what version of Redis you were using, but depending on the data you're storing, 2.2 uses significantly less space than previous versions.

Some data from GitHub: http://technoweenie-ruby-onales.heroku.com/#34

It looks like our approaches are very different. Stratocaster stores lists of integer IDs, which Redis optimizes for pretty heavily. We're taking the list of IDs and doing a multi-get from either memcache or mysql.
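Roughly, the shape of that approach looks like the sketch below. This is not Stratocaster's actual code: plain dicts stand in for Redis, memcache and MySQL so it runs anywhere, and every name in it is made up for illustration.

```python
from typing import Dict, List

# Stand-ins for the real stores (all hypothetical): a dict of per-user ID lists
# plays the role of Redis, and two more dicts play memcache and MySQL.
feed_ids: Dict[str, List[int]] = {}
cache: Dict[int, dict] = {}
database: Dict[int, dict] = {}


def push_event(user: str, event_id: int, max_len: int = 100) -> None:
    """Prepend an event ID and trim the list, like LPUSH followed by LTRIM."""
    ids = feed_ids.setdefault(user, [])
    ids.insert(0, event_id)
    del ids[max_len:]


def multi_get(ids: List[int]) -> List[dict]:
    """Resolve IDs from the cache first, falling back to the database."""
    events = []
    for event_id in ids:
        event = cache.get(event_id)
        if event is None:
            event = database.get(event_id)
            if event is not None:
                cache[event_id] = event  # warm the cache for the next reader
        if event is not None:
            events.append(event)
    return events


def read_feed(user: str, count: int = 20) -> List[dict]:
    """Fetch the newest `count` IDs for a user and hydrate them."""
    return multi_get(feed_ids.get(user, [])[:count])
```

The feed itself only ever holds small integers, and the event bodies live once in the cache and database no matter how many feeds point at them.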

It looks like this lib uses Sets of the actual event data. It won't be able to take advantage of the same optimizations.

On the plus side, switching from storing zsets of actual event data to storing integer ids should be a pretty minor code change.

Thanks for the feedback. I really appreciate it.

I do think space could be an issue. To mitigate this risk I've decided to only keep ~8 pages of each feed in Redis, and compress the values (including removing attributes that won't be shown in the feed). Also, Redis 2.2 is a lot more space efficient than earlier versions.
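In sketch form, the capping and compression look something like this. It is a simplified illustration, not the library's code: a dict of lists stands in for Redis, and the page size, field names and helper names are assumptions.

```python
import json
import zlib
from typing import Dict, List

PAGE_SIZE = 20   # assumed; pick whatever your feed page actually shows
MAX_PAGES = 8    # "~8 pages of each feed"
FEED_FIELDS = ("id", "actor", "verb", "ts")  # hypothetical fields the feed displays

feeds: Dict[str, List[bytes]] = {}  # stand-in for per-user Redis lists


def pack(event: dict) -> bytes:
    """Drop attributes the feed never shows, then compress the compact JSON."""
    slim = {key: event[key] for key in FEED_FIELDS if key in event}
    return zlib.compress(json.dumps(slim, separators=(",", ":")).encode("utf-8"))


def unpack(blob: bytes) -> dict:
    """Inverse of pack (minus the attributes that were dropped)."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def add_to_feed(user: str, event: dict) -> None:
    """Prepend the packed event and cap the feed at MAX_PAGES pages."""
    items = feeds.setdefault(user, [])
    items.insert(0, pack(event))
    del items[PAGE_SIZE * MAX_PAGES:]
```

One caveat: zlib adds a few bytes of overhead, so very small values can come out larger than they went in. It is worth measuring on real events before assuming the compression pays off.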

Our primary datastore is actually MongoDB, which has some similar (NoSQL) characteristics, but I've found that storing the feeds in Redis is actually cheaper than storing the same data in Mongo. That said, I really like Mongo for its richer querying (and greater persistence guarantees), and use it for canonical storage of users, posts, etc.