by maxdemarzi
5503 days ago
"To compute reach, you need to get all the people who tweeted the URL, get all the followers of all those people, unique that set of followers, and then count the number of uniques. It's an intense computation that potentially involves thousands of database calls and tens of millions of follower records."

Or you could use a graph DB to solve a graph problem:

URL -> tweeted_by -> users -> followed_by -> users

Try that on Neo4j.
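For anyone who wants the traversal spelled out, here is a minimal in-memory sketch of it in Python. The two dictionaries and the `reach` function are made-up stand-ins for the `tweeted_by` and `followed_by` relationships, not anything from Neo4j or from the original system; a real graph database would do the two hops in a single query.

```python
from typing import Dict, Set

# Hypothetical sample data: URL -> users who tweeted it.
tweeted_by: Dict[str, Set[str]] = {
    "http://example.com/a": {"alice", "bob"},
}

# Hypothetical sample data: user -> users who follow them.
followed_by: Dict[str, Set[str]] = {
    "alice": {"carol", "dave"},
    "bob": {"dave", "erin"},
}


def reach(url: str) -> int:
    """Count the unique followers of everyone who tweeted the URL."""
    followers: Set[str] = set()
    for tweeter in tweeted_by.get(url, set()):
        # Set union removes duplicates such as "dave", who follows both tweeters.
        followers |= followed_by.get(tweeter, set())
    return len(followers)


print(reach("http://example.com/a"))  # 3 (carol, dave, erin)
```

The set union is the "unique that set of followers" step from the quote; at tens of millions of records it no longer fits comfortably in one process, which is where the disagreement in this thread starts.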
The reach computation on Storm does everything in parallel (across however many machines you need to scale the computation) and gets data using distributed key/value databases (Riak, in our case).
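To make the "everything in parallel" part concrete, here is a rough single-machine sketch, assuming a thread pool stands in for the cluster and plain dictionaries stand in for the key/value store. None of this is Storm's or Riak's API; `KV_TWEETERS`, `KV_FOLLOWERS`, `get_followers`, and `parallel_reach` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

# Hypothetical key/value data: URL -> tweeters, and user -> followers.
KV_TWEETERS: Dict[str, List[str]] = {
    "http://example.com/a": ["alice", "bob", "frank"],
}
KV_FOLLOWERS: Dict[str, List[str]] = {
    "alice": ["carol", "dave"],
    "bob": ["dave", "erin"],
    "frank": ["carol", "gina"],
}


def get_followers(user: str) -> List[str]:
    """One key/value lookup; in a real system this would be a network call."""
    return KV_FOLLOWERS.get(user, [])


def parallel_reach(url: str, workers: int = 4) -> int:
    """Fetch follower lists concurrently, then count the unique followers."""
    tweeters = KV_TWEETERS.get(url, [])
    unique: Set[str] = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map issues the lookups concurrently and yields results in order.
        for followers in pool.map(get_followers, tweeters):
            unique.update(followers)
    return len(unique)


print(parallel_reach("http://example.com/a"))  # 4 (carol, dave, erin, gina)
```

The lookups are independent, so they parallelize trivially; the deduplication is the part that needs coordination, which this sketch sidesteps by merging everything in one process.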