by socketnaut
2667 days ago
The statistical claims in the article assume that tweets are sampled uniformly at random, which is most likely false. The fact that 3 machines handle 20% of tweets suggests that tweets are not assigned to machines uniformly at random. I would guess that there is a geographic bias in which machines handle which tweets.
I'm rewriting the code to include a prescan of the time range, to determine which server IDs are in play during that range and which of them are most active.
Figuring out how to deconstruct Snowflake was challenging, and there is still a lot of analysis left to do.
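
For anyone following along, here is a rough sketch of what such a prescan could look like. It is not the actual code from the article. It assumes the commonly documented Snowflake layout (41 bits of millisecond timestamp, 10 bits of machine ID, 12 bits of sequence number, with an epoch of 1288834974657 ms), and the function names are made up for illustration.

```python
from collections import Counter

# Assumption: Snowflake epoch in Unix milliseconds (2010-11-04), as commonly documented.
TWITTER_EPOCH_MS = 1288834974657


def encode_snowflake(timestamp_ms, machine_id, sequence):
    """Build an ID from its parts (hypothetical helper, handy for sanity checks)."""
    return ((timestamp_ms - TWITTER_EPOCH_MS) << 22) | (machine_id << 12) | sequence


def decode_snowflake(tweet_id):
    """Split an ID into (timestamp_ms, machine_id, sequence)."""
    timestamp_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS  # top 41 bits
    machine_id = (tweet_id >> 12) & 0x3FF               # next 10 bits
    sequence = tweet_id & 0xFFF                         # low 12 bits
    return timestamp_ms, machine_id, sequence


def prescan_machine_ids(tweet_ids, start_ms, end_ms):
    """Count IDs per machine ID within [start_ms, end_ms), most active first."""
    counts = Counter()
    for tweet_id in tweet_ids:
        timestamp_ms, machine_id, _ = decode_snowflake(tweet_id)
        if start_ms <= timestamp_ms < end_ms:
            counts[machine_id] += 1
    return counts.most_common()
```

Feeding a sample of known IDs from the time range into `prescan_machine_ids` gives a list of (machine ID, count) pairs. The counts only reflect the sample, so if the sample itself is biased, the activity ranking will be too.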