Hacker News
by socketnaut 2667 days ago
The statistical claims in the article assume that tweets are sampled uniformly at random, which is most likely false.

The fact that 3 machines handle 20% of tweets suggests that tweets are not in fact assigned to machines in a uniformly random manner. I would guess that there is a geographic bias as to which machines handle which tweets.
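For scale: if tweets were assigned uniformly over M active machines, any 3 of them would be expected to carry 3/M of the traffic. A 20% share is therefore only anomalous when M is well above 15. A chi-square goodness-of-fit test on per-machine counts makes this check more precise. The sketch below assumes you already have a tweet count for every active machine ID, including machines with zero tweets in the window. The function name is hypothetical, and no counts from the article are used.

```python
from collections.abc import Mapping


def chi_square_uniform(counts: Mapping[int, int]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for the hypothesis
    that tweets are spread evenly across the machines in `counts`.

    `counts` maps a machine ID to the number of tweets it handled; it should
    list every active machine, including ones with zero tweets.
    """
    if len(counts) < 2:
        raise ValueError("need at least two machines to compare")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tweets to test")
    expected = total / len(counts)  # equal share per machine under uniformity
    stat = sum((observed - expected) ** 2 / expected for observed in counts.values())
    return stat, len(counts) - 1
```

A large statistic relative to a chi-square distribution with `df` degrees of freedom argues against uniform assignment. If SciPy is available, `scipy.stats.chisquare(list(counts.values()))` runs the same test and also returns a p-value.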


When I did the analysis, I was puzzled about why certain machines handled a higher percentage of tweets than others, so you are probably right that geography plays some role in how tweets are distributed.

I'm rewriting the code to prescan the time range first, to determine which server IDs are active during that window and which of them are the busiest.
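A prescan like this can be sketched as a single pass that counts tweets per server ID within a time window. The sketch assumes the bit layout of Twitter's original open-source Snowflake release (a 41-bit millisecond timestamp shifted left by 22, then 10 machine bits, then a 12-bit sequence) and its custom epoch. Production IDs may not follow that layout exactly. The function names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

# Assumption: custom epoch from the released Snowflake code, in Unix ms.
TWEPOCH_MS = 1288834974657


def snowflake_to_ms(tweet_id: int) -> int:
    """Unix timestamp (ms) encoded in the top bits of a Snowflake ID."""
    return (tweet_id >> 22) + TWEPOCH_MS


def machine_id(tweet_id: int) -> int:
    """The 10 bits above the 12-bit sequence (datacenter + worker, assumed)."""
    return (tweet_id >> 12) & 0x3FF


def prescan(tweet_ids: Iterable[int], start_ms: int, end_ms: int) -> Counter:
    """Count tweets per machine ID for IDs created in [start_ms, end_ms)."""
    counts: Counter = Counter()
    for tid in tweet_ids:
        if start_ms <= snowflake_to_ms(tid) < end_ms:
            counts[machine_id(tid)] += 1
    return counts
```

`prescan(ids, start, end).keys()` gives the server IDs in play during the window, and `.most_common(n)` ranks the busiest ones. Because the timestamp occupies the highest bits, the time filter could equally be applied as an ID range, `(start_ms - TWEPOCH_MS) << 22` up to `(end_ms - TWEPOCH_MS) << 22`.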

Figuring out how to deconstruct Snowflake was challenging and there is still a lot of analysis left to do.
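As a rough illustration of what deconstructing a Snowflake ID involves, the sketch below splits an ID into its fields. It assumes the layout from Twitter's original open-source Snowflake code (41-bit timestamp, 5-bit datacenter ID, 5-bit worker ID, 12-bit sequence) and that code's custom epoch. As noted later in the thread, the service in production may no longer match that release. `SnowflakeParts` and `decode_snowflake` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: layout of the released Snowflake code,
# (timestamp << 22) | (datacenter << 17) | (worker << 12) | sequence.
TWEPOCH_MS = 1288834974657  # custom epoch in the released code (November 2010)


@dataclass(frozen=True)
class SnowflakeParts:
    timestamp_ms: int   # Unix time in milliseconds
    datacenter_id: int  # 5 bits
    worker_id: int      # 5 bits
    sequence: int       # 12 bits, per-millisecond counter on one worker

    @property
    def created_at(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp_ms / 1000, tz=timezone.utc)


def decode_snowflake(snowflake_id: int) -> SnowflakeParts:
    """Split a Snowflake ID into its (assumed) component fields."""
    if not 0 <= snowflake_id < 1 << 63:
        raise ValueError("expected a non-negative 63-bit integer")
    return SnowflakeParts(
        timestamp_ms=(snowflake_id >> 22) + TWEPOCH_MS,
        datacenter_id=(snowflake_id >> 17) & 0x1F,
        worker_id=(snowflake_id >> 12) & 0x1F,
        sequence=snowflake_id & 0xFFF,
    )
```

Grouping decoded IDs by `(datacenter_id, worker_id)` is one way to look for the geographic skew discussed above, assuming datacenter IDs correspond to physical locations.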

> Figuring out how to deconstruct Snowflake was challenging and there is still a lot of analysis left to do.

Why don't you just read the code?

How do you read the code of an implementation detail of Twitter's servers? There's no guarantee that the example code they released years ago still matches what they use.