	by socketnaut 2667 days ago
	The statistical claims in the article assume that tweets are sampled uniformly at random, which is most likely false. The fact that 3 machines handle 20% of tweets suggests that tweets are not assigned to machines uniformly at random. I would guess that there is a geographic bias in which machines handle which tweets.

stuck_in_matrix 2667 days ago

When I did the analysis, I was puzzled as to why certain machines handle a higher percentage of tweets than others -- so you are most likely correct that there is some geographic consideration in the distribution.

I'm rewriting the code to include a prescan of the time range to determine which server ids are in play at the time and which server ids are most active.

Figuring out how to deconstruct Snowflake was challenging and there is still a lot of analysis left to do.
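
For anyone following along, here is a minimal sketch of that decomposition and of the per-server prescan. It assumes the bit layout from the Snowflake code Twitter open-sourced (41-bit millisecond timestamp, 5-bit datacenter id, 5-bit worker id, 12-bit sequence, with an epoch of 1288834974657 ms); the production layout may well differ. The function names are illustrative and not taken from the article's code.

```python
from collections import Counter

# Epoch from the open-sourced Snowflake code (assumed to still apply).
TWITTER_EPOCH_MS = 1288834974657


def decode_snowflake(tweet_id: int) -> dict:
    """Split an ID into fields, assuming the open-sourced bit layout."""
    return {
        "timestamp_ms": (tweet_id >> 22) + TWITTER_EPOCH_MS,  # upper 41 bits
        "datacenter_id": (tweet_id >> 17) & 0x1F,             # next 5 bits
        "worker_id": (tweet_id >> 12) & 0x1F,                 # next 5 bits
        "sequence": tweet_id & 0xFFF,                         # low 12 bits
    }


def machine_counts(tweet_ids) -> Counter:
    """Count IDs per (datacenter_id, worker_id) pair seen in a sample."""
    counts = Counter()
    for tweet_id in tweet_ids:
        fields = decode_snowflake(tweet_id)
        counts[(fields["datacenter_id"], fields["worker_id"])] += 1
    return counts
```

Calling `machine_counts(sample).most_common()` on the IDs in a time range would list the server ids in play, most active first, which is roughly what the prescan needs.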

rokob 2667 days ago

> Figuring out how to deconstruct Snowflake was challenging and there is still a lot of analysis left to do.

Why don't you just read the code?

detaro 2667 days ago

How do you read the code of an implementation detail of Twitter's servers? There's no guarantee that the example code they released years ago still matches what they use.
