HN Mirror



	by stuck_in_matrix 2667 days ago
	I am the author of this document. If anyone has any questions, I'd be happy to answer them!

2 comments

doomjunky 2666 days ago

Assume that sequence IDs are handed out sequentially, and further that gaps in this sequence indicate deleted tweets.

You could reduce the sequence ID space by exploiting the fact that a tweet is overall less likely to be deleted than not, and a run of two CONSECUTIVE deleted tweets is even less likely.

E.g. if you find sequence ID 0 but don't find 1, 2, 3, then you can probably skip 4, 5, 6, 7. I don't know the probabilities of deleted tweets, but I am sure someone could calculate them to determine the 99% threshold.
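A rough sketch of how that threshold could be calculated, assuming (and this is only an assumption) that each tweet is deleted independently with the same probability. Under that model a run of k consecutive missing IDs has probability p^k, so you look for the smallest k where p^k drops to 1% or below. The function name `misses_before_skip` and the parameter `p_deleted` are made up for illustration; the real deletion rate is unknown here.

```python
def misses_before_skip(p_deleted: float, confidence: float = 0.99) -> int:
    """Return the smallest run length k such that k consecutive missing IDs
    are explained by deletions with probability at most 1 - confidence.

    Assumption: deletions are independent with probability p_deleted,
    so a run of k deleted tweets has probability p_deleted ** k.
    """
    if not 0.0 < p_deleted < 1.0:
        raise ValueError("p_deleted must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    threshold = 1.0 - confidence
    k = 1
    run_probability = p_deleted  # probability of k missing IDs in a row
    while run_probability > threshold:
        k += 1
        run_probability *= p_deleted
    return k


if __name__ == "__main__":
    # Hypothetical deletion rates, not measured values.
    for p in (0.2, 0.5):
        print(p, misses_before_skip(p))
```

Even with a (hypothetical) 50% deletion rate this gives 7 consecutive misses before skipping ahead; lower rates give a shorter run. If deletions are correlated, e.g. a user deleting a burst of tweets, the independence assumption breaks and the threshold would be too optimistic.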


the_arun 2667 days ago

The timestamp is generated per server. The system time could differ across nodes in the cluster (even with NTP) by nanoseconds. So isn't the identification of the first tweet only an approximation if multiple tweets containing the same word "earthquake" are created at the same time, at the nanosecond level? But I get the point.


talaketu 2667 days ago

The Tweet object from the Twitter API has a "created_at" field with 1 s resolution, whereas the snowflake creation time has 1 ms resolution. No doubt these could disagree, but if that happened then maybe both authors could get a prize?
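To make the resolution difference concrete, here is a minimal sketch of pulling the millisecond timestamp out of a snowflake ID. It assumes the commonly documented layout (timestamp in the bits above the low 22, offset from Twitter's epoch of 1288834974657 ms). The helper names and the two example IDs are made up for illustration; they are not real tweets.

```python
from datetime import datetime, timezone

# Twitter's snowflake epoch in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_ms(tweet_id: int) -> int:
    """Millisecond Unix timestamp encoded in a snowflake ID.

    Assumes the low 22 bits hold worker and sequence data and the
    remaining high bits hold milliseconds since TWITTER_EPOCH_MS.
    """
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Same timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(snowflake_to_ms(tweet_id) / 1000, tz=timezone.utc)


def same_second(id_a: int, id_b: int) -> bool:
    """True if both IDs fall in the same whole second, i.e. a 1 s
    resolution field could not order them."""
    return snowflake_to_ms(id_a) // 1000 == snowflake_to_ms(id_b) // 1000


if __name__ == "__main__":
    # Synthetic IDs: 1000 ms and 1300 ms after the snowflake epoch.
    first, second = 1000 << 22, 1300 << 22
    print(snowflake_to_datetime(first), snowflake_to_datetime(second))
    print(same_second(first, second))
```

The two synthetic IDs land in the same second but 300 ms apart, so a 1 s field would show a tie while the snowflake still orders them (subject to the per-server clock caveat raised above).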
