Assuming that sequential IDs are handed out sequentially, and assuming further that gaps in this sequence indicate deleted tweets:
You could reduce the sequence ID space by exploiting the fact that a tweet is overall less likely to be deleted than not, and a run of two CONSECUTIVE deleted tweets is even less likely.
E.g. if you find sequence ID 0 but don't find 1, 2, 3, then you can probably skip 4, 5, 6, 7. I don't know the probability of a tweet being deleted, but I am sure someone could calculate it to determine the 99% threshold.
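To make the skipping idea concrete, here is a rough sketch rather than a tested tool. It assumes deletions are independent with some probability `p_deleted` (a number I don't have), that `exists` is a hypothetical lookup you supply yourself, and that the snowflake sequence field is 12 bits wide (4096 values). All names are made up for illustration.

```python
import math
from typing import Callable, List


def misses_for_tail(p_deleted: float, tail: float = 0.01) -> int:
    """Smallest k with p_deleted**k <= tail, ASSUMING independent deletions.

    With tail=0.01 this is the "99% threshold": after k misses in a row,
    the chance that they were all deletions is at most 1%.
    """
    if not 0.0 < p_deleted < 1.0 or not 0.0 < tail < 1.0:
        raise ValueError("p_deleted and tail must be strictly between 0 and 1")
    return max(1, math.ceil(math.log(tail) / math.log(p_deleted)))


def scan_sequence_ids(
    exists: Callable[[int], bool],  # hypothetical lookup: does this sequence ID resolve?
    miss_limit: int,
    max_seq: int = 4096,  # assumption: 12-bit sequence field
) -> List[int]:
    """Probe sequence IDs from 0 upward; stop after miss_limit misses in a row."""
    found: List[int] = []
    misses = 0
    for seq in range(max_seq):
        if exists(seq):
            found.append(seq)
            misses = 0  # a hit resets the run of misses
        else:
            misses += 1
            if misses >= miss_limit:
                break  # treat the rest of the space as unused
    return found
```

With `miss_limit=3` and tweets at sequence IDs 0, 1, 3 and 9, the scan returns `[0, 1, 3]` and never reaches 9. That is the trade-off: fewer lookups, at the cost of missing anything that sits behind an unlucky run of deletions.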
The timestamp is generated per server. The system time could differ across nodes in the cluster (even with NTP) by nanoseconds. So isn't the "first tweet" only an approximation if multiple tweets containing the same word "earthquake" are posted at the same time at the ns level? But I get the point.
The Tweet object from the Twitter API has a "creation_time" with 1 s resolution, whereas the snowflake creation time has 1 ms resolution. No doubt these could disagree, but if that happened then maybe both authors could get a prize?
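For anyone who wants to compare the two, a minimal sketch of pulling the millisecond timestamp out of a snowflake ID. It relies on the published snowflake layout (timestamp in the bits above the low 22, offset from the epoch 1288834974657 ms). The function names are mine, and the same-second check assumes the API value is the Unix time truncated to whole seconds, which I have not verified.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # snowflake epoch, in Unix milliseconds
TIMESTAMP_SHIFT = 22              # low 22 bits: 10 machine bits + 12 sequence bits


def snowflake_to_ms(tweet_id: int) -> int:
    """Unix time in milliseconds encoded in a snowflake tweet ID."""
    return (tweet_id >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Same timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(snowflake_to_ms(tweet_id) / 1000, tz=timezone.utc)


def same_second(tweet_id: int, api_seconds: int) -> bool:
    """True if the snowflake time falls in the API's 1 s bucket.

    ASSUMPTION: the API value is truncated (not rounded) to whole seconds.
    """
    return snowflake_to_ms(tweet_id) // 1000 == api_seconds


def earlier_tweet(id_a: int, id_b: int) -> int:
    """ID with the earlier snowflake time; a tie in the same ms returns id_a."""
    return id_a if snowflake_to_ms(id_a) <= snowflake_to_ms(id_b) else id_b
```

Two IDs generated in the same millisecond on different machines tie here, so at that granularity "first" really is only an approximation.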