by liquidgecka
1282 days ago
I am pretty sure that Snowflake didn't use a mod of a signed int32. It used a service discovery pool as part of Finagle (and prior to that DNS, iirc). The server used a very simple method internally to convert time into an integer (that was 52 bits because of JavaScript). In fact it was completely open source: https://blog.twitter.com/engineering/en_us/a/2010/announcing...

The integer generation was pretty simple: there was a fixed id for each server, and unless I am mistaken we had 5 servers per datacenter. Each id was basically <time><offset><id>, where time was a millisecond timer, offset was the number of ids generated in that same millisecond by the same server, and id was the machine's unique identifier.

When we first talked about this process I thought that offset was going to roll: every id would increment it by one. This was changed to resetting it every millisecond specifically so that it would obscure tweet volumes. At the time I remember reading a LOT of articles estimating tweet volume and most of them were way, way off. I don't know that we ever really put effort into correcting them though. =)

* - Does not account for changes in the system post 2012.
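For anyone who wants to see the <time><offset><id> layout concretely, here is a rough Python sketch of it. The field widths (`OFFSET_BITS`, `ID_BITS`), the `IdGenerator` class, and `split_id` are all made up for illustration; the comment above doesn't give bit widths, and this is not the actual Snowflake code. The caller passes in the millisecond clock, and the sketch ignores clocks that run backwards.

```python
from dataclasses import dataclass

# Hypothetical field widths, chosen only for illustration.
OFFSET_BITS = 12
ID_BITS = 10


@dataclass
class IdGenerator:
    """Builds ids laid out as <time><offset><id>, per the description above."""
    machine_id: int
    last_ms: int = -1
    offset: int = 0

    def next_id(self, now_ms: int) -> int:
        if now_ms != self.last_ms:
            # New millisecond: the offset resets instead of rolling forever.
            self.last_ms = now_ms
            self.offset = 0
        else:
            self.offset += 1
            if self.offset >= (1 << OFFSET_BITS):
                raise OverflowError("too many ids in one millisecond")
        return (
            (now_ms << (OFFSET_BITS + ID_BITS))
            | (self.offset << ID_BITS)
            | self.machine_id
        )


def split_id(value: int) -> tuple[int, int, int]:
    """Inverse of next_id: returns (time_ms, offset, machine_id)."""
    machine_id = value & ((1 << ID_BITS) - 1)
    offset = (value >> ID_BITS) & ((1 << OFFSET_BITS) - 1)
    time_ms = value >> (OFFSET_BITS + ID_BITS)
    return time_ms, offset, machine_id
```

Because time sits in the high bits, ids from one server sort by creation order, and the offset only ever counts ids inside a single millisecond.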
The offset was actually how we calculated volume, because millisecond collisions become a variant of the German tank problem [1]. A few times when y'all made tweet volumes public it mapped pretty closely with our estimates.
This was around 2011, so your knowledge should be relevant.
1: https://en.wikipedia.org/wiki/German_tank_problem
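For reference, a minimal sketch of the textbook German tank estimator applied to offsets. This is only the classic formula (largest serial m seen in k distinct samples gives an estimate of m + m/k - 1), not necessarily the exact variant used for the estimates mentioned above. The function names are hypothetical, and it assumes all observed offsets come from the same server and the same millisecond, with offsets starting at 0.

```python
def estimate_total(serials: list[int]) -> float:
    """Classic German tank estimate for serials drawn from 1..N without replacement."""
    distinct = set(serials)
    k = len(distinct)   # number of distinct samples
    m = max(distinct)   # largest serial observed
    return m + m / k - 1


def estimate_ids_in_millisecond(offsets: list[int]) -> float:
    """Offsets start at 0, so shift them to 1-based serials before estimating."""
    return estimate_total([o + 1 for o in offsets])
```

With observed offsets 18, 39, 41 and 59, the serials are 19, 40, 42 and 60, so the estimate is 60 + 60/4 - 1 = 74 ids in that millisecond.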