Hacker News new | ask | show | jobs
by liquidgecka 1282 days ago
I am pretty sure that snowflake didn't use a mod of a signed int32. It used a service discovery pool as part of finagle (and prior to that dns iirc). The server used a very simple method internally to convert time into a integer (that was 52 bits because of javascript). In fact it was completely open source: https://blog.twitter.com/engineering/en_us/a/2010/announcing...

The integer generation was pretty simple, there was a fixed id of each server, and unless I a mistaken we have 5 servers per datacenter. Each id was basically <time><offset><id> where time was a millisecond timer, offset was the number of ids generated in that same millisecond by the same server, and id was the machines unique identifier. When we first talked about this process I thought that offset was going to roll, every id would increment it by one. This was changed to resetting it every millisecond specifically so that it would obscure tweet volumes.

At the time I remember reading a LOT of articles estimating tweet volume and most of them were way, way off. I don't know that we ever really put effort into correcting them though. =)

* - Does not account for changes in the system post 2012.

5 comments

Awesome to hear from someone on the other side of the API with knowledge of this. This is ringing bells for me. Yeah, the id parameter was how we knew how many servers there were, and we saw more assignment in some servers than others that neatly mapped to a int32 max failing to divide by the number of IDs we saw. I thought I recall Twitter confirming that was how round robin happened but I could be totally misremembering. We never got a contact to talk to Twitter about it. FWIW, we did eventually see this fixed. I imagine it was pretty easy to spot one server seeing less load than others.

The offset was actually how we calculated volume, because millisecond collisions become a variant of the german tank problem[1]. A few times when y'all made tweet volumes public it mapped pretty closely with our estimates.

This was around 2011, so your knowledge should be relevant.

1: https://en.wikipedia.org/wiki/German_tank_problem

I think what your describing was actually a problem that I introduced in our bind configuration back when we used DNS for service discovery. Its not exactly what your describing but I can totally see how it would appear that way externally.

So initially we used 2 bind servers with authoritative zones serving using round robin. This worked fairly well as the load on each server was high enough to keep the round robin flowing and the connections from ruby would "stick" for 30 seconds anyway. We had a very large pool of front ends so its not like it mattered. Eventually we had to put a local bind cache on each server, but this introduced another super annoying bug. For some reason, the interaction between the authoritative server, and the caching server would end up causing the served record set to be ordered the same. Normally the authoritative server would serve A, B, C for the first query, then B, C, A for the second, etc. When the cache in the middle for some reason it would reset al queries to A, B, C after the refresh happened. So effectively every 30 seconds it would reset to A, B, C and start round robin again. Since the front ends would connect and stay sticky via connection reuse for a while this meant that A got like 50% of the load, B got 33%, and C got 17%. I am guessing you latched on to this by seeing that one of the servers was horribly underutilized in comparison and reproduced the math accidentally. =)

The fix for this was the massively disgusting, but super effective "make every single machine a DNS zone secondary". Rather than having a simple record cache we just fetched the whole zone if it changed. This actually made DNS quite a bit faster as every machine had a local copy of the zone.. Once that happened distribution returned to normal (20/20/20/20/20) fix for all of our internal services which used DNS for discovery, including snowflake.

This is awesome and completely retconning a 10 year old project for me. I was working on social media analytics at Disney and we were exploring ways to measure twitter conversation of our brands, which led to us attempting to estimate total conversation volume, which is why this technical nuance was relevant to us. It was a wildly experimental time. Thanks for the story!
This thread is amazing.. thank you both for sharing!
Agree, was super excited when I saw reference to the German tank problem as that is immediately what I thought of in the first post.
I dream of the day in the future, possibly in Eden, when I'll have opportunity to discuss the things I've reverse engineered with the guys and gals who actually engineered them. This conversation in enlightening to witness.

I will have an especially hard time phrasing questions in a respectful manner for the Adobe devs, though ))

Also, totally unrelated Twitter story (because I am nostalgic lately and you reminded me of it)

Early in my time there we had a major, unexpected load spike in the middle of the day. People where tweeting like crazy, breaking records for volume, etc. The system was groaning under strain but it mostly held. Turns out Michael Jackson had died. We sustained 452 (or maybe 457, it was roughly 450-something) tweets a second. this quickly became 1 "MJ" worth of tweets. We informally used this to define load the entire time I was there. Within a few months we got to a point where we were never BELOW 1 MJ, within a year I think we had peaked at double digit MJs and sustained several even in the lowest periods. Before I left we had hit 1 MJ in photo tweets, etc.

Around the time I left we did something like 300 MJ's per second, and I was only there 3 years.

I remember those days before snowflake and Blaine was desperately trying to keep the lights on. It's why no one had time to talk to us (marketers and related disciplines) back then. Even Facebook said "all our focus is on acquiring users. You have no choice but to meet us on our terms because we own the eyeballs." Everything was growing so fast. Hard to believe that was only a decade ago.
Musk claimed a record of 20,000 of tweets per second recently [1]. How does that square with what you’re saying? 300 times 450 is closer to 150,000. Am I missing something?

[1]https://twitter.com/elonmusk/status/1595505413113323520?s=20...

That tweet does not claim 20,000 is a record.

The event also did not reach 20,000, it "almost" hit it.

You’re right. I misinterpreted ‘record usage’. I found a 2013 blog post [1] claiming a 143 thousand tweets per second peak.

[1] https://blog.twitter.com/engineering/en_us/a/2013/new-tweets...

That's interesting; I worked on a different site that also got record traffic when MJ died. I wonder if that happened to every site that had a chat or news feature.

Kind of obvious now, but I bet we could detect major world news just by sampling traffic size of chat sites.

Someone previously created a tweet linking to itself by predicting the likely ID range: https://oisinmoran.com/quinetweet
A simpler scheme I've used for adtech (billions of requests per day) is to simply reserve a chunk of numbers for each server from a central source. Easy to implement, very fast since each node can just increment in process, and using a 64-bit integer is effectively infinite.
That's how Active Directory Domain Controllers handle the distributed creation of SIDs too.
Twitter didn't always use Snowflake -- that was introduced in November 2010. There was another, much simpler algorithm used before that which generated much smaller IDs (e.g. under 3e10).
Yea.. "auto increment" in mysql. it SUCKED in so many ways, primary of which was that it required a single mysql database to be the primary for all tweet writes. Every tweet just incremented a counter by 1. As we scaled up that server got overloaded rapidly and became a huge single point of failure. Snowflake was introduced to solve several problems at once. =)
> 52 bits because of javascript

But IEEE 754 doubles have a significand that supports a 53-bit range. What am I missing?

You're not missing anything, people just forget the implied bit sometimes.
negative numbers not used?
That doesn't make any sense.