Hacker News new | ask | show | jobs
by Tehnix 805 days ago
I’m definitely of the opinion that what Figma[0] (and earlier, Notion[1]) did is what I’d call “actual engineering”.

Both of these companies are very specific about their circumstances and requirements

- Time is ticking, and downtime is guaranteed if they don’t do anything

- They are not interested in giving up the massive amount feature AWS supports via RDS, very specially around data recovery (anyone involved with Business Continuity planning would understand the importance of this)

- They need to be able to do it safely, incrementally, and without closing the door on reverting their implementation/rollout

- The impact on Developers should be minimal

“Engineering” at its core is about navigating the solution space given your requirements, and they did it well!

Both Figma and Notion meticulously focused on the minimal feature set they needed, in the most important order, to prevent disastrous downtime (e.g Figma didn’t need to support all of SQL for sharding, just their most used subset).

Both companies call out (rightfully so) that they have extensive experience operating RDS at this point, and that existing solutions either didn’t give them the control they needed (Notion) or required a significant rewrite or their data structures (Figma), which was not acceptable.

I think many people also completely underestimate how important operational experience with your solution is at this scale. Switch to Citus/Vitess? You’ll now find out the hard way all the “gotchas” of running those solutions at scale, and it would guarantedly have resulted in significant downtime as they procured this knowledge.

They’d also have to spend a ton of time catching up to RDS features they were suddenly lacking, which I would wager would take much more time than the time it took implementing their solutions.

Great job to both teams!

[0] https://www.figma.com/blog/how-figmas-databases-team-lived-t...

[1] https://www.notion.so/blog/sharding-postgres-at-notion

1 comments

the right way to look at it - IMHO - is to interpret "lots of RDS experience" as complete lack of run-your-own postgreSQL experience. and given that it's not surprising that their cost-benefit math give them the answer of "invest into a custom middleware, instead of moving to running our postgreSQL plus some sharding thing on top"

of course it's not cheap, but probably they are deep into the AWS money pit anyway (so running Citus/whatever would be similar TCO)

and it's okay, AWS is expensive for a lot of bootstrapped startups with huge infra requirements for each additional user, but Figma and Notion are virtually on the exact opposite of that spectrum

also it shows that there's no trivial solution in this space, sharding SQL DBs is very hard in general, and the extant solutions have sharp edges