| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by adastral 868 days ago
	I see they don't mention Citus (https://github.com/citusdata/citus), which is already a fairly mature native Postgres extension. From the details given in the article, it sounds like they just reimplemented it. I wonder if they were unaware of it or disregarded it for a reason —I currently am in a similar situation as the one described in the blog, trying to shard a massive Postgres DB.

7 comments

gshulegaard 868 days ago

I have worked on teams that have both sharded and partitioned PostgreSQL ourselves (somewhat like Figma) (Postgres 9.4-ish time frame) as well as those that have utilized Citus. I am a strong proponent of Citus and point colleagues in that direction frequently, but depending on how long ago Figma was considering this path I will say that there were some very interesting limitations to Citus not that long ago.

For example, it was only 2 years ago that Citus allowed the joining of data in "local" tables and data retrieved from distributed tables (https://www.citusdata.com/updates/v11-0). In this major update as well, Citus enabled _any_ node to handle queries, previously all queries (whether or not it was modifying data) had to go through the "coordinator" node in your cluster. This could turn into a pretty significant bottleneck which had ramifications for your cluster administration and choices made about how to shape your data (what goes into local tables, reference tables, or distributed tables).

Again, huge fan of Citus, but it's not a magic bullet that makes it so you no longer have to think about scale when using Postgres. It makes it _much_ easier and adds some killer features that push complexity down the stack such that it is _almost_ completely abstracted from application logic. But you still have be cognizant of it, sometimes even altering your data model to accommodate.

gen220 868 days ago

You also benefit from the tailwind of the CitusData team making continued improvement to the extension, whereas an in-house system depends on your company's ability to hire and retain people to maintain + improve the in-house system.

It's hard to account for the value of benefits that have yet to accrue, but this kind of analysis, even if you pretty heavily-discount that future value, tilts the ROI in favor of solutions like Citus, IMO. Especially if your time horizon is 5+ or 10+ years out.

Like you said, if they made this decision 3ish years ago, you would have had to be pretty trusting on that future value. A choice, made today, hinges less on that variable.

wayne-li2 868 days ago

Huh, I would have thought the opposite. Companies at Figma size are easily able to hire talent to maintain a core part of their engineering stack. On the other hand, they retain no control of Citus decision making. Those tailwinds could easily have been headwinds if they went in a direction that did not suit Figma.

gen220 867 days ago

I think this is true for things higher up the "stack", but doesn't necessarily apply to tech like Postgres [and Citus, IMO].

The line separating "build in-house" vs "use OSS" exists, and it's at a different layer of the stack in every company. IMO, for most companies in 2024, the line puts Citus on the same side as Postgres.

FWIW, I would have assumed that Citus would be on the other end of the line, until I had to look into Citus for work for a similar reason that Figma did. You can pick and choose among the orthogonal ideas they implement that most cleanly apply to the present stage of your business, and I would've chosen to build things the same way they did (TBH, Figma's choices superficially appear to be 1:1 to Citus's choices).

sgarland 868 days ago

I thought of that as well. The only thing I could think of is that they mentioned that they don't want to move off of RDS, and there is 0% chance of Citus coming to AWS since Microsoft bought them.

junto 868 days ago

Before clicking on the article I assumed it was Citus, and was surprised when it wasn’t.

Maybe because CitusData was bought by Microsoft around the same time, so Microsoft could create “Azure Cosmos DB for Postgres Cluster”, yet another one of Microsoft’s typical product naming crapshoots.

victor106 868 days ago

> yet another one of Microsoft’s typical product naming crapshoots.

Well said. I haven't seen any company as terrible as Microsoft at naming things. Anyone know why?

salynchnew 868 days ago

Naming things is hard.

At a previous employer, I saw several cool-ish open source projects instantly doomed to obscurity by picking a name that either completely duplicated the name of an existing OSS project or were guaranteed to have terrible SEO for another reason.

However, Microsoft seems to have a unique crossover of fragmented business units and centralized marketing. That's why you end up with Azure -> Subproject -> Actual Product/Service word soup. Perviously, they did this with the Windows Live brand from 2005-2012, and "Xbox" for a wide range of gaming projects (many of which were on PC).

thomasjudge 868 days ago

related, Microsoft on Microsoft marketing:

https://www.youtube.com/watch?v=EUXnJraKM3k

r-bar 868 days ago

The committee wanted Cosmos, Azure, and Postgres all in the name and wouldn't compromise.

studmuffin650 868 days ago

AWS is putting up good fight

jabart 868 days ago

Figma uses AWS RDS, RDS doesn't list citus as a supported extension.

gen220 868 days ago

This is my guess of why they didn't use Citus. They weren't interested in the options of (1) going multi-cloud [DB in Azure Cosmos / Backend(s) in AWS] (2) going all-in on Azure [DB in Azure Cosmos / Backend(s) in Azure] (3) self-managing Postgres+Citus in EC2.

It'd be interesting to compare the expected capex of developing this in-house solution + the opex of maintaining it vs the same categories of expected costs for option (3) – because I imagine that's probably the most palatable option.

They also may have pre-paid for dedicated RDS instances for the next X years (before this horizontal scaling initiative began, to boot), as AWS allows companies to do this at a pretty steep discount rate, which would probably tilt them away from (3).

sgarland 868 days ago

Especially because Option 3 lets you go waaaay farther on vertical scaling, since you can get native NVMe drives (they mentioned hitting IOPS limits for RDS), more exotic instance classes with far more RAM, and do stuff like ZFS for native compression and snapshots.

iamdanieljohns 868 days ago

I would love to see a comparison of the major PostgresQL services such as Citus, EDB, Crunchy, Neon, and some OSS distributions/packages

_boffin_ 868 days ago

how "massive" is massive in your case?

dijit 868 days ago

I've had CitusDB running across 68 bare metal machines (40 vCPU, 768GiB ram, 20TiB of storage each + 40GiB network links) and it ran decently well.

Not sure what your definition of massive is, I think Spanner would easily beat it.

Also, it's very use-case dependent, you can't "just use" Citus for everything, it's not quite as flexible as a bog-standard pgsql install due to the way it's sharding, you have to be a tad more careful with your data model.

VHRanger 868 days ago

Is there a reason there's comparatively little storage in your machines in relation to RAM or even CPUs?

Do your machines do compute heavy loads or something?

For a DB I'd expect a lot more storage per node

dijit 868 days ago

NVMe SSDs aren't so large unfortunately.

a 1U server has capacity for 8 drives, we used 2 slots for the OS (RAID1), 2 slots for the WAL volume (2 slots) leaving only 4 slots in RAID10.

So I'm already cheating a little and claiming WAL storage was part of total storage.

VHRanger 868 days ago

Shouldn't you try to get something with PCIe bifurcation in this case?

I doubt you're saturating the PCIe bus bandwidth on any of them?

I imagine your DB is extremely high performance, though!

skunkworker 868 days ago

What is your definition of "decently well", and is your primary cluster (without replicas) above 1PB?

simonw 868 days ago

They said 20TiB * 68, which I think is 1.5PB.

skunkworker 867 days ago

That could be all of the nodes, or just the primaries without replicas.

adastral 868 days ago

Around ten heavily-updated (50-400k updated rows/min) tables ranging between 500M and 5B rows, with a couple tables over 40B rows each (5TB each IIRC).

gregors 868 days ago

Where's the fun in that? I'm not being snarky either. Maybe it's not the best decision business-wise, but I guarantee it was more challenging and more fun. There's something to be said for that.