by ak39 2312 days ago
Most (if not all) modern RDBMSs allow for reliable horizontal and vertical scaling. SQL Server (the db you mention) is no slouch when paired with additional CPUs (for vertical linear scaling) and allows for transactionally safe replication to distributed clusters of SQL Servers (for horizontal scaling) in geographically distributed servers. PostgreSQL has this and it's absolutely free.

Where is this meme that RDBMSs do not scale coming from? (I see lots of people saying this as though it is a given. Has there been any published data on this for me to read?).

8 comments

> Where is this meme that RDBMSs do not scale coming from?

They don't scale to Google or Facebook operational sizes. Once you get to a billion customers or so the ol' RDBMS tends to struggle. Because everyone wants to be Google they imagine they have Google's problems. I've been in a meeting where the client was talking about their severe scaling issues for their "big data" which could only possibly be resolved by state of the art cloud solutions. I pressed them on the numbers - they had 400GB. You can buy an iPhone with 512GB.

Oracle's Exadata X8-2 is ~1PB per full rack.

> They don't scale to Google or Facebook operational sizes.

You may or may not know this, but the primary datastore used at both Google and FB is MySQL. Sure, they use replication and sharding, but I would strongly argue that MySQL with sharding scales better than some multi-master NoSQL thing like Cassandra.

Related, you should check out https://github.com/vitessio/vitess if you haven't seen it. It's what YouTube and others use for their primary data store in production.
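
For anyone unsure what "sharding" means in practice, here is a minimal sketch of hash-based shard routing in Python. It is a generic illustration only, not how Vitess, Google, or Facebook actually route queries; the shard names and the `shard_for` and `group_by_shard` helpers are made up for the example.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection settings.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key, so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def group_by_shard(keys):
    """Group keys by target shard, e.g. to issue one query per shard."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key), []).append(key)
    return groups


if __name__ == "__main__":
    print(shard_for("user:42"))
    print({s: len(ks) for s, ks in group_by_shard(f"user:{i}" for i in range(1000)).items()})
```

The catch with plain modulo hashing is that changing the number of shards moves most keys, which is one reason people reach for dedicated sharding middleware instead of rolling their own.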

> You may or may not know this

I didn't, but they do. Surprising!

https://www.facebook.com/notes/facebook-engineering/tao-the-...

> but the primary datastore used at both Google and FB is MySQL

I'm almost certain this isn't true. BigTable and Spanner are much more widely used at Google because... well, MySQL doesn't really scale.

I think another reason is that companies tended to run one single database to store everything, often even in the same schema. That ended up scaling poorly within the organisation once so many unrelated topics, projects, and features went through one and the same database. I don't know whether licensing and paying per instance had anything to do with it, but I feel that a lot of those "database doesn't scale" problems could have been solved by simply using a few databases for isolated domains instead of cramming everything into one.

> Where is this meme that RDBMSs do not scale coming from? (I see lots of people saying this as though it is a given. Has there been any published data on this for me to read?).

It mostly comes from people who don't know how and when to create an index.
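
To make that concrete, here is a small sketch using Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The `orders` table, its columns, and the index name are invented for the example, and the exact plan wording varies between SQLite versions, so treat the printed text as indicative only.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the human-readable detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10000)],
)

lookup = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, the filter on customer_id has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# With an index on the filtered column, SQLite can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The "when" part is the same exercise in reverse: look at the plans of the queries you actually run, and index the columns they filter and join on rather than everything in sight.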

I feel like you should understand how a relational database works before using one, especially if it's at the scale where you're running into "CPU limits".

> Where is this meme that RDBMSs do not scale coming from?

It's a holdover from the heyday of NoSQL. Was more true then, but RDBMSs have caught up. (At least some of them.) There's even Dqlite for SQLite!

> Where is this meme that RDBMSs do not scale coming from?

In my day it was more like "You can't afford RDBMSs at that scale". Reach a certain point and Larry gets a new yacht. Cheaper/open source offerings have moved that goalpost by quite a bit.

Although it sometimes scares me how them poor databases get treated when performance is dropping. Little Jimmy JOIN is the first one to be put down, often way before there's a need for it.

> SQL Server (the db you mention) is no slouch when paired with additional CPUs (for vertical linear scaling) and allows for transactionally safe replication to distributed clusters of SQL Servers

Additional CPUs are $7,000 USD per core, and replication is labor intensive. Transactional replication has a nasty habit of breaking as the source tables are changed, and Availability Groups have a ton of bugs (as evidenced by any recent Cumulative Update.)

Saying that SQL Server scales is like saying your wallet scales to hold any amount of money. Sure, it might, but it’s up to you to put the coin in, and it’s gonna take a lot of coin compared to scaling out app servers, which generally have next to no license cost and whose code is synchronized at deploy time.

As to recommending you a place to read, I hate to say this, but you could start with my blog. Pretty much every week, I’ve got real-life examples on there, turned into abstract tutorials, from companies who hit scaling walls and had to hire me for help. (Past client examples: Stack Overflow, Google.)

> Additional CPUs are $7,000 USD per core, and replication is labor intensive

Probably still cheaper than trying to implement scalable transactions in higher layers.

> Probably still cheaper than trying to implement scalable transactions in higher layers.

Transactions, yes, I totally agree. That's what databases are for: reliably storing and retrieving data. It's where you start doing domain logic that things get tougher, like (and I wish I was joking) calling cross-continent web services from the database and building HTML inside SQL.

> PostgreSQL has this and it's absolutely free.

Literally the second sentence ...

It's certainly possible, but more complex than just putting up a bunch of stateless application servers all accessing the same database. With no local cache. Then saying the DB is slow ;-)
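
As a rough illustration of the "local cache" point, here is a minimal in-process read-through cache with a time-to-live, sketched in Python. The `TTLCache` class and the `load_user` stand-in for a database query are hypothetical, and a real setup would also need to think about invalidation and memory limits.

```python
import time


class TTLCache:
    """Tiny in-process read-through cache with per-entry expiry."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader      # function that fetches a value from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: no database round trip
        value = self._loader(key)  # miss or expired: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def load_user(user_id):
    """Stand-in for a database query (hypothetical)."""
    calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(load_user, ttl_seconds=30.0)
cache.get(7)
cache.get(7)       # served from the cache
print(len(calls))  # number of times the "database" was actually hit
```

Even a short TTL like this takes repeated reads of hot rows off the database, at the cost of serving data that may be a few seconds stale.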

It's also the kind of thing many developers don't like doing: thinking about operational concerns.

As an aside, horizontal scaling of sorts can be achieved by using microservices; it's actually one of the few really valid reasons for this type of architecture. If the microservice databases are not independent, you're doing it wrong.

> Where is this meme that RDBMSs do not scale coming from?

MongoDB marketing?