by throwawaaarrgh 925 days ago
> Postgres sits at the heart of everything our systems do.

Did the people making these decisions never take Computer Science classes? Even a student taking a data structures module would realize this is a bad idea. There's actually more like two dozen different reasons it's a bad idea.

3 comments

Would be interested to hear more about your opinion on why using a database is a mistake.
Using a datastore for which true master-master HA is at best a bolted-on afterthought when you explicitly want a zero-downtime system is a mistake in a pretty obvious way.

Using a datastore with a black box query planner that explicitly doesn't allow you to force particular indices (using hints or similar) is a more subtle mistake but will inevitably bite you eventually. Likewise a datastore that uses black-box MVCC and doesn't let you separate e.g. writing data from updating indices.

I meant using a database for more than relational read-heavy data queries. I would need to write a small book. TL;DR: the data model, communication model, locking model, and operational model all have limitations designed around a specific use case, and straying from that case invites problems whose workarounds create more problems.
I hear you on that, and can say that Postgres is incredibly capable at going beyond typical relational database workloads. One example is durable queues: because they are transactionally consistent with the rest of the database, they play a unique role in our architecture that would otherwise require more ceremony. More details here: https://getoban.pro

We are also working on shifting some workloads, like logging, off of Postgres onto more appropriate systems as we scale. But we intentionally chose to minimize dependencies by pushing Postgres further to move faster, with migration plans ready as we continue to reach new levels of scale (e.g. using a dedicated log storage solution like Elasticsearch or ClickHouse).

Is this a bit? The median CS undergrad has zero experience with large & successful software systems in the real world. Of course they wouldn't understand!
Yeah - in fact, this is probably a great example of stuff you don't learn in class that gets really clear in the real world :) Operational concerns trump a lot of other things, and shoving everything you can into one database technology is so much easier to manage that it makes up for a lot of suboptimal fit.
What do you mean? I don’t understand, how is using a database an architectural mistake?
It's a mistake to use one specific computer science concept (RDBMS) to solve 50 different problems. They mentioned logging and scheduling, two things RDBMSs are not designed for and have specific limitations around. From just a general architecture perspective it's literally a single point of failure and limitation for every single aspect of the system. And it's vendor specific, it's not like you can just plug PL/pgSQL code into any other RDBMS and expect it to work. It's so obviously a bad idea it's hard to comprehend taking it seriously.
It might not be good computer science to use one tool to solve 50 different problems; but it's not bad computer engineering to use one tool to solve 50 different problems that fit within its capabilities rather than using 50 different tools, all with their own operational expertise.

There's no need to have the best tool for every job. That said, it's also important to be able to see when a multipurpose tool is insufficient for a specific job as it exists in your system, and then figure out what would be more appropriate.

You'd probably be surprised by how many systems are just Postgres/MySQL + Redis.
For example, it's dead easy to make a high-capacity message queue by just using SELECT ... FOR UPDATE SKIP LOCKED with Postgres transactions, and I would argue it's more reliable than a lot of microservice-everything setups by way of having very few moving parts.
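A rough sketch of what that pattern can look like, for anyone who hasn't seen it. The `jobs` table, its columns, and the `process_one` helper are all made up for illustration; `conn` is assumed to be any DB-API 2.0 connection to Postgres with the `%s` parameter style (psycopg, for instance) and autocommit off. Not a complete queue (no retries, no scheduling), just the claim step:

```python
# Hypothetical schema, assumed for this sketch (not from the thread):
#   CREATE TABLE jobs (
#       id      bigserial PRIMARY KEY,
#       payload text NOT NULL,
#       status  text NOT NULL DEFAULT 'pending'
#   );

# Lock one pending row; rows already locked by other workers are skipped
# instead of blocking, so concurrent workers never claim the same job.
CLAIM_SQL = """
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""

DONE_SQL = "UPDATE jobs SET status = 'done' WHERE id = %s"


def process_one(conn, handler):
    """Claim one pending job, run handler(payload), and mark it done.

    Returns True if a job was processed, False if nothing was claimable.
    The row lock is held until commit, so if the handler raises, the
    rollback releases the job for another worker to pick up.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return False
        job_id, payload = row
        handler(payload)
        cur.execute(DONE_SQL, (job_id,))
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

The trade-off in this version is that the transaction stays open while the handler runs, which is fine for short jobs but probably not for long ones.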
Classic NIH syndrome. "I made it myself so it must be better", when it's clear that a single SQL query doesn't remotely approach a complete solution for scheduling. But the ignorant use it because they don't know better, until they too fall into the trap and realize they spent 10x as much engineering work to get something they could have just installed from the web and been done with. Every generation seems to fall into this trap with another tech stack.
It's all trade-offs, right? Introducing a new component to your stack isn't free, it's paid for by more operational complexity. Maybe it's worth it, maybe it's not, but there is a calculation that needs to be made that's not just "NIH syndrome".
If you already have a DB (and essentially every app does), it can be far less effort with the same or greater reliability to create a queue table than to set up RabbitMQ, NATS, etc. As long as you tune the vacuuming appropriately, it will last for quite a lot of scale.

Source: am a DBRE, and have run self-hosted RabbitMQ and NATS clusters.

Sure, so install RabbitMQ as well.

As the saying goes... "now you have two problems". :)

You could honestly just do in-memory SQLite and use that lol. Idk, that’s what I did because I wanted to quickly be able to handle thousands of simultaneous scheduling tasks.

Took like two hours and it works fine. Customers are happy. Event logs persist to S3 in case I need to replay (hasn’t happened once yet).
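
For the curious, a minimal guess at what the in-memory SQLite approach might look like, using Python's stdlib `sqlite3`. The `tasks` table and the function names are invented for this sketch, and the S3 event log mentioned above is left out entirely:

```python
import sqlite3
import time


def make_scheduler():
    """Create an in-memory SQLite database holding scheduled tasks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks ("
        "id INTEGER PRIMARY KEY, run_at REAL NOT NULL, name TEXT NOT NULL)"
    )
    # Index on run_at so finding due tasks stays cheap as the table grows.
    conn.execute("CREATE INDEX tasks_run_at ON tasks (run_at)")
    return conn


def schedule(conn, name, run_at):
    """Register a task to run at the given Unix timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tasks (run_at, name) VALUES (?, ?)", (run_at, name)
        )


def pop_due(conn, now=None):
    """Remove and return the names of all tasks due at or before `now`."""
    now = time.time() if now is None else now
    with conn:
        rows = conn.execute(
            "SELECT id, name FROM tasks WHERE run_at <= ? ORDER BY run_at, id",
            (now,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM tasks WHERE id = ?", [(task_id,) for task_id, _ in rows]
        )
    return [name for _, name in rows]
```

Usage would be something like calling `schedule(conn, "send-email", time.time() + 60)` and then polling `pop_due(conn)` from a loop. Everything is gone on restart, which is presumably why the event log exists.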