Hacker News
by jayd16 2263 days ago
At what point do these auto-sharding databases like DynamoDB become worth the effort these days? You can squeeze a lot out of a single Postgres instance and much more if you go with read replicas or Redis caches.

When you start with a relational model you don't need a priori knowledge of your data access and you get solid performance and guarantees. If you need this access knowledge beforehand, is DynamoDB best for scaling mature products?

6 comments

I just answered this on Twitter, but I think there are two instances where it's a no-brainer to use DynamoDB:

- High-scale situations where you're worried about performance of a relational database, particularly joins, as it scales.

- If you're using serverless compute (e.g. AWS Lambda or AppSync) where traditional databases don't fit well with the connection model.

That said, you can use DynamoDB for almost every OLTP application. It's just more a matter of personal preference as to whether you want to use a relational database or something like DynamoDB. I pick DynamoDB every time b/c I understand how to use it and like the other benefits (billing model, permissions model, performance characteristics), but I won't say you're wrong if you don't choose it in these other situations.
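To make the connection-model point concrete, here is a minimal Python sketch of why one connection per invocation adds up. It is a simulation only: `FakeDatabase`, `naive_handler`, `WarmContainer`, and `simulate` are hypothetical names, and the fake database just counts connections against a limit. No real driver or AWS API is involved.

```python
class FakeDatabase:
    """Hypothetical stand-in for a relational database with a connection limit."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self):
        if self.open_connections >= self.max_connections:
            raise RuntimeError("too many connections")
        self.open_connections += 1
        return object()  # opaque connection handle


def naive_handler(db, event):
    # Opens a new connection on every invocation and never releases it.
    db.connect()
    return "ok"


class WarmContainer:
    """Models one warm function container that keeps its connection around."""

    def __init__(self, db):
        self.db = db
        self.conn = None

    def handler(self, event):
        if self.conn is None:  # connect only on the first (cold) invocation
            self.conn = self.db.connect()
        return "ok"


def simulate(max_connections=5, invocations=20):
    """Return (failed naive invocations, connections held by one warm container)."""
    naive_db = FakeDatabase(max_connections)
    naive_failures = 0
    for i in range(invocations):
        try:
            naive_handler(naive_db, {"n": i})
        except RuntimeError:
            naive_failures += 1

    reuse_db = FakeDatabase(max_connections)
    container = WarmContainer(reuse_db)
    for i in range(invocations):
        container.handler({"n": i})

    return naive_failures, reuse_db.open_connections
```

Reusing a connection inside one warm container helps, but every concurrent container still holds its own connection, so the limit can still be reached under load. DynamoDB sidesteps this because each request is an independent HTTPS call.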

Does the connection model problem go away when using serverless RDS? https://aws.amazon.com/rds/aurora/serverless/
I don't know about Aurora Serverless.

But AWS offers a proxy exactly for this purpose.

https://aws.amazon.com/rds/proxy/

Very interesting. Thank you for sharing.
Dynamo is sort of in-between Redis and SQL:

- Less maintenance around schema/migrations

- Data types and validation

- You still get queries (though not to the level of SQL complexity)

- You still get indexes

- You get row-level TTLs like Redis

- Hosted / infinite scale

- Billing based on storage/throughput, not fixed instance sizes
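As a rough illustration of the TTL point in the list above: DynamoDB TTL is enabled per table on a named attribute that holds a Unix epoch timestamp in seconds, stored as a Number. The sketch below only builds such an item in the low-level attribute-value format; the attribute names `pk` and `expires_at` and both helper functions are assumptions for illustration, and nothing here calls AWS.

```python
import time


def build_session_item(session_id, ttl_seconds, now=None):
    """Build an item whose 'expires_at' attribute (assumed name) drives TTL."""
    now = int(time.time()) if now is None else int(now)
    return {
        "pk": {"S": f"session#{session_id}"},
        # The low-level API encodes numbers as strings under the "N" type.
        "expires_at": {"N": str(now + ttl_seconds)},
    }


def is_expired(item, now=None):
    """Client-side check, useful because expired items are deleted in the background."""
    now = int(time.time()) if now is None else int(now)
    return int(item["expires_at"]["N"]) <= now
```

An item like this could be passed as `Item` to the boto3 client's `put_item`. Since expired items are not removed at the exact moment they expire, readers that care about precision usually filter on the timestamp as well, which is what `is_expired` sketches.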

> Less maintenance around schema/migrations

I would say there is much more maintenance around schema and migrations. Since there is no enforcement of schema at the database level, you need to be very careful in understanding every single way your application(s) work with the data, and ensure that they are backward and forward compatible. This generally involves writing a lot of custom tooling and batch migration logic, and ensuring strict control over code that modifies data.

It's very easy to discover schema migration problems in production as the data is accessed.
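One common way to cope with this, sketched here under assumptions rather than as anything the commenter described, is to store a version number on each item and upgrade old items when they are read. The `schema_version` attribute, the v1/v2 field layout, and all function names are hypothetical.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(item):
    # Hypothetical change: v1 stored a single "name", v2 splits it in two.
    first, _, last = item["name"].partition(" ")
    upgraded = {k: v for k, v in item.items() if k != "name"}
    upgraded.update({"first_name": first, "last_name": last, "schema_version": 2})
    return upgraded


# Maps a stored version to the function that upgrades it by one step.
UPGRADES = {1: upgrade_v1_to_v2}


def read_item(raw):
    """Return the item in the current shape, upgrading older versions step by step."""
    item = dict(raw)
    version = item.get("schema_version", 1)  # unversioned items are treated as v1
    if version > CURRENT_VERSION:
        # Forward compatibility: refuse to guess at data written by newer code.
        raise ValueError(f"item written by newer code (version {version})")
    while version < CURRENT_VERSION:
        item = UPGRADES[version](item)
        version = item["schema_version"]
    return item
```

This keeps old and new items readable side by side, but it is exactly the kind of application-level tooling the comment warns about: every reader has to go through `read_item`, and every schema change needs its own upgrade step.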

My rule of thumb has become:

If you know all your access patterns and your writes >>> reads, a NoSQL solution will be cheaper to operate than Postgres. That is, I believe that for most deployments you can get the same performance from Postgres, just at a higher cost (perhaps 3-6x at most). Another reason to go with NoSQL is latency sensitivity, although I don't think Dynamo falls in this bucket.

NoSQL was also really good for OLAP, but I think there are now several really good OLAP solutions (like ClickHouse for OSS and Redshift/BigQuery in the cloud) that are easier to manage.

What happens when you need to restart that “single Postgres instance” to apply config changes or upgrade to a more powerful instance class? How do you promote a replica to primary without downtime?

Those concerns are mostly gone when you rely on a service like DynamoDB. It's not “free”: it comes with increased complexity at the app level, but it does offer peace of mind if you can afford the $$$.

AWS offers a managed Postgres service, too: https://aws.amazon.com/rds/postgresql/

(I'm a fan of DynamoDB and think there are many good use cases for it. Just saying that the above comparison doesn't seem relevant here.)

In my experience (using Dynamo for 4-5 use cases and Postgres for many more), a properly configured HA Postgres on RDS costs far more than Dynamo for the same workload. There is one big catch: data modeling is not as straightforward with Dynamo. You cannot change the model as easily as you can with Postgres, and your access patterns must cover every single use case. If you can model your data that way and the access patterns fit, Dynamo is great; if not, you are going to have a hard time.
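To show what "access patterns must cover every use case" means in practice, here is a small in-memory Python sketch of a table addressed by a partition key and a sort key. `TinyTable` and the `customer#` / `order#` key layout are hypothetical; the only DynamoDB behavior it imitates is that a query names one partition key and may narrow results by a sort-key prefix.

```python
class TinyTable:
    """In-memory model of a partition-key / sort-key table (illustration only)."""

    def __init__(self):
        self._partitions = {}  # pk -> {sk: item}

    def put(self, pk, sk, item):
        self._partitions.setdefault(pk, {})[sk] = item

    def query(self, pk, sk_prefix=""):
        """Return items in one partition, ordered by sort key, filtered by prefix."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


def build_example():
    table = TinyTable()
    table.put("customer#42", "profile", {"name": "Ada"})
    table.put("customer#42", "order#2020-03-01", {"total": 30})
    table.put("customer#42", "order#2020-01-15", {"total": 120})
    table.put("customer#7", "order#2020-02-10", {"total": 55})
    return table
```

With this layout, `build_example().query("customer#42", "order#")` answers "orders for one customer, oldest first" directly. A question such as "all orders over 100 across all customers" has no matching key, so it would need a full scan or an additional index planned in advance, which is the catch described above.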
Managed Postgres still needs to be restarted for config changes and instance resizes, for example.
Managed Postgres services tend to be fairly expensive for production use cases at any small to medium organization, and they all come with their own little caveats here and there.
Are you running one cluster per application? I've seen that a lot but it's actually quite inexpensive when you use one cluster for multiple applications.
Check out this video from AWS's Principal DynamoDB expert that touches on comparisons against relational DBs: https://www.youtube.com/watch?v=HaEPXoXVf2k
I can set up a DynamoDB-based serverless API that scales automatically in a minute. I wouldn't call this an effort.

It's just that you know Postgres and I know DynamoDB. So go with what you know :)