by bistablesulphur 1439 days ago
I've been working with DynamoDB daily for a few years now, and whilst I like working with it and the specific scenario it solves for us, I'd still urge anyone thinking about using it to carefully reconsider whether their problem is truly unique enough that a traditional RDBMS couldn't handle it with some tuning. They can be unbelievably performant and give so much stuff for free.

Designing application specifically for DynamoDB will take _a lot_ of time and effort. I think we could have saved almost a third of our entire development time had we used more of the boring stuff.

12 comments

"I'd still urge anyone thinking about using it to carefully reconsider whether their problem is truly unique enough that a traditional RDBMS couldn't handle it with some tuning."

Lately, the problem I've seen is people who haven't even considered whether their problem is truly unique enough that a traditional RDBMS couldn't handle it without some tuning. (Here I don't count "set up the obvious index" as "tuning", because if you're using a non-RDBMS the same work is encompassed in figuring out what to use as keys. No escaping that one regardless of technology.)

I'm losing track of the number of teams in my company I've seen switching databases after they rolled to production, because it turns out they picked a database that doesn't support, in some cases, the primary access pattern for their data or, in other cases, a very common secondary access pattern. In every case I've seen so far, it's been for quantities of data that an RDBMS would have chewed up and spat out without even noticing. It's amazing how much trouble you can get yourself into with non-relational databases with just a few hundred megabytes of data, or even a few tens of megabytes if you fall particularly hard for the "it's fast and easy!" hype and end up accidentally writing a pessimal schema because you thought using a non-relational database meant you got to think less about your schema than with a relational DB.

That is precisely backwards; NoSQL-type DBs get their power from you spending a lot more time and care in thinking about exactly how you plan on accessing data. Many NoSQL databases loosen the constraints on what you can store in a given record, but in return they are a great deal more fussy about how you access records. If you want to skip careful design of how you access records, you want the relational DB. And nowadays, tossing a JSON field into a relational row is quite cheap and effective for those "catch alls" in the schema.
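To make the JSON "catch-all" point concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for a relational database (Postgres `jsonb` would be the more typical choice). The table and column names are made up, and it assumes the SQLite build includes the JSON1 functions, which modern builds do by default.

```python
import json
import sqlite3

# In-memory SQLite database standing in for "the relational DB" (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL UNIQUE,"  # structured, constrained column
    " extra TEXT)"                  # JSON catch-all for loosely structured data
)
conn.execute(
    "INSERT INTO users (email, extra) VALUES (?, ?)",
    ("a@example.com", json.dumps({"theme": "dark", "beta": True})),
)
conn.execute(
    "INSERT INTO users (email, extra) VALUES (?, ?)",
    ("b@example.com", json.dumps({"theme": "light"})),
)

# The JSON field can still be queried ad hoc with json_extract.
rows = conn.execute(
    "SELECT email FROM users WHERE json_extract(extra, '$.theme') = ?",
    ("dark",),
).fetchall()
print(rows)  # [('a@example.com',)]
```

The fixed columns keep their constraints while the JSON column absorbs whatever doesn't fit the schema yet.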

There are some interesting hybrids out there now if you want a bit of both worlds. For instance, ClickHouse is not a traditional relational database, but it handles a lot of SQL-esque workloads more gracefully than many other NoSQL-esque databases. You can get much farther with "I need a NoSQL-style database, but every once in a while I need an SQL-like bit of functionality" there than you can in something like Cassandra.

+1

Discovered this while building https://github.com/plutomi/plutomi, as I was enamored with Rick's talks and guarantees of `performance at any scale`. In reality, Dynamo was solving scaling issues that we didn't have, and the number of times I've had to rework something to get around Dynamo's quirks led to a lot of lost dev time.

Now that the project is getting more complex, simple things such as "searching" (for our use case) are virtually impossible without hosting an Elasticsearch cluster, where a simple `LIKE '%email%'` in Postgres would have sufficed.

Not saying it's a bad DB at all, but you really need to know your access patterns and plan accordingly. DynamoDB Streams are a godsend, and combined with EventBridge you can do some powerful things for asynchronous events. Not paying for capacity while it's idle with on-demand mode is awesome, and the performance is truly off the charts. Just please know what you are getting into. In fact, I'd recommend only using Dynamo if you are migrating a "finished" app, rather than for apps that are still evolving.

I think there are good reasons to choose DynamoDB over a RDBMS that have nothing to do with scalability.

I've used DynamoDB several times over the past several years in the context of providing a datastore for a microservice. In all cases it was cheaper and easier than RDS, and the ability to add GSIs has enabled me to adapt to all of the new access patterns I've had to deal with.

For us, DynamoDB has become a 'boring' option.
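The appeal of GSIs described above is that a new access pattern becomes a new index rather than a schema migration. A toy in-memory model of that idea follows; the class, attribute, and key names are hypothetical, and unlike this sketch, real GSIs are maintained by DynamoDB itself and updated asynchronously.

```python
from collections import defaultdict


class TableWithGSI:
    """Toy key-value table plus one secondary index on a chosen attribute.

    Hypothetical illustration: the index is updated synchronously on every
    write, whereas DynamoDB propagates GSI updates asynchronously.
    """

    def __init__(self, index_attr):
        self.items = {}                # base table: primary key -> item
        self.index_attr = index_attr   # attribute the index is keyed on
        self.index = defaultdict(set)  # index value -> set of primary keys

    def put(self, pk, item):
        old = self.items.get(pk)
        if old is not None and self.index_attr in old:
            self.index[old[self.index_attr]].discard(pk)  # drop stale entry
        self.items[pk] = item
        if self.index_attr in item:  # sparse: items lacking the attribute are skipped
            self.index[item[self.index_attr]].add(pk)

    def get(self, pk):
        return self.items.get(pk)

    def query_index(self, value):
        return sorted(self.index.get(value, ()))


orders = TableWithGSI(index_attr="status")
orders.put("order-1", {"status": "PENDING", "total": 10})
orders.put("order-2", {"status": "SHIPPED", "total": 25})
orders.put("order-1", {"status": "SHIPPED", "total": 10})  # status changes
print(orders.query_index("SHIPPED"))  # ['order-1', 'order-2']
print(orders.query_index("PENDING"))  # []
```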

I think it also depends on the system you’re using it on. One of the biggest advantages of DDB is that it scales so well (with good design to avoid hot partitions). AFAIK, an RDBMS simply cannot scale in the same way due to its design. Yes, they can scale somewhat, but as you said it requires lots of tuning, and you’ll still reach a hardish limit.
One partition of DDB is incredibly tiny compared to one partition of an RDBMS. You can push that one partition of RDBMS pretty far before you're forced to design sharding into your system. With DDB you are basically forced to design sharding into your partition keys up front or you will have hot partition issues. This is by far the most common problem I see with teams using DDB, so brushing it off as "with good design to avoid hot partitions" is understating the scope of the problem.
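"Designing sharding into your partition keys" usually means write sharding: appending a suffix so one logical key maps to several physical partitions. A minimal sketch, assuming a dict stands in for the table and a date is the naturally hot key; SHARD_COUNT and the key format are hypothetical.

```python
import random
from collections import defaultdict

SHARD_COUNT = 10  # hypothetical; size it to the expected write rate per key


def sharded_pk(base_key: str, rng: random.Random) -> str:
    """Spread writes for one logical key over SHARD_COUNT partition keys."""
    return f"{base_key}#{rng.randrange(SHARD_COUNT)}"


def all_shard_keys(base_key: str) -> list:
    """Reads have to fan out over every suffix and merge the results."""
    return [f"{base_key}#{n}" for n in range(SHARD_COUNT)]


table = defaultdict(list)  # partition key -> items (stand-in for the table)
rng = random.Random(42)    # seeded so the sketch is reproducible
for i in range(1000):
    table[sharded_pk("2024-01-01", rng)].append({"event": i})

# Writes for the hot day land on up to SHARD_COUNT partitions, not one...
print(sorted(table))
# ...but reading the whole day back means querying every shard.
events = [item for key in all_shard_keys("2024-01-01") for item in table.get(key, [])]
print(len(events))  # 1000
```

The cost is on the read side: every "give me the whole day" query becomes SHARD_COUNT queries, which is exactly the kind of up-front design the comment is talking about.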
All databases scale the same way - by partitioning and sharding the dataspace. RDBMS have harder restrictions due to the features they provide and the performance expectations, but you can just as easily use a bunch of relational servers to partition a table (or several) across them by range or hashes of the primary key.

That's basically what key/value stores like DynamoDB do, and why DynamoDB was even built on MySQL (at least originally).
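Partitioning a table across relational servers by a hash of the primary key can be sketched as a routing function. The server names are hypothetical, and real systems layer connection handling, replication, and rebalancing on top of this.

```python
import hashlib

# Hypothetical names for four relational servers holding one logical table.
SERVERS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(primary_key: str) -> str:
    """Route a row to a server by hashing its primary key.

    Uses a stable hash (not Python's per-process randomized hash()) so every
    application instance routes the same key to the same server.
    """
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


routes = {key: shard_for(key) for key in ["user-1", "user-2", "user-3"]}
print(routes)
```

Plain modulo routing moves most keys whenever len(SERVERS) changes, which is one source of the re-partitioning work raised in the next reply; consistent hashing or a fixed set of virtual partitions is the usual mitigation.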

"can just as easily use a bunch of relational servers to partition a table" is not true at all. Managing, maintaining and tuning a sharded relational cluster is an astonishing amount of operations work. Partition management, re-partitioning, partition failover / promotions / demotions, query routing, shard discovery, upgrades... it goes on and on. All this work is gone if you pick Dynamo. Not saying that Dynamo is always better, but IMHO people very much underestimate the ops cost of running a sharded relational cluster at scale.
The point is the scaling fundamentals are the same across databases.

Whether that work is managed or not is a different topic, and you can find plenty of managed offerings of scale-out relational databases.

"just as easily" would be the contested part, I'd guess
> Designing application specifically for DynamoDB will take _a lot_ of time and effort

Disagree with this. Your team could think of it as a document database, and you can have utility libraries that filter and sort based on PK / SK combinations to provide a seamless experience.
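A sketch of the kind of utility layer described here, treating the item collection under one partition key as a single document. Plain Python lists stand in for the table, and the CUSTOMER#/ORDER# key scheme and function names are hypothetical; exact match on PK plus begins_with on SK, sorted by SK, mirrors what a DynamoDB Query supports.

```python
# Items as they might sit in a single table with a partition key (PK) and a
# sort key (SK). The key formats are hypothetical.
ITEMS = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Ada"},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2021-03-01", "total": 30},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2021-01-15", "total": 12},
    {"PK": "CUSTOMER#7", "SK": "PROFILE", "name": "Bob"},
]


def query(pk, sk_prefix=""):
    """Mimic a Query: exact match on PK, begins_with on SK, sorted by SK."""
    hits = [i for i in ITEMS if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i["SK"])


def load_customer(pk):
    """Assemble one 'document' from the item collection under a single PK."""
    items = query(pk)
    profile = next(i for i in items if i["SK"] == "PROFILE")
    orders = [i for i in items if i["SK"].startswith("ORDER#")]
    return {"name": profile["name"], "orders": [o["total"] for o in orders]}


print(load_customer("CUSTOMER#42"))  # {'name': 'Ada', 'orders': [12, 30]}
```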

If you want your DynamoDB table to scale well you'll have to put in a lot of upfront effort.
> give so much stuff for free

Interesting choice of words. Performance wise, sure. Money wise? I'm still waiting for a SQL database with pay-per-request pricing. The cost difference is enormous, particularly when you remember that you don't need to spend manpower managing the underlying hardware.

Engineering tradeoffs are more complicated than only considering raw scalability performance and "I can run it myself on a cheap Raspberry Pi".

>Interesting choice of words. Performance wise, sure. Money wise? I'm still waiting for a SQL database with pay-per-request pricing. The cost difference is enormous, particularly when you remember that you don't need to spend manpower managing the underlying hardware.

I assume you're saying DynamoDB is less expensive than SQL because of pay-per-request.

Working on applications with a modest amount of data (a few TB over a few years) pay per request has been incredibly expensive even with scaled provisioning. I would much rather have an SQL database and pay for the server/s. Then I could afford a few more developers!
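Both positions in this exchange can be right depending on traffic, because pay-per-request cost grows linearly with requests while a server's cost is roughly flat. A back-of-the-envelope sketch follows; the prices are placeholders, not actual AWS rates, and it ignores storage, staff time, and provisioned-capacity discounts.

```python
def monthly_request_cost(requests_per_month: float, price_per_million: float) -> float:
    """Pay-per-request: cost scales linearly with traffic."""
    return requests_per_month / 1_000_000 * price_per_million


def break_even_requests(server_cost_per_month: float, price_per_million: float) -> float:
    """Monthly request volume above which a fixed-price server is cheaper."""
    return server_cost_per_month / price_per_million * 1_000_000


# Placeholder inputs for illustration only, NOT current AWS prices.
PRICE_PER_MILLION = 1.0  # blended $ per million requests (hypothetical)
SERVER_COST = 500.0      # $ per month for a database server (hypothetical)

threshold = break_even_requests(SERVER_COST, PRICE_PER_MILLION)
print(threshold)  # 500000000.0
```

Below the threshold, pay-per-request wins; a workload that stays well above it month after month is the "incredibly expensive" case described above.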

Have you looked at PlanetScale?
Who manages hardware these days? Aurora works quite well.
Is there a specific reason why you say "Designing application specifically for DynamoDB will take _a lot_ of time and effort"? Are you talking about migrating from an RDBMS to DynamoDB? Because my experience designing for DynamoDB was very similar to any other NoSQL DB.
You really need to consider your access patterns up front with DynamoDB. Any changes to those during application development can be very time-consuming. There are limits on how many local and global secondary indexes you can have, and local secondary indexes can't be added to an existing table at all. However, you can use multiple databases to get the best of both worlds. At my employer, we typically store domain entities in DynamoDB as the source of truth. However, we replicate some entities to secondary databases like OpenSearch when we have access patterns that require ad hoc querying.
A lot is transferable to NoSQL and key-value in general, though DDB has plenty of quirks of its own. Understanding your problem really is the key. A lot of problems turn out to be quite relational after all.

You definitely can build just about anything with DDB; it's often just not worth the time when most problems can be solved by existing tools.

I mean if you go the single table route… https://aws.amazon.com/blogs/compute/creating-a-single-table...
To be fair, you can end up spending a lot of time on the boring stuff as well.
Could you elaborate on your (or a hypothetical) use case where DynamoDB makes sense? I for one can never come up with something that isn't better served by an RDBMS or S3.
I'll give you two use cases where I use DynamoDB, even though I'm otherwise primarily a MySQL shop:

1) Simple: I have a system that continuously records 30-minute MP3 files of audio streams (thousands of them) and stores them in S3. We write the referencing metadata to a table in DynamoDB, where users can query by date/time. Given the sheer number of items (hundreds of millions), we saw far worse performance per dollar on MySQL than on Dynamo.

2) Complex: I have a system that ingests thousands of tiny MP3 files a minute into S3 and writes the associated metadata to DynamoDB. DynamoDB then has a stream associated with it that runs a lambda to consolidate statistics to another table and stream that metadata to clients via other lambdas or data streams.
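The stream-plus-Lambda pattern in the second use case boils down to a handler that folds change records into running totals. A minimal sketch with hypothetical records and attribute names: real DynamoDB stream records nest the item image under `record["dynamodb"]["NewImage"]` and use typed attribute values, which this sketch simplifies away.

```python
from collections import Counter

# Hypothetical, simplified stream records: eventName plus the new item image.
records = [
    {"eventName": "INSERT", "NewImage": {"station": "A", "seconds": 4}},
    {"eventName": "INSERT", "NewImage": {"station": "B", "seconds": 6}},
    {"eventName": "INSERT", "NewImage": {"station": "A", "seconds": 5}},
    {"eventName": "REMOVE", "NewImage": None},
]


def handler(event_records, stats):
    """Lambda-style handler: fold newly ingested clips into per-station totals."""
    for record in event_records:
        if record["eventName"] != "INSERT":
            continue  # this sketch only aggregates new clips
        image = record["NewImage"]
        stats[image["station"]] += image["seconds"]
    return stats


stats = handler(records, Counter())
print(dict(stats))  # {'A': 9, 'B': 6}
```

Lambda can retry a batch after a failure, so a real aggregator writing to another table would need to tolerate reprocessing the same records.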

Those are two great use cases where we saw better usage patterns with Dynamo vs MySQL.

When you require point and range queries. For example, given a cart-id, fetch the SKUs; given an authz-token, fetch scopes; given a user-id and a time range, fetch a list of pending order-ids.
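Both query shapes map onto a partition key, optionally plus a sort key. A minimal in-memory sketch: a dict stands in for the point lookup and a sorted list for a table keyed by user_id and created_at. The data, names, and ISO-8601 timestamps are hypothetical; keeping items sorted by the sort key is what turns the range query into one contiguous slice.

```python
import bisect

# Point query: partition key only (hypothetical cart data).
CARTS = {"cart-9": ["sku-1", "sku-7"]}

# Range query: partition key user_id plus sort key created_at. A sorted list
# of (user_id, created_at, order_id) tuples stands in for the table.
PENDING = sorted([
    ("u1", "2021-05-01T10:00", "o-100"),
    ("u1", "2021-05-03T09:30", "o-101"),
    ("u1", "2021-06-10T12:00", "o-102"),
    ("u2", "2021-05-02T08:00", "o-200"),
])


def pending_orders(user_id, start, end):
    """Return order ids for user_id with start <= created_at < end."""
    lo = bisect.bisect_left(PENDING, (user_id, start))
    hi = bisect.bisect_left(PENDING, (user_id, end))
    return [order_id for _, _, order_id in PENDING[lo:hi]]


print(CARTS["cart-9"])                                   # ['sku-1', 'sku-7']
print(pending_orders("u1", "2021-05-01", "2021-06-01"))  # ['o-100', 'o-101']
```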

There's a lot more you could do, though; DynamoDB is, after all, a wide-column KV store. Ref this re:Invent talk from 2018: https://www.youtube-nocookie.com/embed/HaEPXoXVf2k

Apart from being fully-managed, the key selling points of DynamoDB are its consistent performance for a given query type, read-your-writes consistency semantics, auto-replication, auto-disaster recovery.

See also: https://martinfowler.com/bliki/AggregateOrientedDatabase.htm... (mirror: https://archive.is/lc2eO)

The AWS re:Invent talk was great and answered exactly when to use DynamoDB. I'll seriously consider it for some of my applications.
I always tell people there are two clear areas where DynamoDB has some major benefits:

- Very high scale applications that can be tough for an RDBMS to handle

- Serverless applications (e.g. w/ AWS Lambda) due to how the connection model (and other factors) work better with that model.

Then, for about 80% of OLTP applications, you can choose either DynamoDB or RDBMS, and it really comes down to which tradeoffs you prefer.

DynamoDB will give you consistent, predictable performance basically forever, and there's not the long-term maintenance drag of tuning your database as your usage grows. The downside, as others have mentioned, is more planning upfront and some loss of flexibility.

Lots of records (billions), little or no relational linkage, the need to query/update records in different ways (i.e., you need indexes), and the need for HA and scaling (e.g., perhaps you can be VERY bursty and read-heavy).

It's not one size fits all, but at least in my line of work there are a few instances where it's a pretty good fit.

If you have a database access layer then structuring your application shouldn't be that different. I wouldn't deal with the database directly unless I had a really good reason or the abstraction layer didn't support the query I was trying to run.
DynamoDB is amazing, but not very flexible once you have designed your database. No abstraction layer will allow you to run queries ad-hoc in a performant way.
It’s true. 400 KB max item size, too, and a 1 MB max per query response page, I believe. Good luck grabbing a shitload of data at once without a parallel scan.
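The 1 MB limit applies per page of results, so reading a lot of data means following the continuation key each response hands back (or splitting the work with a parallel scan). A sketch of that loop, where `query_page` is a hypothetical stand-in for the real call; with boto3 you would pass `ExclusiveStartKey=response["LastEvaluatedKey"]` on the next request.

```python
PAGE_LIMIT_BYTES = 1_000_000  # stand-in for DynamoDB's 1 MB page limit


def query_page(items, start_index):
    """Hypothetical paginated read returning one page plus a continuation key.

    Mimics the shape of a Query/Scan response: LastEvaluatedKey is present
    only while more data remains.
    """
    page, size, i = [], 0, start_index
    while i < len(items):
        item_size = len(items[i])
        if page and size + item_size > PAGE_LIMIT_BYTES:
            break  # page is full; stop before exceeding the limit
        page.append(items[i])
        size += item_size
        i += 1
    response = {"Items": page}
    if i < len(items):
        response["LastEvaluatedKey"] = i
    return response


def read_all(items):
    """Keep requesting pages until no LastEvaluatedKey comes back."""
    results, start = [], 0
    while True:
        response = query_page(items, start)
        results.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return results
        start = response["LastEvaluatedKey"]


# 25 items of ~300 KB each (under the 400 KB item cap) span several pages.
data = [b"x" * 300_000 for _ in range(25)]
print(len(read_all(data)))  # 25
```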

Dynamo is a precision tool and it’s great at those specific workloads but it’s not a one size fits all by any means.

400 KB is the max item size; the pattern to get around that is to store objects in S3 and keep URLs/keys to those objects in DDB.
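A minimal sketch of that pointer pattern, with two dicts standing in for the S3 bucket and the DynamoDB table. The key layout, the attribute names, and the "half the item cap" threshold are all hypothetical choices.

```python
import hashlib

ITEM_SIZE_LIMIT = 400 * 1024  # roughly DynamoDB's 400 KB per-item cap

blob_store = {}  # stand-in for an S3 bucket: object key -> bytes
table = {}       # stand-in for a DynamoDB table: primary key -> item


def put_item(pk, payload: bytes, metadata: dict):
    """Store large payloads externally and keep only a pointer in the item."""
    if len(payload) > ITEM_SIZE_LIMIT // 2:  # hypothetical headroom for other attributes
        object_key = f"payloads/{hashlib.sha256(payload).hexdigest()}"
        blob_store[object_key] = payload
        table[pk] = {**metadata, "payload_ref": object_key}
    else:
        table[pk] = {**metadata, "payload": payload}


def get_payload(pk) -> bytes:
    """Small payloads come back inline; large ones need a second fetch."""
    item = table[pk]
    if "payload_ref" in item:
        return blob_store[item["payload_ref"]]  # extra round trip (to S3)
    return item["payload"]


put_item("clip-1", b"\x00" * 1_000_000, {"station": "A"})
put_item("clip-2", b"short", {"station": "B"})
print(sorted(table["clip-1"]))  # ['payload_ref', 'station']
```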
> No abstraction layer will allow you to run queries ad-hoc in a performant way.

Depends on the size of the data. Run analytics queries (i.e., things that return summary data, not all rows) on 10GB of data through ClickHouse, DuckDB, or DataFusion and they'll generally return in milliseconds.

What does this have to do with DynamoDB? The point is that once you've gotten your data into DynamoDB, you're strongly limited in how you can use it until you load it into something else.
I didn't see an obvious connection between the two sentences.
An access layer doesn't change your access patterns, which is what actually determines the database model to use.

DynamoDB (and other similar key/value stores) make very big trade-offs for speed and scale that most applications don't need.

DynamoDB usage is heavily based around correctly structuring your keys. This allows you to do things like query subsets easily, which in turn means you need to know what your usage patterns will be like so you can correctly structure your keys.

God help you if you need to make major changes to this down the road.

A database access layer can't do this for you; that just isn't what it does.

Totally, or S3.
> Designing application specifically for DynamoDB will take _a lot_ of time and effort.

If you can write, read, and query a JSON document using an API in your application, it's literally that simple.

The only real time and effort is the architectural decisions you make up front, and that's about it. And there are some great guides out there that cover 99% of those architectural decisions.

As a user of both, I find MySQL replication and clusters to be far more complex and time and effort intensive.

Have to disagree on this one. Something as basic and out of the box as a migration / data backfill is not only complicated but also very expensive (both time- and cost-wise) on Dynamo. Not to mention all the other things that come nicely with a relational DB (type checking, auto increments, uniform data).
To be fair, the parent discusses designing an application to use Dynamo, not data migration.

I'll completely agree with you on migration / backfill. You're going to pay a lot of money to migrate a ton of data into Dynamo, and you'll also definitely increase the complexity in provisioning and setting up that migration pattern.

But my comment stands pretty well considering greenfield application development around Dynamo.

> If you can write, read, and query a JSON document using an API in your application, it's literally that simple

You could say that of Elasticsearch or Mongo, too. And it might be technically true, but you haven't scratched the surface of mappings, design, limitations, etc.

You can dump a bunch of data into Dynamo very easily, but what about getting data via secondary indices when you can't get your data with the views you've built without scanning? How do you use partition keys in it? And so on.

> The only real time and effort is the architectural decisions you make up front, and that's about it

And don't forget about the time spent fixing what could have been caught by types and regular old DB constraints (for most applications).
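A small illustration of "regular old DB constraints" doing that work, using Python's built-in sqlite3 as the relational stand-in. SQLite's column types are looser than Postgres' unless you use STRICT tables, so this sketch shows constraints rather than type checking; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE accounts (
           id INTEGER PRIMARY KEY,                        -- auto-assigned id
           email TEXT NOT NULL UNIQUE,                    -- no duplicates
           balance INTEGER NOT NULL CHECK (balance >= 0)  -- no overdrafts
       )"""
)
with conn:  # commit the seed row so later rollbacks can't undo it
    conn.execute("INSERT INTO accounts (email, balance) VALUES ('a@example.com', 10)")


def try_insert(email, balance):
    """Return None on success, or the constraint error the database raised."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (email, balance) VALUES (?, ?)",
                (email, balance),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert("a@example.com", 5))   # rejected: duplicate email
print(try_insert("b@example.com", -1))  # rejected: CHECK constraint
print(try_insert("b@example.com", 3))   # None (accepted)
```

With a key-value store, each of these rules would have to live in application code (or conditional writes) and be re-implemented everywhere the data is written.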

It’s a question of change resilience. You can implement CRUD on a single object with DDB trivially. You can’t trivially implement five different "list by property X" APIs, or filter the objects, or deal with foreign keys…
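For contrast, a sketch of the relational side of that trade-off, again with sqlite3 standing in: one generic "list by X" function covers several access patterns without planning them up front, and a foreign key is enforced by the database. The schema and the LISTABLE whitelist are hypothetical; at scale you would still add indexes for the hot filters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        status TEXT NOT NULL,
        region TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'PENDING', 'EU'), (11, 1, 'SHIPPED', 'US');
    """
)

# Whitelist of filterable columns, so the name is safe to interpolate into SQL.
LISTABLE = {"status", "region", "customer_id"}


def list_orders_by(column, value):
    """One generic 'list by X' query instead of one index design per pattern."""
    if column not in LISTABLE:
        raise ValueError(f"cannot list by {column!r}")
    sql = f"SELECT id FROM orders WHERE {column} = ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (value,))]


print(list_orders_by("region", "EU"))  # [10]

# The foreign key rejects an order for a customer that doesn't exist.
try:
    conn.execute("INSERT INTO orders VALUES (12, 99, 'PENDING', 'EU')")
except sqlite3.IntegrityError:
    print("rejected")
```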