
by abalone 2763 days ago

Any experience with using Aurora in place of DynamoDB?

A couple years ago there was an interesting tidbit at re:Invent about customers moving from DynamoDB to Aurora to save significant costs.[1] The Aurora team made the point that DynamoDB suffers from hotspots despite your best efforts to evenly distribute keys, so you end up overprovisioning. Whereas with Aurora you just pay for I/O. And the scalability is great. Plus you get other nice stuff with Aurora like, you know, traditional SQL multi-operation transactions.

It was kind of buried in a preso from the Aurora team and the high-level messaging from Amazon was still, NoSQL is the most scalable thing. Aurora was and is still seemingly positioned against other solutions within the SQL realm. I sort of get that NoSQL is still theoretically infinitely scalable whereas Aurora is bounded by 15 read replicas and one write master, but in practice these days those limits are huge. I think one write master can handle like 100K transactions a second or something.

So, I'm really curious where this has gone in the past couple years if anywhere. Is NoSQL still the best approach?

[1] https://youtu.be/60QumD2QsF0?t=1021

6 comments

awinder 2763 days ago

https://aws.amazon.com/blogs/database/how-amazon-dynamodb-ad...

abalone 2763 days ago

Oh cool. For those reading along this is titled "How Amazon DynamoDB adaptive capacity accommodates uneven data access patterns (or, why what you know about DynamoDB might be outdated)". Is this a new feature?

awinder 2763 days ago

Yeah I should have elaborated a bit. I believe adaptive capacity was announced at re:Invent in 2017 and may have been released shortly after, maybe early 2018. The feature is getting a lot more press & push from AWS lately though for sure.

kenhwang 2762 days ago

I remember having a conversation with our AWS rep about 2 years ago during our quarterly feature request meeting. I remember asking for DynamoDB autoscaling and burst capacity; pretty happy they finally delivered.

Since then we've pretty much cut our DynamoDB bill in half and had a drastic reduction in throttled responses.

coder543 2763 days ago

I personally recommend using a SQL database until you're absolutely positively sure you don't need one, for many reasons.

But, as far as the "you end up overprovisioning" because of hotspots thing, DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches and save you money compared to the provisioning you would have done with DynamoDB, from what I understand.

orthecreedence 2763 days ago

We use a hybrid. We process a lot of incoming data and dump most of it into dynamo (it's ephemeral so the TTL feature is nice) and if we get capacity errors (Dynamo takes a while to scale up sometimes) we just dump our objects in the DB. The end result is we keep a huge amount of writes off our DB for processing incoming largish objects. The amount of data it stores would cost an arm and a leg to put into redis.

Granted, I don't think I'd want to use Dynamo for anything other than temporary data. Lock-in makes me nervous, and the way it scales up/down really makes it difficult to use it for hourly workloads... by the time it scales up we're close to done needing more capacity, then it doesn't scale down for like 40m after. We set up caps and the DB overflow mechanism keeps things from grinding to a halt.
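For anyone reading along, the overflow pattern described above might look roughly like this. It's only a sketch: `ThrottledStore`, `CapacityError` and `write_with_overflow` are made-up stand-ins (an in-memory toy instead of a real DynamoDB client, and a plain dict instead of the real DB), not the poster's actual code.

```python
import time


class CapacityError(Exception):
    """Hypothetical stand-in for a throttling error from the primary store."""


class ThrottledStore:
    """Toy in-memory store that rejects writes once it holds `capacity` items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}

    def put(self, key, value, expires_at):
        if len(self.items) >= self.capacity:
            raise CapacityError(key)
        # Store the expiry alongside the value, mimicking a TTL attribute.
        self.items[key] = (value, expires_at)


def write_with_overflow(primary, overflow, key, value, ttl_seconds=3600):
    """Try the primary store first; on a capacity error, spill to the overflow dict."""
    try:
        primary.put(key, value, time.time() + ttl_seconds)
        return "primary"
    except CapacityError:
        overflow[key] = value
        return "overflow"


primary = ThrottledStore(capacity=2)
overflow_db = {}
results = [
    write_with_overflow(primary, overflow_db, f"obj-{i}", {"n": i})
    for i in range(3)
]
print(results)  # the third write spills over once the toy store is full
```

The point of the design is that a capacity error becomes a routing decision rather than a failure, so readers would need to check both places when looking an object up.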

abalone 2762 days ago

Why don't you use Kinesis for this? Isn't that what it's made for?

abalone 2763 days ago

> DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches

The problem they noted isn't lack of autoscaling, it's that you have to provision the entire datastore to accommodate your hottest partition.

paragraft 2763 days ago

GP used the wrong term, think they meant adaptive capacity, which is a newer feature where shards will automatically lend capacity to each other in the case of hotspots.

piinbinary 2763 days ago

Autoscaling doesn't always help with hot shards (which I think gp was referring to) because you can have a single shard go over its share of the throughput[0] while still having a low total throughput.

[0] total throughput/num shards
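A quick sketch of that arithmetic, assuming throughput is split evenly across shards and there is no adaptive capacity. The numbers and the function name `throttled_shards` are invented for illustration.

```python
def throttled_shards(provisioned_total, per_shard_load):
    """Return indexes of shards whose load exceeds an even share of throughput."""
    share = provisioned_total / len(per_shard_load)
    return [i for i, load in enumerate(per_shard_load) if load > share]


# Hypothetical table: 1000 units provisioned across 10 shards (100 units each).
loads = [150, 10, 10, 10, 10, 10, 10, 10, 10, 10]

print(sum(loads))                     # total consumed is well under 1000
print(throttled_shards(1000, loads))  # yet shard 0 is over its 100-unit share
```

Under these assumptions, the only way to stop shard 0 from throttling is to raise the table's total provisioning until the per-shard share covers the hottest shard, which is the overprovisioning complaint.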

EwanToo 2762 days ago

This has largely been resolved: a single shard can now consume more of the throughput than your equation would give it. AWS refers to it as Adaptive Capacity.

https://aws.amazon.com/blogs/database/how-amazon-dynamodb-ad...

manigandham 2763 days ago

Yes. Relational databases are very fast and using them as key/value stores is a great use-case. Using a scale-out system like Aurora makes it even better. It's slower because of SQL parsing and generally the SQL clients are not as fast, but you can get close to single-digit millisecond latency these days.

We use Aurora or Postgres for key/value unless we need something specific, like multi-regional capacity or really high-end performance. For that we run ScyllaDB.
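As a rough illustration of the key/value-on-SQL idea, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for Aurora or Postgres. The table name `kv` and the helpers `kv_put`/`kv_get` are assumptions for the example, and `INSERT OR REPLACE` is SQLite syntax; other engines spell the upsert differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def kv_put(key, value):
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )


def kv_get(key):
    # A primary-key lookup; returns None when the key is absent.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


kv_put("user:1", "alice")
kv_put("user:1", "bob")   # same key, so the value is overwritten
print(kv_get("user:1"))
print(kv_get("missing"))
```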

ngrilly 2762 days ago

> It's slower because of SQL parsing and generally the SQL clients are not as fast

I'd be really surprised if the client library introduces a latency significant enough to be compared to the network latency between the app server and the database server.

manigandham 2762 days ago

Many libraries handle db connections poorly, or have heavy-handed pooling systems, or aren't fully async, all of which limits total throughput. The key/value clients usually have much simpler APIs like HTTP, which scale much better.

ngrilly 2762 days ago

I don't understand. What makes you think it's easier for NoSQL clients (versus SQL clients) to correctly implement connection pooling and async networking? For example, MongoDB and Cassandra wire protocols are not based on HTTP. And even if they were based on HTTP, connection pooling and async networking still requires a specific effort. Which libraries are you thinking of (as examples of good and bad behavior)?

manigandham 2762 days ago

Relational databases tend to have bigger and more complicated protocols, with more complex session management, data types and parsing requirements, and connections that may only support a single in-flight query.

Libraries just have to do more work, compared to simpler protocols, or HTTP which is incredibly easy to scale and pretty much handled automatically by the standard libraries at this point.

ddorian43 2762 days ago

Example: psycopg2 (python-postgresql driver) doesn't have (or sucks at) prepared statements compared to the cassandra driver.

ngrilly 2762 days ago

Right, but that has nothing to do with connection pooling and async. And there is no structural reason that makes it easier to implement prepared statements for PostgreSQL than for Cassandra. It's anecdotal evidence.

ahoka 2762 days ago

I have the exact same experience with npgsql. It's exporting postgres's "one session - one server process" model which is very outdated.

scarface74 2763 days ago

Whether NoSQL is the best approach and whether DynamoDB is the best approach are two separate issues. I find DynamoDB too limiting with the way that it handles indexing, read and write capacity, etc. compared to traditional NoSQL databases like Elasticsearch and Mongo.

That being said, one advantage of DynamoDB is that it is API based and you can make a true serverless web app where all of the logic is on the client, you use Web Federation for authentication to DynamoDB, and you host your JavaScript files, html and CSS on S3.

Another advantage, until two days ago, was that with most of the data stores on AWS, you kept your databases behind a VPC, and if you used Lambda, your Lambda also had to be in a VPC, which increased its warm-up time.

Now, there is the Read Only Data API for serverless Aurora. You don’t have to worry about the traditional connection pooling or being in a VPC.

wahnfrieden 2762 days ago

You can write too - not just read-only.

mdani 2763 days ago

Aurora did not work well for us (it was using local ephemeral disk to do sorts, so our query results were truncated / limited to the largest local storage), so the best option for us was to run MySQL or Postgres on an i3 instance with local SSDs.

abalone 2763 days ago

Ok but I'm not sure this is relevant. We're talking about using Aurora in place of DynamoDB, not how it compares to other SQL DBs. With DynamoDB the kind of internal sort you're talking about isn't even possible, right?

rawoke083600 2762 days ago

"Plus you get other nice stuff with Aurora like, you know, traditional SQL multi-operation transactions." THIS !!!!

NoSQL has such a niche usage!
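To make the "multi-operation transactions" point concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for Aurora. The `accounts` table and the `transfer`/`balances` helpers are invented for the example. Two updates either both apply or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(src, dst, amount):
    """Move funds in one transaction; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer("a", "b", 150), balances())  # overdraft: nothing changes
print(transfer("a", "b", 40), balances())   # succeeds: both rows updated
```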