Hacker News new | ask | show | jobs
by coder543 2760 days ago
I personally recommend using a SQL database until you're absolutely positively sure you don't need one, for many reasons.

But, as far as the "you end up overprovisioning" because of hotspots thing, DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches and save you money compared to the provisioning you would have done with DynamoDB, from what I understand.

3 comments

We use a hybrid. We process a lot of incoming data and dump most of it into dynamo (it's ephemeral so the TTL feature is nice) and if we get capacity errors (Dynamo takes a while to scale up sometimes) we just dump our objects in the DB. The end result is we keep a huge amount of writes off our DB for processing incoming largish objects. The amount of data it stores would cost an arm and a leg to put into redis.

Granted, I don't think I'd want to use Dynamo for anything other than temporary data. Lock-in makes me nervous, and the way it scales up/down really makes it difficult to use it for hourly workloads...by the time it scales up we're close to done needing more capacity, then it doesn't scale down for like 40m after. We set up caps and the DB overflow machanism keeps things from grinding to a halt.

Why don't you use Kinesis for this? Isn't that what it's made for?
> DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches

The problem they noted isn't lack of autoscaling, it's that you have to provision the entire datastore to accommodate your hottest partition.

GP used the wrong term, think they meant adaptive capacity, which is a newer feature where shards will automatically lend capacity to each other in the case of hotspots.
Autoscaling doesn't always help with hot shards (which I think gp was referring to) because you can have a single shard go over its share of the throughput[0] while still having a low total throughput.

[0] total throughput/num shards

This has largely been resolved, a single shard can now consume more of the throughput than your equation would give you. AWS refer to it as Adaptive Capacity

https://aws.amazon.com/blogs/database/how-amazon-dynamodb-ad...