Hacker News
by hoodoof 3411 days ago
DynamoDB is effectively useless for querying, except perhaps for some sort of highly specialised application able to fit within DynamoDB's strange and arcane query model.

What sort of database is effectively useless for querying?

Also they need to ditch the really, really confusing and limiting scaling model. For a database that advertises scaling as one of its key strengths, DynamoDB sure has a bad scaling story.

3 comments

> What sort of database is effectively useless for querying?

Cassandra, Riak, Voldemort, HBase, Bigtable, Azure Table Storage, and many other implementations of wide column stores have similarly limited querying.

I'm also not sure what you mean by the limiting scaling model. I can go from 0 to 160k reads/second by turning a knob, and 160k is only the default limit (you can request higher limits).

It is not a document store. It's a wide column store. Use it for the right job and it does very well. Treat it like postgres and you are gonna have a hard time.

The price for that 160k is horrifying though, esp. if the requirement is bursty rather than continuous.
Which is why you turn the knob back down when you stop being bursty.

But yes, it's pricy. It may not be the best fit for some. Hopefully by the time you're taking 160k writes per second you have a solid business model. I mean, Twitter peaked at around 8000 tweets per second. What are you doing that requires 160k, and do you really need to be storing it?

It's probably an indication that your use-case is not a good fit for dynamo, or that you didn't adapt your use-case to dynamo and are doing something "wrong", like trying to use it as a relational database. I've experienced some of these pains as part of my dynamo learning curve.

For example, by changing my query strategy I was able to reduce the provisioned write units from 1900 to 150 (write units dominate the cost).

Ignoring reserved prices, it is $10.40/hr (these are eventually consistent reads, so half the cost of consistent ones). That puts it roughly on par with an RDS postgres r3.8xlarge instance with 10k provisioned IOPS.
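For anyone checking the math, here is a rough sketch of where the $10.40/hr figure can come from. The unit price ($0.0065 per hour per 50 read capacity units) and the small-item size are assumptions on my part, not something stated in this thread; the "half the cost" for eventually consistent reads is from the comment above.

```python
# Assumed list price (not from the thread): $0.0065 per hour
# for each block of 50 provisioned read capacity units.
PRICE_PER_BLOCK_PER_HOUR = 0.0065
UNITS_PER_BLOCK = 50


def hourly_read_cost(reads_per_second, eventually_consistent=True):
    """Estimate the hourly cost of provisioned reads for small items."""
    # An eventually consistent read consumes half a read capacity unit.
    units = reads_per_second / 2 if eventually_consistent else reads_per_second
    blocks = units / UNITS_PER_BLOCK
    return blocks * PRICE_PER_BLOCK_PER_HOUR


print(round(hourly_read_cost(160_000), 2))
print(round(hourly_read_cost(160_000, eventually_consistent=False), 2))
```

Under those assumptions the eventually consistent case lands on the $10.40/hr quoted above, and strongly consistent reads cost twice that.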

Sure, you likely have more than one table on RDS, so that cost is amortized, but when you get to the scale where you need 160k reads/s, you aren't going to have much more than that one dataset in a single instance.

It works well for a CQRS model, which helps with super high scale apps. But most devs want joins and don't want to take on the discipline of managing the data duplication.
I just rolled out a feature on DynamoDB and when monitoring it, I look at one thing: provisioned capacity vs consumed capacity. That's all I have to care about. No CPU, RAM, disk space metrics. Usage can increase 4x and performance is flat. It's great.

The application is less flexible and required making a lot of decisions up front, but operationally it's fantastic.

For my application I have found it is more complex than provisioned vs consumed capacity. I get throttling all the time when consumed capacity is a third of provisioned capacity.

You also need to care about how DDB does its underlying partitioning. It would be nice to turn the knobs and be able to trust you will get X reads/sec and Y writes/sec, but that is only true per node! Unfortunately, DDB gives you zero information about how many nodes your DDB table is running on! (Yes you can guess pretty well if you keep track of your usage rate and do some math).

So when provisioning, you need to be aware that if you have 100 provisioned read ops, but you have data on 5 nodes, you really only have 20 reads/sec if one key gets hot.
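A minimal sketch of that arithmetic, assuming provisioned throughput is split evenly across partitions (as described above; the function names are made up for illustration):

```python
def per_partition_throughput(provisioned_units, partitions):
    """Throughput available to any single partition under an even split."""
    return provisioned_units / partitions


def is_throttled(hot_key_rate, provisioned_units, partitions):
    """True if one hot key asks for more than its partition's share."""
    return hot_key_rate > per_partition_throughput(provisioned_units, partitions)


# 100 provisioned reads/sec spread over 5 partitions.
print(per_partition_throughput(100, 5))

# A hot key at 30 reads/sec uses only 30% of the table's provisioned
# capacity, yet exceeds the share of the partition it lives on.
print(is_throttled(30, 100, 5))
```

This is one way to end up throttled while consumed capacity sits well below what the table is provisioned for.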

I agree it's pretty easy operationally, but you can get burned if you don't know how it works under the hood.

I just ping support when I want to know partitions. They also told me a little trick. If you create a kinesis stream for your table, the number of shards in the stream is the number of partitions.

But you're right, part of designing for DDB is picking a proper partition key so you don't end up with hot shards.
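One common way to do that is to spread a hot logical key over several physical partition keys by appending a suffix. A rough sketch, with a hypothetical shard count and key format (nothing here is a DynamoDB API, it is plain string handling):

```python
import hashlib

SHARD_COUNT = 10  # hypothetical; tune to how hot the key is


def sharded_key(base_key, item_id, shard_count=SHARD_COUNT):
    """Derive a partition key like 'base#3' from a stable hash of item_id."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """Every key a reader must query to see all items for base_key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


print(sharded_key("2016-01-15", "order-1234"))
print(all_shard_keys("2016-01-15", 3))
```

The trade-off is on the read side: writes spread out, but reading everything for one logical key now takes one query per suffix.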

Databases in this category are some of the most popular ones in the world with good reason. The only way you can scale is to adopt a query-free architecture.

It feels tedious at first but once you develop some good habits and frameworks around denormalization it becomes easy to do that from day one.

>> The only way you can scale is to adopt a query-free architecture

This is not really the case. There are database systems that can handle large scale and complex queries, although usually at the price of reduced consistency guarantees.