| HN Mirror

	by hoodoof 3411 days ago
	AWS needs something like this. The missing piece for the AWS serverless story is a database that is suitable for writing real world applications. DynamoDB is far from suitable for that task, which leaves AWS serverless with no good database.

5 comments

kiallmacinnes 3411 days ago

AWS has RDS. That's most certainly a database suitable for writing real-world applications, as it's MySQL.

Does serverless somehow mandate a non-SQL solution?

delta1 3411 days ago

RDS also supports PostgreSQL, SQL Server, Oracle, Aurora and MariaDB

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcom...

hoodoof 3411 days ago

RDS is server based - you need to pay to have an instance running per hour. That's not serverless. That's "serverful".

Lazare 3411 days ago

On the one hand, everything is server based at some level; it's just a question of how much is being hidden from you and managed by a third party.

On the other hand RDS hides a lot of the complexity from you. You don't have to pick an OS, apply updates, secure it, manage it, configure it, or patch it. There are some number of virtual servers out there that are nominally running your RDS cluster, but it's all pretty theoretical.

So I'm not entirely understanding your point.

> you need to pay to have an instance running per hour

You are paying to have instances running with every other DB service too; they may just break it out on your bill a bit differently. :)

The real issue with RDS for me isn't that they haven't removed the server part from the equation (they have), it's that they haven't removed the RDBMS from the equation. Schema changes, data migrations, replicas, sharding, scaling: all the hard parts of running an RDBMS are still there.

If Amazon could somehow make a magical service that accepted SQL queries and somehow returned my data, I'd be ecstatic - but the difference between that and RDS isn't the fact that they're letting me know how much RAM the virtual server that is nominally running MySQL for me has.

kiallmacinnes 3411 days ago

I'm not sure how that differs from Azure Document DB? I have no inside info on this, but I'm pretty sure it runs on a server too. In the specific context of databases used for "serverless", clearly there are servers involved; it's simply that your application and ops teams don't manage them.

What I'm getting at is, a hosted DB is a hosted DB.. What makes SQL unsuitable for serverless?

kiallmacinnes 3411 days ago

Replying to myself here, I missed a key point.. the issue you raise is that you're billed per hour, even when it's unused? That makes some amount of sense, but any data storage is going to come with a per hour bill - either for the instance of it, or the data within it.

Anyway, my bad, I now see your point :)

bpicolo 3411 days ago

It's a cloud service just like Dynamo, the implementation specifics seem irrelevant here.

Touting "serverless" as some sort of mysticism that doesn't really mean anything useful doesn't really get anybody anywhere.

cyberferret 3411 days ago

Yeah, I dabbled in DynamoDB for a recent project - couldn't really get my head around it - very strange sort of NoSQL database. The query language is incredibly arcane and wordy, and mostly inflexible.

Thinking of setting up an EC2 instance running RethinkDB or PouchDB for my project (and for future projects).

ZGF4 3411 days ago

Cross datacenter replication is the missing piece from AWS. I wish they'd just roll out a hosted Cassandra or something identical

jeffasinger 3411 days ago

While probably not what you're looking for if you're mentioning Cassandra, RDS does let you have read replicas in any region.

manigandham 3411 days ago

You can use scylladb.com and set it up pretty easily. Stable, distributed and fast out of the box with a lot less maintenance.

hayd 3411 days ago

> DynamoDB is far from suitable for that task

why?

steve918 3411 days ago

DynamoDB would be pretty close if it just allowed null values.

hoodoof 3411 days ago

DynamoDB is effectively useless for querying, except perhaps for some sort of highly specialised application able to fit within DynamoDB's strange and arcane query model.

What sort of database is effectively useless for querying?

Also they need to ditch the really, really confusing and limiting scaling model. For a database that advertises scaling as one of its key strengths, DynamoDB sure has a bad scaling story.

phamilton 3411 days ago

> What sort of database is effectively useless for querying?

Cassandra, Riak, Voldemort, HBase, Bigtable, Azure Table Storage, and many other implementations of wide column stores have similarly limited querying.

I'm also not sure what you mean by the limiting scaling model. I can go from 0 to 160k reads/second by turning a knob, and 160k is only the default limit (you can request higher limits).

It is not a document store. It's a wide column store. Use it for the right job and it does very well. Treat it like postgres and you are gonna have a hard time.
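
A minimal sketch of the access pattern a wide column store is built for, using an in-memory stand-in rather than the real DynamoDB API. The class `WideColumnTable` and its method names are hypothetical and only for illustration: items are addressed by a partition key and ordered by a sort key, so key lookups and range queries inside one partition are cheap, while filtering on anything else means touching every item.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class WideColumnTable:
    """Toy in-memory model of a partition-key + sort-key table (hypothetical)."""

    def __init__(self):
        # partition key -> sorted list of sort keys
        self._sort_keys = defaultdict(list)
        # (partition key, sort key) -> item attributes
        self._items = {}

    def put(self, pk, sk, attrs):
        if (pk, sk) not in self._items:
            insort(self._sort_keys[pk], sk)
        self._items[(pk, sk)] = dict(attrs)

    def get(self, pk, sk):
        """Point lookup by full key."""
        return self._items.get((pk, sk))

    def query(self, pk, sk_from, sk_to):
        """Range query within ONE partition: the efficient case."""
        keys = self._sort_keys.get(pk, [])
        lo = bisect_left(keys, sk_from)
        hi = bisect_right(keys, sk_to)
        return [self._items[(pk, sk)] for sk in keys[lo:hi]]

    def scan(self, predicate):
        """Filter on arbitrary attributes: visits every item in the table."""
        return [item for item in self._items.values() if predicate(item)]


table = WideColumnTable()
table.put("user#1", "2016-01-05", {"total": 30})
table.put("user#1", "2016-02-11", {"total": 12})
table.put("user#2", "2016-01-20", {"total": 99})

# Cheap: one partition, a contiguous range of sort keys.
january_for_user_1 = table.query("user#1", "2016-01-01", "2016-01-31")

# Expensive: no key involved, so the whole table is examined.
big_orders = table.scan(lambda item: item["total"] > 20)
```

Treating the table like postgres amounts to leaning on `scan` for everything, which is where the pain starts.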

tech2 3411 days ago

The price for that 160k is horrifying though, esp. if the requirement is bursty rather than continuous.

lostcolony 3411 days ago

Which is why you turn the knob back down when you stop being bursty.

But yes, it's pricy. It may not be the best fit for some. Hopefully by the time you're taking 160k writes per second you have a solid business model. I mean, Twitter peaked at around 8000 tweets per second. What are you doing that requires 160k, and do you really need to be storing it?

danek 3410 days ago

It's probably an indication that your use-case is not a good fit for dynamo, or that you didn't adapt your use-case to dynamo, you're doing something "wrong" like trying to use it as a relational database. I've experienced some of these pains as part of my dynamo learning curve.

For example, by changing my query strategy I was able to reduce the provisioned write units from 1900 to 150 (write units dominate the cost).

phamilton 3411 days ago

Ignoring reserved prices, it is $10.40/hr (these are eventually consistent reads, so half the cost of consistent ones). That puts it roughly on par with an RDS postgres r3.8xlarge instance with 10k provisioned IOPS.

Sure, you likely have more than one table on RDS, so that cost is amortized, but when you get to the scale where you need 160k reads/s, you aren't going to have much more than that one dataset in a single instance.
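
The $10.40/hr figure above can be reproduced with a little arithmetic. This is a sketch under two assumptions that are not stated in the thread: a list price of $0.0065 per hour for every 50 read capacity units, and one read capacity unit serving two eventually consistent reads per second (the "half the cost" point). The function name `hourly_read_cost` is made up for the example.

```python
from decimal import Decimal

# Assumed pricing and unit sizes; not taken from the thread.
PRICE_PER_BLOCK_HOUR = Decimal("0.0065")  # dollars per block of read units per hour
UNITS_PER_BLOCK = 50                      # read capacity units per priced block
EVENTUAL_READS_PER_UNIT = 2               # eventually consistent reads/sec per unit


def hourly_read_cost(reads_per_second, eventually_consistent=True):
    """Hourly cost in dollars of provisioning the given read rate."""
    reads_per_unit = EVENTUAL_READS_PER_UNIT if eventually_consistent else 1
    units = -(-reads_per_second // reads_per_unit)  # ceiling division
    blocks = Decimal(units) / UNITS_PER_BLOCK
    return PRICE_PER_BLOCK_HOUR * blocks


eventual = hourly_read_cost(160_000)                              # 10.40 dollars/hour
consistent = hourly_read_cost(160_000, eventually_consistent=False)  # 20.80 dollars/hour
```

Under these assumptions, 160k eventually consistent reads/s needs 80,000 read units, or 1,600 priced blocks, which is where $10.40 comes from.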

stephen123 3411 days ago

It works well for a CQRS model, which helps with super-high-scale apps. But most devs want joins and don't want the discipline of managing the data duplication.

phamilton 3411 days ago

I just rolled out a feature on DynamoDB and when monitoring it, I look at one thing: provisioned capacity vs. consumed capacity. That's all I have to care about. No CPU, RAM, or disk space metrics. Usage can increase 4x and performance is flat. It's great.

The application is less flexible and required making a lot of decisions up front, but operationally it's fantastic.

positr0n 3411 days ago

For my application I have found it is more complex than provisioned vs. consumed capacity. I get throttling all the time when consumed capacity is a third of provisioned capacity.

You also need to care about how DDB does its underlying partitioning. It would be nice to turn the knobs and be able to trust you will get X reads/sec and Y writes/sec, but that is only true per node! Unfortunately, DDB gives you zero information about how many nodes your DDB table is running on! (Yes you can guess pretty well if you keep track of your usage rate and do some math).

So when provisioning, you need to be aware that if you have 100 provisioned read ops, but you have data on 5 nodes, you really only have 20 reads/sec if one key gets hot.

I agree it's pretty easy operationally, but you can get burned if you don't know how it works under the hood.
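
The 100-reads, 5-nodes example above can be written down as a small calculation. This is only a sketch and assumes provisioned throughput is split evenly across partitions, as described in this comment; the function names are hypothetical.

```python
def per_partition_capacity(provisioned_per_sec, partitions):
    """Throughput available to any single partition, assuming an even split."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return provisioned_per_sec / partitions


def hot_key_is_throttled(provisioned_per_sec, partitions, hot_key_rate):
    """A single key lives on one partition, so it only gets that partition's share."""
    return hot_key_rate > per_partition_capacity(provisioned_per_sec, partitions)


# The example from the comment: 100 provisioned reads/sec spread over 5 partitions.
share = per_partition_capacity(100, 5)        # 20.0 reads/sec per partition

# A hot key at 33 reads/sec is only a third of the table's provisioned capacity,
# yet it still exceeds the 20 reads/sec its partition can serve.
throttled = hot_key_is_throttled(100, 5, 33)  # True
```

That is how throttling can show up while consumed capacity sits at a third of provisioned capacity.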

phamilton 3411 days ago

I just ping support when I want to know partitions. They also told me a little trick. If you create a kinesis stream for your table, the number of shards in the stream is the number of partitions.

But you're right, part of designing for DDB is picking a proper partition key so you don't end up with hot shards.
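
To illustrate why the partition key matters, here is a rough simulation. It assumes keys are mapped to partitions by hashing, and it uses MD5 purely as a stand-in; DynamoDB's real hashing and partition placement are internal and not shown here. The helper names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions):
    """Map a partition key to a partition index via a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_per_partition(keys, partitions):
    """Count how many requests land on each partition."""
    counts = Counter(partition_for(key, partitions) for key in keys)
    return [counts.get(p, 0) for p in range(partitions)]


# 1000 requests keyed by today's date: every request hashes to the same partition.
by_date = ["2016-08-01"] * 1000
# 1000 requests keyed by user id: requests spread across all partitions.
by_user = [f"user-{i}" for i in range(1000)]

date_load = load_per_partition(by_date, 5)
user_load = load_per_partition(by_user, 5)
```

With the date as the key, one partition takes all 1000 requests and the other four sit idle; with a high-cardinality key the load spreads out.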

ZGF4 3411 days ago

Databases in this category are some of the most popular ones in the world with good reason. The only way you can scale is to adopt a query-free architecture.

It feels tedious at first but once you develop some good habits and frameworks around denormalization it becomes easy to do that from day one.
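
A minimal sketch of what denormalizing from day one can look like, using plain dictionaries as stand-ins for key-value tables. The table names and `save_comment` are hypothetical: each write is duplicated into every "view" the application will read, so each read is a single key lookup with no query planning or joins.

```python
# One stand-in "table" per read pattern the application needs.
comments_by_id = {}
comments_by_author = {}
comments_by_story = {}


def save_comment(comment_id, author, story_id, text):
    """Write the same record into every view at write time."""
    record = {"id": comment_id, "author": author, "story": story_id, "text": text}
    comments_by_id[comment_id] = record
    comments_by_author.setdefault(author, []).append(record)
    comments_by_story.setdefault(story_id, []).append(record)
    return record


save_comment(1, "alice", "story-42", "First!")
save_comment(2, "bob", "story-42", "Interesting read.")
save_comment(3, "alice", "story-7", "Agreed.")

# Each read is a direct lookup by key.
alice_comments = comments_by_author["alice"]
story_42_comments = comments_by_story["story-42"]
```

The cost is that every new read pattern needs its own view and every write has to keep all of them in sync, which is the discipline being described.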

g0del_was_wr0ng 3411 days ago

>> The only way you can scale is to adopt a query-free architecture

This is not really the case. There are database systems that can handle large scale and complex queries, although usually at the price of providing reduced consistency guarantees.

steve918 3411 days ago

Actually, I guess the query language and indexing are pretty limiting too.
