| HN Mirror


by yatsyk 2306 days ago

> I’m not clear what you mean by limiting "not only particular document but database".

I’ve limited document size to 10 MB and rate-limited updates to 10 per second. Suppose a client starts updating a document with random data at 10 requests per second. As far as I understand, Couch stores all versions for at least some time. This means that this one client could fill the disk on my server at 100 MB/s. There is no such issue with Postgres, and no one allows clients to execute raw queries against the database without an application server. The document is only 10 MB, but the database is huge.

> What kind of "expensive" query are you envisioning?

I have never used Couch, so I don’t know what could be expensive. Maybe some lookup without an index or something like that.

Sorry for my ignorance: is it true that if I limit Couch to replication only, there will be no unindexed lookups?

It looks like implementing a secure system with Couch is very hard, but I can’t find any best practices, mostly only authentication and basic validation.


Volundr 2306 days ago

> I’ve limited document size to 10 MB and rate-limited updates to 10 per second. Suppose a client starts updating a document with random data at 10 requests per second. As far as I understand, Couch stores all versions for at least some time. This means that this one client could fill the disk on my server at 100 MB/s. There is no such issue with Postgres, and no one allows clients to execute raw queries against the database without an application server. The document is only 10 MB, but the database is huge.

Ah! Now we are getting somewhere! You're concerned about someone filling your disk.

OK, let's modify your scenario a little. Instead of updating an existing document, they create a new document. This is a malicious client: why do updates that'll get cleaned up in a few minutes when I can make it permanent?

So, CouchDB allows these writes, and now your disk is full.

What does Postgres with a custom API do? Allows these writes, and now your disk is full.

You're allowing 10 MB documents because that makes sense for your application, right? So your Postgres table is going to have a binary column or some other column meant to hold bulk data, and your API is going to accept it.

If it doesn't make sense, lower the max document size. Apply validations to limit what fields can be written to, and how big they can be. In Postgres this is called your "schema". Couch being "schemaless", it's now your validation function. Couch is no different from any other schemaless database such as Mongo, RethinkDB and FoundationDB in this regard.
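To make "your validation function is your schema" concrete, here is a minimal sketch of the kind of checks meant above. In CouchDB itself these checks would live in a JavaScript `validate_doc_update` function in a design document; the Python below only models the logic. The field names, the `recipe` type, and the size limits are all made up for illustration.

```python
import json

# Hypothetical "schema" for a recipe app; none of these names come from CouchDB.
ALLOWED_FIELDS = {"_id", "_rev", "type", "title", "description"}
MAX_FIELD_BYTES = 64 * 1024      # assumed per-field cap
MAX_DOC_BYTES = 256 * 1024       # assumed whole-document cap


class Forbidden(Exception):
    """Raised when a write should be rejected."""


def encoded_size(value):
    # Approximate size: length of the JSON encoding in bytes.
    return len(json.dumps(value).encode("utf-8"))


def validate_doc(new_doc):
    """Reject documents with unknown fields, the wrong type, or oversized values."""
    unknown = set(new_doc) - ALLOWED_FIELDS
    if unknown:
        raise Forbidden(f"unknown fields: {sorted(unknown)}")
    if new_doc.get("type") != "recipe":
        raise Forbidden("only recipe documents are accepted")
    for key, value in new_doc.items():
        if encoded_size(value) > MAX_FIELD_BYTES:
            raise Forbidden(f"field {key!r} is too large")
    if encoded_size(new_doc) > MAX_DOC_BYTES:
        raise Forbidden("document is too large")
```

With these limits, a car maintenance log with a `mileage` field is rejected, while a recipe with a short `description` passes. Nothing here stops someone from typing their maintenance log into `description`, which is the same situation as a Postgres text column.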

Also your rate limiting here is weak. If I can post to your server at 100 MB/s, I can saturate a 1 GB/s link with only 10 clients. Doesn't matter if you reject my posts; if I can send them to the server, I can DoS you pretty easily.

The main thing Postgres gives you here is that it requires you to define your schema upfront (unless you use JSON columns, in which case it joins the schemaless club above). Couch will happily let you not. So if someone wants to write a record of their car maintenance into your recipe book app, Couch is good with that. But take a step back. What actually stops them from putting that in the "description" column of your Postgres recipe app? Not much. So you have to think about what's important. Do I actually need to make sure these are all the same "shape"? If so, I need a validation function. If I can just shrug and say "garbage in, garbage out", then I just need controls around how much data they can insert, but hey, I needed that for Postgres anyway.
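The arithmetic behind that, as a quick sketch. It assumes the numbers from the thread (10 MB documents, 10 requests per second per client) and decimal units; the function names are just for illustration.

```python
MB = 10**6  # decimal megabyte, in bytes


def write_rate(doc_size_bytes, requests_per_s):
    """Bytes per second a single client can push at the allowed rate."""
    return doc_size_bytes * requests_per_s


def clients_to_saturate(link_bytes_per_s, per_client_bytes_per_s):
    """Smallest whole number of clients whose combined rate fills the link."""
    return -(-link_bytes_per_s // per_client_bytes_per_s)  # ceiling division


per_client = write_rate(10 * MB, 10)                 # 100 MB/s per client
print(clients_to_saturate(1000 * MB, per_client))    # 1 GB/s link: 10 clients
print(clients_to_saturate(125 * MB, per_client))     # 1 Gbit/s link: 2 clients
```

On a 1 Gbit/s link (125 MB/s) the picture is worse: a single client at the permitted rate already uses 80% of it.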

> Sorry for my ignorance: is it true that if I limit Couch to replication only, there will be no unindexed lookups?

Correct (enough). The entirety of CouchDB is built around efficient replication. While it's not going to use a formal "index", getting all of the changes after a specific rev is an efficient operation.


yatsyk 2306 days ago

It’s trivial to limit the number of created documents in Postgres, CouchDB, or an application server through validation. I’m talking about updating a document, not creating a new one. In Postgres, if I update a 1 MB document, the used space will not always grow. In CouchDB the situation is different. With a relational DB you have an application server with custom logic and validations; CouchDB, on the other hand, is accessible from outside.

My point is that it’s very hard to create a safe CouchDB-based system, and most recommendations are limited to setting up an nginx proxy and authenticating users, which is not enough.


Volundr 2306 days ago

> It’s trivial to limit the number of created documents in Postgres, CouchDB, or an application server through validation. I’m talking about updating a document, not creating a new one. In Postgres, if I update a 1 MB document, the used space will not always grow. In CouchDB the situation is different. With a relational DB you have an application server with custom logic and validations; CouchDB, on the other hand, is accessible from outside.

It is? It's unclear to me why I'm allowing 10 updates to a (largish, 10 MB! Use a file or store it in S3!) document per minute, but not 10 creates. Maybe I'm building Google Docs? Except I'd want old revisions, so those are creates. Plus 10 MB is a huge spreadsheet. But sure, let's roll with it.

Actually Couch does not keep old versions of documents around, only old revision numbers. When a document is updated, the old version becomes eligible for compaction (basically garbage collection). So your attacker has to be fast enough to outrun the compactor, while being slow enough to not get temporarily banned from your service. It seems like less effort to me to use this power to flood your network I/O, which is almost certainly lower than your disk I/O. Or just choke your Postgres server on its 100 MB/s disk I/O for updates + whatever is required to maintain your indexes.
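A toy model of the update-versus-create difference, as a sketch only. It assumes a simplified append-only store where every write appends the full body and compaction keeps just the latest body per document. The class and its methods are invented for illustration and are not CouchDB's API, and when compaction actually runs depends on how the server is configured.

```python
MB = 10**6


class AppendOnlyStore:
    """Toy model: writes append to a file; compaction drops superseded bodies."""

    def __init__(self):
        self.live_sizes = {}   # doc_id -> size of the current body
        self.file_bytes = 0    # bytes used on disk, including stale bodies

    def put(self, doc_id, size_bytes):
        self.live_sizes[doc_id] = size_bytes
        self.file_bytes += size_bytes

    def compact(self):
        # Only the latest body of each document survives.
        self.file_bytes = sum(self.live_sizes.values())


# One minute of 10 updates per second to the same 10 MB document.
updates = AppendOnlyStore()
for _ in range(10 * 60):
    updates.put("doc-1", 10 * MB)
before = updates.file_bytes
updates.compact()
print(before // MB, updates.file_bytes // MB)   # 6000 10

# The same traffic as creates: compaction has nothing to reclaim.
creates = AppendOnlyStore()
for i in range(10 * 60):
    creates.put(f"doc-{i}", 10 * MB)
creates.compact()
print(creates.file_bytes // MB)                 # 6000
```

In this model the update flood is temporary disk pressure, while the create flood is permanent, which is why limiting creates and total bytes per user matters more than the revision history.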

I'm not actually advocating for Couch over Postgres. In my mind Postgres should be the default choice, and you switch to something else if you have a reason. For Couch, the biggest reason is sync is built in, in such a way that you can leverage it for your own applications with minimal effort. In my experience sync can be devilishly hard for non-trivial cases, so depending on your app, that can be pretty compelling.

But so far you seem to be focused on DoS attacks. You're not going to find separate advice for Postgres vs Mongo vs Couch, because the backing system doesn't matter. The attacks and mitigations are identical no matter the back-end, namely stop the traffic before it consumes your resources.


yatsyk 2306 days ago

Couch is not equivalent to Mongo or a relational database, because it has to be accessible to clients if we want synchronisation. Securing an app server is a manageable problem, and there is a huge number of resources on how to do it correctly.

In the case of Couch, I haven't seen any secure open-source example.

I'm not focused on DOS attacks, I'm just proposing different attack vectors.


newfeatureok 2306 days ago

Is it trivial? Let’s say you have a back end and an app that lets you post comments, like this site. How do you stop someone from spamming comments? Each comment is represented by a row in a table so the space will grow.


yatsyk 2306 days ago

If you need to limit the number of items, it is trivial. You need to write something like `has_many :things, :before_add => :limit_things` in the app server or create a constraint in SQL.

Spam prevention is not trivial, but it is a mostly solved problem. You can find a lot of articles about this topic.

But creating a secure CouchDB setup looks very non-trivial.


Volundr 2306 days ago

Yeah... that's a Rails callback, not an SQL constraint, and can't be relied upon in the face of multiple simultaneous requests. Which kind of demonstrates my point. With a custom API, you have to understand your system, its requirements, and its limitations. You can't just read a blog post on "securing your webapp" and assume it's good.

Couch is no different. You have to understand Couch, you have to understand its features and limitations, and build your system within those constraints.

You seem to be asserting that because Couch is designed to be internet connected, it can't be secure. If that's true, then I guess all the customers of IBM Cloudant (Couch as a service), Realm (another database designed for mobile sync), and Firebase (Google's database as a service) are in trouble and just don't know it yet.

Security for all systems is non-trivial. Thinking it is assures your system is not secure.
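For contrast, here is a rough sketch of pushing the limit into the database itself, using Python's built-in `sqlite3` and a trigger. The table, the trigger name, and the limit of 3 are invented for the example. It works here because SQLite serializes writers; in Postgres under the default READ COMMITTED isolation, a count-based trigger can still be raced by concurrent transactions unless you also lock (for example the owning user row) or use SERIALIZABLE, so treat this as an illustration of where the check lives, not a drop-in fix.

```python
import sqlite3

MAX_THINGS_PER_USER = 3  # hypothetical limit


def open_db():
    """Create an in-memory database whose trigger enforces the per-user limit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE things (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            body    TEXT NOT NULL
        );
        CREATE TRIGGER limit_things
        BEFORE INSERT ON things
        BEGIN
            SELECT CASE
                WHEN (SELECT COUNT(*) FROM things
                      WHERE user_id = NEW.user_id) >= {MAX_THINGS_PER_USER}
                THEN RAISE(ABORT, 'too many things for this user')
            END;
        END;
    """)
    return conn


def add_thing(conn, user_id, body):
    """Insert a row; return False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO things (user_id, body) VALUES (?, ?)",
                (user_id, body),
            )
        return True
    except sqlite3.DatabaseError:
        return False


conn = open_db()
print([add_thing(conn, 1, f"thing {i}") for i in range(5)])
# [True, True, True, False, False]
```

The check now runs inside the same statement as the insert, instead of in application code that reads a count and then writes later.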


yatsyk 2305 days ago

I'm not asserting that Couch is insecure. I need such a database, but the problem is that I can't find any resource that could help me design a secure production system.

You can check even a trivial Rails blog or todo example from some book, and it will be limited in scope but more or less secure. I'm having a hard time finding a secure CouchDB example.

> Security for all systems is non trivial.

But not equally hard.

If you use Firebase, you should understand that you're getting vendor lock-in and that in some cases you may spend much more money, but for some types of projects this platform is OK for me.

Same with CouchDB: I understand that if I get replication with the client, I need to pay by reorganising data or maybe by spending more resources to make the system secure. There is no free lunch.


newfeatureok 2306 days ago

You are comparing apples to oranges. Again, are you imposing a hard constraint on the number of comments someone can make?

With the example I gave, you could have a constraint in CouchDB achieve the same effect, but there are simply other examples one could use.


yatsyk 2305 days ago

No, I'm not assuming a constraint on the number of comments. The first example shows how easy it is to limit the number of created objects. Spam prevention is another topic: not so trivial, but a mostly solved problem.
