Hacker News
by icebraining 4488 days ago
It's interesting that we're still writing such applications by hand. One thing that interested me when I learned about CouchDB was the possibility of skipping that and just exposing the database to the browser, with a few schemas and a couple of data validation functions configured. After all, that system is almost a dumb HTTP storage mechanism.

Presumably that means that users can mass-scrape the submitted comments though? (Potentially allowing the email addresses users submitted to be harvested.)

Other than that it doesn't seem like an unreasonable approach.

Why can't they mass scrape your service, though? After all, what you built is essentially a very specialized REST database.

As for harvesting email addresses, I think you could solve that by using a CouchDB view, which is essentially a function that processes and returns JSON documents. In this case, it could just delete the "email" key and return the rest.
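To make the idea concrete, here is a minimal sketch of that projection. It is written in Python purely for illustration; an actual CouchDB view would normally be a JavaScript map function that calls `emit`. The document fields and the `strip_email` name are made up for the example.

```python
def strip_email(doc: dict) -> dict:
    """Return a copy of a comment document without its "email" key."""
    return {key: value for key, value in doc.items() if key != "email"}


# Hypothetical comment document; the field names are assumptions.
comment = {
    "_id": "c1",
    "page": "/blog/hello",
    "author": "alice",
    "email": "alice@example.com",
    "body": "Nice post!",
}

# What the view would hand to the browser: everything except the email.
public_comment = strip_email(comment)
```

The stored document is left untouched; only the copy that goes out to clients loses the key.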

You would probably still need to block direct access to the document via a frontend proxy, since I don't think Couch allows you to specify fine-grained per-user permissions, which is definitely a drawback.

Alternatively, since you're already willing to send hashed versions of the emails (as Gravatars), you could just store only the hashes in the first place, and never commit the plaintext to disk.
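A rough sketch of the hash-only approach, assuming the Gravatar convention of taking the MD5 of the trimmed, lowercased address. The `make_comment` helper and its field names are hypothetical.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Hash an email the way Gravatar expects: trim, lowercase, MD5, hex."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def make_comment(author: str, email: str, body: str) -> dict:
    """Build the document to store; the plaintext email is never kept."""
    return {
        "author": author,
        "email_hash": gravatar_hash(email),
        "body": body,
    }


stored = make_comment("alice", " Alice@Example.com ", "Nice post!")
```

An unsalted hash can still be matched against a list of known addresses, so this keeps the plaintext off disk rather than making the address impossible to confirm.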

I might have been making assumptions on CouchDB which aren't valid - that remote users could query all documents (== pages) to get the comments.

With my thing, yes, it can be crawled, since requests to /comments/ID will return the JSON comment data. However there is no enumeration of the valid IDs possible, short of a dictionary attack. (This is where I was thinking that exposing CouchDB might expose more data.)

I did consider not storing emails, and for my use-case that's fine, but I figured sooner or later somebody will want to access them so ruling it out unduly would eventually result in a bug report.

> I might have been making assumptions on CouchDB which aren't valid - that remote users could query all documents (== pages) to get the comments.

Yes, you'd probably need to block that URL with a proxy, and only allow single page views to be requested. I think this is definitely a shortcoming of the DB; it should allow finer-grained permissions.
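Something like the following is what I have in mind for the proxy rule, as a sketch only. The database name `comments`, design document `site` and view `by_page` are assumptions; the `/db/_design/ddoc/_view/name` URL shape and the `_all_docs` endpoint are CouchDB's.

```python
import re

# Only this one view is reachable from outside (names are hypothetical).
ALLOWED_PATH = re.compile(r"^/comments/_design/site/_view/by_page$")


def is_allowed(method: str, path: str, query: dict) -> bool:
    """Decide whether the proxy should forward a request to CouchDB."""
    if method != "GET":
        return False
    if not ALLOWED_PATH.match(path):
        # Rejects /comments/_all_docs and direct /comments/<id> access.
        return False
    # Require exactly one "key" parameter so a client can ask for a single
    # page's comments but cannot dump or range-scan the whole view.
    return set(query) == {"key"}
```

For example, `is_allowed("GET", "/comments/_all_docs", {})` is rejected, while a request to the view with `{"key": '"/blog/hello"'}` goes through.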

> However there is no enumeration of the valid IDs possible, short of a dictionary attack.

Well, by default CouchDB uses UUIDs, so enumeration shouldn't be possible either. Of course, both are subject to simple scraping of the HTML pages; a simple wget + grep can probably list them all, so you don't gain much, except for private pages you might have.

> I did consider not storing emails, and for my use-case that's fine, but I figured sooner or later somebody will want to access them so ruling it out unduly would eventually result in a bug report.

Fair enough. I actually don't think CouchDB, as it is now, would necessarily be a better solution than yours. But the question is, why not? I believe the direction is correct, but the current implementation falls short, and that's a shame.