by nirvana 5404 days ago
The tags on cells seem to be an important feature of this database, and they also mention that it is appropriate for places where "privacy is important". Can someone explain the connection between these two? If I'm understanding right, the labeling makes it easy to address individual cells, but I'm not sure how that enhances privacy.
3 comments

They seem to be access tags, not just arbitrary tags. I interpret it as item-level ACLs, something like row-level security in SQL databases.
The labeling likely refers to Mandatory Access Control (MAC), where objects (data, cells) are assigned classification labels (e.g. Top Secret, Confidential) and subjects (users, processes) can only access objects whose label matches the subject's assigned classification level.
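To make the MAC idea concrete, here is a minimal sketch in Python. It assumes a simple hierarchical "no read up" rule, where a subject may read anything at or below its own clearance; the `Level` names and the `can_read` helper are hypothetical and not taken from any particular product.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(subject_clearance: Level, object_label: Level) -> bool:
    """Assumed rule: a subject may read an object only at or below its clearance."""
    return subject_clearance >= object_label
```

Under this assumed rule, `can_read(Level.SECRET, Level.CONFIDENTIAL)` is allowed and `can_read(Level.CONFIDENTIAL, Level.TOP_SECRET)` is denied. The point is that the decision depends only on the two labels, not on who owns the data.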
I would imagine that this is similar to other ACL products in which the NSA has previously expressed interest, like SELinux. The "labeling" probably means setting permission levels.
"There is a risk that Accumulo will be criticized for not providing adequate security. The access labels in Accumulo do not in themselves provide a complete security solution, but are a mechanism for labeling each piece of data with the authorizations that are necessary to see it."
I'm guessing that the idea is to make it easy to enforce permissions at the application layer. You supply your permissions, and you get back only the cells that the current querier is allowed to see. With HBase, it would be pretty easy to attach permissions per row (add a permission column, or a column family if it's complicated enough), but if you want some columns in a row to have some permissions and others to have different ones, it would get unpleasant and inefficient fast.

And regardless, all of the filtering would have to occur at the application layer, meaning you'd have to wrap every get/scan to have it do the filtering for you. The Accumulo way also gets you some efficiency because it never even has to transfer the cells that get filtered by the permissions (or even fully read their content from disk, possibly).
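A toy sketch of the difference, in Python. This is not Accumulo's or HBase's API: `Cell`, `ToyStore`, and `app_layer_scan` are made up for illustration, and a plain set of required labels stands in for whatever label syntax the real database uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required: frozenset  # hypothetical: labels a reader must hold to see this cell


class ToyStore:
    """In-memory stand-in for a table of labeled cells."""

    def __init__(self, cells):
        self._cells = list(cells)

    def scan_all(self):
        """Unfiltered scan: every cell is handed to the caller."""
        return list(self._cells)

    def scan(self, authorizations):
        """Filtered scan: cells the reader may not see are never returned."""
        auths = frozenset(authorizations)
        return [c for c in self._cells if c.required <= auths]


def app_layer_scan(store, authorizations):
    """Application-layer wrapper: fetch everything, then filter after the fact."""
    auths = frozenset(authorizations)
    return [c for c in store.scan_all() if c.required <= auths]


store = ToyStore([
    Cell("patient1", "name", "Alice", frozenset({"staff"})),
    Cell("patient1", "diagnosis", "flu", frozenset({"staff", "medical"})),
])
visible = store.scan({"staff"})  # only the "name" cell passes the label check
```

Both paths return the same cells, but `app_layer_scan` has to pull every cell out of the store first and must wrap every read, while `scan` applies the check inside the store, so filtered cells never leave it. Two columns of the same row carry different labels here, which is the case that gets awkward with a single per-row permission column.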

Even though each cell isn't separately encrypted to get you true security at the cell level (which would destroy your performance, I'd guess), this seems like a huge win if you want to have permissions at the cell level.