by ChuckMcM 3278 days ago
Very nice. Content addressable storage has a number of wonderful properties. At Blekko we would hash 'keys' (like a URI), and the hash would identify the 'bucket' where that URI was stored. This spread the work of crawling the web evenly across multiple servers.
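
A rough sketch of the key-to-bucket idea, for anyone who hasn't seen it. This is not Blekko's actual code: the choice of SHA-256, the `bucket_for` name, and the bucket count are all assumptions made for illustration.

```python
import hashlib


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key (e.g. a URI) to a bucket index in [0, num_buckets).

    Hypothetical helper: the hash function and the modulo scheme are
    illustrative assumptions, not a description of any real system.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_buckets


if __name__ == "__main__":
    for uri in ("http://example.com/", "http://example.com/about"):
        print(uri, "->", bucket_for(uri, 16))
```

Because the hash output is effectively uniform, keys land roughly evenly across buckets no matter how skewed the URIs themselves are. Plain modulo reshuffles most keys when the bucket count changes, which is one reason real systems often use a fixed, larger bucket count or consistent hashing.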

At NetApp I worked for a bit on a content addressable version of a filer where each 4K block was hashed and the hash became the block address. Unlike Ugarit, the block hashes were kept in an SSD-based metadata server rather than being hashed into directories. The feature that fell out of this was that you got content deduplication for 'free': any block that hashed to a code you already had stored didn't need to be stored again. (This also exploited the fixed block length as a defense against hash collisions.)
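
To make the "dedup for free" point concrete, here is a toy in-memory version. It is only a sketch under stated assumptions: a Python dict stands in for the metadata server, SHA-256 is an arbitrary choice of hash, and `BlockStore`, `write_file` and `read_file` are made-up names, not anything from the NetApp or Ugarit code.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K blocks, as in the comment above


class BlockStore:
    """Toy content-addressable block store (hypothetical, in-memory)."""

    def __init__(self):
        # Maps block address (hex digest) -> block bytes.
        self.blocks = {}

    def put(self, block: bytes) -> str:
        addr = hashlib.sha256(block).hexdigest()
        # If the address is already present, nothing new is stored.
        self.blocks.setdefault(addr, block)
        return addr

    def get(self, addr: str) -> bytes:
        return self.blocks[addr]


def write_file(store: BlockStore, data: bytes) -> list:
    """Split data into fixed-size blocks; return the list of block addresses."""
    return [store.put(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def read_file(store: BlockStore, addrs: list) -> bytes:
    """Reassemble a file from its block addresses."""
    return b"".join(store.get(a) for a in addrs)


if __name__ == "__main__":
    store = BlockStore()
    data = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
    addrs = write_file(store, data)
    print(len(addrs), "blocks referenced,", len(store.blocks), "blocks stored")
    assert read_file(store, addrs) == data
```

The file is just a list of addresses, so identical blocks (within one file or across many) cost one stored copy plus a reference each. This toy trusts the hash completely; a real store would have to decide how to handle the possibility of two different blocks hashing to the same address.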

1 comment

The fact that the majority of infrastructure in tech is oblivious to something as obvious and beneficial as content-addressable storage is one of the most depressing things about the industry to me.