by hot_gril 875 days ago
Yes, a file's hash is only based on its contents. The way I understand it, a file doesn't really live in a directory, it's more like a directory (which is a kind of file itself) references files. So the same file can be in two directories, yet it'll have the same URL/hash. And if you "add" files to a directory, you're really uploading a separate copy of the dir that'll have a different hash.

I checked myself on this, but someone else might want to check me cause I'm not an expert.
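To make the above concrete, here's a toy sketch in Python. It is not how IPFS actually computes CIDs (no chunking, no multihash, no UnixFS encoding); `file_id` and `dir_id` are made-up helper names and plain SHA-256 stands in for the real thing. It only illustrates the idea: a file's ID depends on its bytes alone, and a directory is just another hashed object that lists names and child IDs.

```python
import hashlib
import json


def file_id(content: bytes) -> str:
    """Toy content address: depends only on the bytes, not on any name or path."""
    return hashlib.sha256(content).hexdigest()


def dir_id(entries: dict[str, str]) -> str:
    """Toy directory address: hash of the sorted (name, child id) links."""
    encoded = json.dumps(sorted(entries.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


readme = file_id(b"hello\n")
notes = file_id(b"some notes\n")

# The same file referenced from two different directories, under different names.
dir_a = dir_id({"readme.txt": readme})
dir_b = dir_id({"copy-of-readme.txt": readme, "notes.txt": notes})

# "Adding" a file to dir_a really produces a new directory object with a new ID.
dir_a_plus = dir_id({"readme.txt": readme, "notes.txt": notes})

print(readme == file_id(b"hello\n"))  # same bytes, same ID
print(dir_a != dir_a_plus)            # changed listing, different directory ID
```

The file ID never changes no matter which directory links to it; only the directory objects get new IDs when their listing changes.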

1 comments

This is generally true, though afaiu it’s possible to encode the same data into a differently shaped DAG (balanced vs unbalanced) to optimise for e.g. video streaming performance. UnixFS vs raw bytes may also give different hashes, but I’m not 100% sure.
From the fs's point of view, these are different file contents. But yeah, there's nothing stopping you from pinning something different that looks the same to a person.
Once decoded they would be the same file contents - imagine one DAG where the depth is log(n) and it’s a perfectly balanced tree, and another where the depth of the left-hand branch is 1, the right-hand branch contains another subtree whose left-hand branch has depth 1, etc etc etc.

The leaves are the same in both cases, so the file contents are the same. The latter is quicker to stream (though not to verify), and the CIDs will be different.
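A rough Python sketch of the two shapes, for anyone who wants to see it. This is a toy Merkle tree, not the real UnixFS/IPLD encoding: `Node`, `leaf`, `branch`, `balanced`, `right_leaning` and `decode` are all names invented here, and SHA-256 over concatenated child digests is an assumption standing in for real CIDs. It needs more than two leaves for the shapes to differ.

```python
import hashlib
from dataclasses import dataclass


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class Node:
    digest: bytes
    children: tuple = ()
    data: bytes = b""


def leaf(chunk: bytes) -> Node:
    return Node(h(b"leaf:" + chunk), data=chunk)


def branch(left: Node, right: Node) -> Node:
    # A parent's digest covers only its children's digests, in order.
    return Node(h(b"node:" + left.digest + right.digest), children=(left, right))


def balanced(leaves: list) -> Node:
    """Split in half recursively, giving depth of about log2(n)."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return branch(balanced(leaves[:mid]), balanced(leaves[mid:]))


def right_leaning(leaves: list) -> Node:
    """Each left child is a single leaf; the rest hangs off the right."""
    node = leaves[-1]
    for item in reversed(leaves[:-1]):
        node = branch(item, node)
    return node


def decode(node: Node) -> bytes:
    """Concatenate leaf data left to right to recover the file contents."""
    if not node.children:
        return node.data
    return b"".join(decode(child) for child in node.children)


leaves = [leaf(c) for c in (b"aa", b"bb", b"cc", b"dd")]
bal = balanced(leaves)
skew = right_leaning(leaves)

print(decode(bal) == decode(skew))  # same decoded bytes
print(bal.digest != skew.digest)    # different root hashes
```

In the right-leaning shape the first leaf sits directly under the root, which is the intuition behind it being quicker to start streaming.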