by daviesliu 845 days ago
JuiceFS relies on the object store to provide integrity for data. In addition, JuiceFS stores the checksum of each object as a tag in S3 and verifies it when downloading the object.

Inside the metadata service, it uses a Merkle tree (hash of hashes) to verify the integrity of the whole namespace (including the IDs of data blocks) between Raft replicas. Once we store the hash (4 bytes) of each object in the metadata, it should provide integrity for the whole namespace.
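
To illustrate the "hash of hashes" idea, here is a minimal sketch of comparing two replicas by Merkle root. It is not JuiceFS code: the entry format (`inode:N|block:M`), the `merkle_root` helper, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib

def merkle_root(leaves):
    """Return a Merkle root over a list of byte strings (hypothetical helper)."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    # Leaf level: hash every entry.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    # Parent levels: hash each pair of child hashes until one root remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Made-up namespace entries; each one includes the ID of a data block.
replica_a = [b"inode:1|block:100", b"inode:2|block:101", b"inode:3|block:102"]
replica_b = list(replica_a)
replica_c = [b"inode:1|block:100", b"inode:2|block:999", b"inode:3|block:102"]

same = merkle_root(replica_a) == merkle_root(replica_b)
differs = merkle_root(replica_a) != merkle_root(replica_c)
```

Replicas only need to exchange the root to detect that they have diverged; walking down the tree then narrows the difference to a subtree.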


Does JuiceFS allow the user to specify the hash of a file when uploaded? And then to read that hash back later?

Otherwise there’s no end-to-end integrity check.

The S3 API allows the user to specify the hash of the content as an HTTP header; it will be verified by the JuiceFS gateway and persisted in JuiceFS as the ETag.
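
For reference, in the S3 API the header in question is typically `Content-MD5`, which carries the base64-encoded MD5 digest of the request body. The sketch below shows how a client might compute it and how a server-side check could look. The function names are hypothetical, and the check is only an assumption about what a gateway does, not the actual JuiceFS implementation.

```python
import base64
import hashlib

def content_md5_header(body: bytes) -> str:
    """Value a client would send in the Content-MD5 header for this body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def gateway_accepts(received_body: bytes, header_value: str) -> bool:
    """Hypothetical gateway-side check: recompute the digest and compare."""
    return content_md5_header(received_body) == header_value

body = b"some file content"
header = content_md5_header(body)

ok = gateway_accepts(body, header)
rejected = not gateway_accepts(b"some file c0ntent", header)  # corrupted in transit
```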

With the POSIX API or HDFS, there is unfortunately no such API.

Surely you mean that the FS should calculate the hash on file creation/update, not take some random value from the user. But I agree that an FS that maintains a file-content hash should allow clients to query it.
No, the FS should verify the hash on creation/update. Otherwise corruption during creation/update would just cause the hash to match the corrupted data.
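
A toy sketch of the difference between the two approaches. `ToyStore` and its methods are made up for illustration and SHA-256 is an arbitrary choice; no real file system API is implied.

```python
import hashlib

class ToyStore:
    """In-memory stand-in for a file system that keeps a content hash per file."""

    def __init__(self):
        self.files = {}

    def put_computed(self, name, received):
        # The FS hashes whatever bytes arrived; it cannot tell if they were corrupted.
        self.files[name] = (received, hashlib.sha256(received).hexdigest())

    def put_verified(self, name, received, expected_hex):
        # The FS checks the received bytes against the hash supplied by the client.
        actual = hashlib.sha256(received).hexdigest()
        if actual != expected_hex:
            raise ValueError("hash mismatch, write rejected")
        self.files[name] = (received, actual)

    def get_hash(self, name):
        return self.files[name][1]

original = b"hello world"
corrupted = b"hellp world"  # simulated corruption between client and FS
expected = hashlib.sha256(original).hexdigest()

store = ToyStore()

# Compute-on-write: the stored hash matches the corrupted data, so reads look fine.
store.put_computed("a.txt", corrupted)
silent_corruption = store.get_hash("a.txt") != expected

# Verify-on-write: the same corruption is caught at upload time.
try:
    store.put_verified("b.txt", corrupted, expected)
    rejected = False
except ValueError:
    rejected = True
```

With compute-on-write, the problem only surfaces if the client compares the stored hash with its own copy later, which is why being able to read the hash back matters either way.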