


	by daviesliu 845 days ago
	JuiceFS relies on the object store to provide integrity for data. Besides that, JuiceFS stores the checksum of each object as a tag in S3 and verifies it when downloading the object. Inside the metadata service, it uses a Merkle tree (hash of hashes) to verify the integrity of the whole namespace (including the IDs of data blocks) between Raft replicas. Once we store the hash (4 bytes) of each object in the metadata, that should provide integrity for the whole namespace.
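As a rough illustration of the "hash of hashes" idea, here is a minimal sketch of folding per-block checksums into one Merkle root that two replicas could compare. It is not JuiceFS code: the function names, the use of CRC32 for the 4-byte hash, SHA-256 for the inner nodes, and the odd-node duplication rule are all assumptions made for the example.

```python
import hashlib
import zlib


def block_checksum(data: bytes) -> bytes:
    """4-byte checksum of one data block (CRC32 is an assumption here)."""
    return zlib.crc32(data).to_bytes(4, "big")


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf checksums pairwise until a single root hash remains."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Two hypothetical replicas holding the same block checksums agree on the root.
replica_a = [block_checksum(b"block-0"), block_checksum(b"block-1"), block_checksum(b"block-2")]
replica_b = list(replica_a)
print(merkle_root(replica_a) == merkle_root(replica_b))
```

If any single block checksum differs on one replica, the roots differ, so replicas only need to exchange one small hash to detect divergence.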


amluto 845 days ago

Does JuiceFS allow the user to specify the hash of a file when uploaded? And then to read that hash back later?

Otherwise there’s no end-to-end integrity check.


daviesliu 843 days ago

The S3 API allows the user to specify the hash of the content as an HTTP header; the JuiceFS gateway verifies it and persists it into JuiceFS as the ETag.

With the POSIX API or HDFS, there is no such API, unfortunately.
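The header check described for the S3 API can be sketched roughly as follows. This is a hedged illustration, not the JuiceFS gateway: `put_object`, `BadDigest`, and the in-memory `store` dict are hypothetical, and the choice of the `Content-MD5` header (base64 of the MD5 digest, with the hex digest kept as the ETag, as S3 does for single-part uploads) is an assumption, since the comment does not say which header is used.

```python
import base64
import hashlib


class BadDigest(Exception):
    """Raised when the body does not match the client-supplied hash."""


def put_object(store: dict, key: str, body: bytes, content_md5: str) -> str:
    """Verify the client's Content-MD5 value, then persist the body and its ETag."""
    digest = hashlib.md5(body).digest()
    if base64.b64decode(content_md5) != digest:
        raise BadDigest(key)  # reject the write; nothing is stored
    etag = digest.hex()
    store[key] = (body, etag)
    return etag


# Client side: compute the hash before sending and pass it along as the header value.
store = {}
body = b"hello"
header = base64.b64encode(hashlib.md5(body).digest()).decode()
etag = put_object(store, "example-key", body, header)
print(etag == hashlib.md5(body).hexdigest())
```

Reading the ETag back later gives the client the same hash it supplied, which is what makes the check end-to-end on this path.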


markhahn 845 days ago

Surely you mean that the FS should calculate the hash on file creation/update, not take some random value from the user. But I agree that an FS that maintains a file-content hash should allow clients to query it.


amluto 845 days ago

No, the FS should verify the hash on creation/update. Otherwise corruption during creation/update would just cause the hash to match the corrupted data.
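To make the difference concrete, here is a small sketch comparing an FS that computes the hash itself with one that verifies a hash supplied by the client. Everything in it is hypothetical (the function names, SHA-256 as the hash, the single-bit flip standing in for corruption in transit); it only illustrates the argument above.

```python
import hashlib


def corrupt(data: bytes) -> bytes:
    """Flip one bit of the first byte to simulate corruption during upload."""
    return bytes([data[0] ^ 0x01]) + data[1:]


def store_computing_hash(received: bytes) -> str:
    """The FS hashes whatever arrived, so the hash matches corrupted data too."""
    return hashlib.sha256(received).hexdigest()


def store_verifying_hash(received: bytes, expected: str) -> str:
    """The FS checks the received bytes against the hash computed at the source."""
    actual = hashlib.sha256(received).hexdigest()
    if actual != expected:
        raise ValueError("hash mismatch on write")
    return actual


original = b"payload"
client_hash = hashlib.sha256(original).hexdigest()
received = corrupt(original)

# Computing on the FS side silently records a hash of the corrupted bytes.
print(store_computing_hash(received) == client_hash)

# Verifying against the client's hash rejects the corrupted write.
try:
    store_verifying_hash(received, client_hash)
except ValueError as err:
    print(err)
```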
