Hacker News new | ask | show | jobs
by svpg 709 days ago
It could hash the contents of a dir. Along the lines of git
2 comments

Except hashing requires... reading.

There is not much to be done here. Directories entries are just names, no guarantees that the files were not modified or replaced.

The best you could do is something similar to the strategies of rsync, rely on metadata (modified date, etc) and cross fingers nobody did `cp -a`.

I would be fine with the latter, the program could display a warning like "Results may be inaccurate, full scan required" or something.

I guess I'm just annoyed that for Windows/NTFS really fast programs are available but not for Linux filesystems.

And to hash something needs reading all of its data. I think deducing the file size would actually be faster in some file systems and never slower with any.
Faster in all file systems I'd guess, stat is fast, opening the file and reading its contents and updating a checksum is slow, and gets slower the larger the file is.