by pmahoney 2087 days ago
It's not a contradiction to state something is not a problem most of the time. (Though the next point, calculating checksums prior to build, is much more significant.)

Related example: on a cloud instance with an SSD drive, I have scripts that generate a ~10 GiB file and then immediately (while it might still be in cache) calculate its md5sum, which still takes tens of seconds (maybe it wasn't in cache? I've not investigated deeply). It's just an example of a case that falls outside of that "Mostly" category.


On my local system md5sum takes about 1.9 user CPU-seconds per GiB.

  $ time dd if=/dev/zero bs=1048576 count=1024 | md5sum
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.104 s, 510 MB/s
  cd573cfaace07e7949bc0c46028904ff  -

  real    0m2.108s
  user    0m1.891s
  sys     0m0.628s

If you want to try measuring something more relevant, my redo implementation comes with a cubehash tool that uses the same hash as redo-ifchange et al. do.

* http://jdebp.uk./Softwares/redo/

Can you try cksum instead?
3.3s here. Is it faster on your system?
No, I was hoping a non-cryptographic checksum would be faster. Perhaps cksum isn't the one.

Edit: "sum -s" is faster for me, but the man page doesn't give much info on what the algorithm is.
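For anyone who wants to repeat the comparison on their own machine, here is a rough sketch that feeds the same stream of zeros to each tool and times the whole pipeline. It assumes bash (for the `time` keyword and `TIMEFORMAT`) and GNU coreutils; `SIZE_MIB` and `bench` are names made up for this example, and the default size is kept small so it finishes quickly.

```shell
#!/usr/bin/env bash
# Time several checksum tools on the same stream of zero bytes.
# SIZE_MIB is a hypothetical knob; set SIZE_MIB=1024 to match the 1 GiB run above.
SIZE_MIB=${SIZE_MIB:-64}

# Format used by bash's `time` keyword: wall-clock, user CPU, system CPU.
TIMEFORMAT='real %R  user %U  sys %S'

bench() {
    # "$@" is the checksum command, e.g. `bench md5sum` or `bench sum -s`.
    echo "== $*"
    # `time` covers the whole pipeline, so dd's own cost is included.
    time dd if=/dev/zero bs=1048576 count="$SIZE_MIB" 2>/dev/null | "$@"
}

bench md5sum
bench cksum
bench sum -s
```

Since dd is inside the timed pipeline, the numbers are only comparable between tools, not a pure measure of the hash itself.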

Well, it returns 00000 for /dev/zero, so it looks like a rather useless 16-bit check.
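A quick way to see how weak it is, assuming GNU coreutils, where `sum -s` selects the System V algorithm (essentially the input bytes added up and folded into 16 bits): inputs with the same bytes in a different order should come out identical, and a run of zeros of any length should sum to 0. This is only a sketch; other `sum` implementations may behave differently.

```shell
#!/usr/bin/env bash
# Same bytes, different order: an additive checksum cannot tell these apart.
a=$(printf 'ab' | sum -s)
b=$(printf 'ba' | sum -s)
echo "ab    -> $a"
echo "ba    -> $b"

# A stream of zero bytes; the first field is the checksum,
# the second is the size in 512-byte blocks.
z=$(head -c 4096 /dev/zero | sum -s)
echo "zeros -> $z"
```

That makes it fine as a speed baseline, but it would miss reordered or zero-filled data when deciding whether a file has changed.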