by llimllib 2475 days ago
Doing an ‘aws s3 sync ...’ on a directory with large files causes 100% CPU usage
1 comments

How would you compare hashes without calculating them? Any operating system more advanced than Windows 95 shouldn’t “lock up” with a CPU-bound task.
An extremely naive program can sha1 hash 1 million 100 byte strings on my computer in less than half a second: https://gist.github.com/llimllib/72f60aa33b32e422962d876ddf0...

This is literally the first program I came up with, no attempt to optimize it at all.
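For anyone who wants to reproduce the ballpark figure, a naive benchmark along these lines might look like the sketch below. It is not the gist linked above; the function name `hash_strings` and the way the inputs are generated are assumptions made for illustration, and the numbers will vary by machine and interpreter.

```python
import hashlib
import time


def hash_strings(count=1_000_000, size=100):
    """SHA-1 hash `count` distinct byte strings of `size` bytes each.

    Returns (total_bytes_hashed, elapsed_seconds). Hypothetical helper,
    written for illustration only.
    """
    total = 0
    start = time.perf_counter()
    for i in range(count):
        # Build a distinct input: the counter, left-padded to `size` bytes.
        data = str(i).encode().rjust(size, b"x")
        hashlib.sha1(data).digest()
        total += len(data)
    elapsed = time.perf_counter() - start
    return total, elapsed


if __name__ == "__main__":
    total, elapsed = hash_strings()
    print(f"hashed {total / 1e6:.0f} MB in {elapsed:.2f} s "
          f"({total / 1e6 / elapsed:.0f} MB/s)")
```

Note that the timing includes building each input string, so it understates raw SHA-1 throughput somewhat.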

There is zero chance that the AWS sync command is filling my CPU just by hashing bytes.

edit: I'm going to try not to let you nerd snipe me into doing the profiling that the AWS CLI maintainers should be doing themselves, because that's not what I want to do.

so at least 200 megabytes/second (1 million x 100 bytes = 100 MB in under half a second)? I'm not sure what your definition of large files is, but hashing anything sizable with SHA1 is trivially CPU-bound with any modern SSD, in the absence of a processor with the SHA instruction extensions.

that being said, quick glance at the source suggests that awscli's s3 sync only compares files by size & timestamp, not etag, so it's not hashing anything client-side.
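A size-and-timestamp check of that kind can be sketched as follows. This is not awscli's code; `needs_upload`, `remote_size`, and `remote_mtime` are hypothetical names, and the exact rule used here (re-upload when the sizes differ or the local file is newer) is an assumption. The point is only that such a check needs a single `os.stat` call and never reads or hashes file contents.

```python
import os


def needs_upload(local_path, remote_size, remote_mtime):
    """Decide whether a local file should be re-uploaded.

    Hypothetical helper: `remote_size` (bytes) and `remote_mtime`
    (seconds since the epoch) stand in for metadata a remote listing
    would provide. No file contents are read.
    """
    st = os.stat(local_path)
    if st.st_size != remote_size:
        return True  # sizes differ, so the contents must differ
    # Same size: treat the file as changed only if the local copy is newer.
    return st.st_mtime > remote_mtime
```

The cost of this check does not depend on file size, so by itself it would not explain sustained CPU usage on large files.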