by yefim323 4821 days ago
I was thinking of also checking for the same file size. Would that improve the accuracy of detecting duplicates?
1 comment

It would, but you'd have to download the whole file first to check, which is probably counterproductive. Consider storing some HTTP header values, such as the ETag header if one is provided, and checking that value before saving a second copy. Also consider storing the Cache-Control and Expires headers. The Content-Length header would be a way to detect the same file size before transferring the whole file, except for large downloads that use chunked encoding, where the server doesn't send a Content-Length at all.
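Something like this is what I have in mind. It's only a rough sketch in Python: the function names `fetch_headers` and `is_probable_duplicate` are made up for illustration, and I'm assuming you keep the headers from the earlier download in a dict with lowercased names. It also assumes the server answers HEAD requests, which not all of them do.

```python
import urllib.request
from typing import Mapping, Optional


def fetch_headers(url: str) -> dict:
    """Send a HEAD request and return the response headers (lowercased names).

    No body is transferred, so this stays cheap even for large files.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def is_probable_duplicate(
    stored: Mapping[str, str], current: Mapping[str, str]
) -> Optional[bool]:
    """Compare headers saved from an earlier download with fresh ones.

    Returns True or False when the headers allow a guess, and None when
    neither ETag nor Content-Length is present on both sides (for example
    when the response uses chunked encoding and has no ETag).
    """
    old_etag, new_etag = stored.get("etag"), current.get("etag")
    if old_etag and new_etag:
        # ETag is the stronger signal: it changes when the content changes.
        return old_etag == new_etag

    old_len, new_len = stored.get("content-length"), current.get("content-length")
    if old_len and new_len:
        # Weak signal: a different size rules a duplicate out,
        # but an equal size does not prove the content is the same.
        return old_len == new_len

    return None
```

You'd call `fetch_headers(url)` before starting the download and pass the result in as `current`. When it returns None you have nothing to go on and would need to fall back to something else, like downloading and hashing.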

Some resources:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

http://en.wikipedia.org/wiki/Chunked_transfer_encoding

http://en.wikipedia.org/wiki/HTTP_ETag

I'm glad you're addressing this problem! I always end up with file(1).zip in my downloads folder.