by quasque 4825 days ago
It's a good idea, but downloading something from the same URL (or the same filename fragment of a URL) does not necessarily imply that it's a duplicate. It would be interesting to see something similar to this extension that uses attributes sent at the start of an HTTP response - e.g. entity tag or content length - for duplicate checking.

I was thinking of also checking for the same file size. Would that improve the accuracy of detecting duplicates?
It would, but you'd first have to download the whole file to check its size, which is probably counterproductive. Consider storing some HTTP header values, such as the ETag header if provided, and checking that value before saving the file a second time. Also consider storing the Cache-Control and Expires headers. The Content-Length header would let you detect the same file size before transferring the whole file, but it is not sent for large downloads that use chunked transfer encoding.
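
Something like the following is a rough sketch of that check in Python, not how the extension actually works. The function names (`fetch_validators`, `is_probable_duplicate`) and the dictionary layout are made up for illustration. It assumes the server answers a HEAD request with the same headers it would send for a GET, which not every server does.

```python
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def fetch_validators(url: str, timeout: float = 10.0) -> dict:
    """Send a HEAD request and keep the headers useful for duplicate checks.

    Hypothetical helper: either value is None when the server omits the
    header (Content-Length is absent for chunked responses).
    """
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return {
            "etag": resp.headers.get("ETag"),
            "content_length": resp.headers.get("Content-Length"),
        }


def is_probable_duplicate(
    stored: Mapping[str, Optional[str]],
    current: Mapping[str, Optional[str]],
) -> Optional[bool]:
    """Compare stored header values with the current ones.

    Returns True or False when the headers allow a decision,
    and None when they do not say enough either way.
    """
    old_etag, new_etag = stored.get("etag"), current.get("etag")
    if old_etag and new_etag:
        # Both sides have an ETag: treat it as the deciding value.
        return old_etag == new_etag

    old_len, new_len = stored.get("content_length"), current.get("content_length")
    if old_len and new_len and old_len != new_len:
        # Different sizes cannot be the same file.
        return False

    # Same size alone is only weak evidence, and missing headers tell us nothing.
    return None


# Example with values saved from an earlier download (no network needed):
saved = {"etag": '"abc123"', "content_length": "2048"}
print(is_probable_duplicate(saved, {"etag": '"abc123"', "content_length": "2048"}))
```

Returning None for the "same size, no ETag" case is a deliberate choice here: the caller can then decide whether to ask the user or just download again.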

Some resources:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

http://en.wikipedia.org/wiki/Chunked_transfer_encoding

http://en.wikipedia.org/wiki/HTTP_ETag

I'm glad you're addressing this problem! I always end up with file(1).zip in my downloads folder.