by jraph 1022 days ago
At this point you might as well use the -o option (-o file.htm). It's simpler and easier to understand.

I'd prefer wget to be a bit more clever when handling URL query strings, but I guess changing this behavior now might break some scripts.


The -O option, not the -o option. The capital O sets the output file, while the small o in your comment sets the log filename.
Yep, thanks for the correction. I meant capital -O; I don't know how I ended up writing lowercase -o.
Well, it depends on the use case. Sometimes you want the whole URL, for example when mirroring a site that has pages like foo.html?page=1, foo.html?page=2, ...

wget does have options to use the name proposed by the server, and so another option to remove the query arguments would be useful, and in line with those.
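To make the proposal concrete, here is a minimal Python sketch of how a local filename might be derived from a URL, with and without the query string. It only approximates the behavior discussed in this thread and is not wget's actual code; `output_name` and its `strip_query` parameter are hypothetical names for illustration, and the `index.html` fallback is an assumption about what happens when the path has no final component.

```python
from urllib.parse import urlsplit
import posixpath


def output_name(url, strip_query=False):
    """Approximate the local filename for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Last path component; assume "index.html" when the path ends in "/".
    name = posixpath.basename(parts.path) or "index.html"
    # Keep the query string in the name unless asked to strip it.
    if parts.query and not strip_query:
        name = f"{name}?{parts.query}"
    return name


print(output_name("http://example.com/foo.html?page=1"))
print(output_name("http://example.com/foo.html?page=1", strip_query=True))
```

With `strip_query=True`, foo.html?page=1 and foo.html?page=2 both map to foo.html, which is exactly the collision problem that shows up when mirroring.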

A new option to strip query parameters from the output filename would be interesting, but it's not so simple. When combined with recursion, one will often see a lot of pages with the same name but different query parameters. How should they be stored on disk? There are a couple of different issues I can think of.

However, if the potential issues can be resolved with sane defaults, I think this would be a great new switch to add.

Yes, exactly. I think the option would have to be ignored when doing recursion. Alternatively, it could use the .1, .2, ... method, as in all cases where a file of that name already exists.
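The .1, .2, ... fallback could look roughly like the Python sketch below. It is an illustration of the numbering scheme only, not wget's implementation; `unique_name` is a hypothetical helper, and the `taken` set stands in for the files already present on disk.

```python
def unique_name(name, taken):
    """Return name, or name.1, name.2, ... if it is already taken.

    `taken` is a set of names already in use; it is updated in place.
    """
    candidate = name
    n = 0
    # Append an increasing numeric suffix until the name is free.
    while candidate in taken:
        n += 1
        candidate = f"{name}.{n}"
    taken.add(candidate)
    return candidate


taken = set()
for _ in range(3):
    print(unique_name("foo.html", taken))
```

The downside for mirroring is that the suffix no longer says which query produced which file, so links between the saved pages could not be mapped back reliably.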