Hacker News
by scatters 1022 days ago
`&unused=.htm`. It usually works.
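Why the trick works can be sketched with a simplified model of how wget names a single download. This assumes the default name is the last path segment plus "?" and the query string, or "index.html" for an empty segment. It is an approximation, not wget's code: real wget also escapes characters according to --restrict-file-names. The extension check uses Python's os.path.splitext as a stand-in for any tool that only looks at the text after the last dot.

```python
import os.path
from urllib.parse import urlsplit


def local_name(url: str) -> str:
    """Approximate wget's default local file name for one URL (simplified model)."""
    parts = urlsplit(url)
    # Last path segment; fall back to index.html for "/" or an empty path
    name = parts.path.rsplit("/", 1)[-1] or "index.html"
    if parts.query:
        name += "?" + parts.query
    return name


for url in [
    "http://example.com/page.php?id=1",
    "http://example.com/page.php?id=1&unused=.htm",
]:
    name = local_name(url)
    # splitext only looks at the text after the last dot in the name
    print(name, "->", os.path.splitext(name)[1])
```

Without the dummy parameter the "extension" is `.php?id=1`; with it, the name ends in `.htm`.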

At this point you might as well use the -o option (-o file.htm). It's easier, and easier to understand.

I'd prefer wget to be a bit more clever when handling URL query strings though, but I guess changing this behavior now might break some scripts.

The -O option, not the -o option. The capital O sets the output file, while the small o in your comment sets the log filename.
Yep, thanks for the correction. I meant big -O, I don't know how I ended up writing small -o.
Well, it depends on the use case. Sometimes you want the whole URL, like when I want to mirror a site that has pages like foo.html?page=1, foo.html?page=2, ...

wget does have options to use the name proposed by the server, and so another option to remove the query arguments would be useful, and in line with those.

A new option to strip query parameters from the output filename would be interesting. But it's not so simple. When combined with recursion, one will often see a lot of pages with the same name but different query parameters. How should they be stored on disk? There are a couple of different issues I can think of.
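The collision problem can be made concrete with a small sketch. It assumes a hypothetical strip-query option and approximates the recursive layout as host directory plus URL path (index.html for paths ending in "/"); it is not wget's own logic. It groups crawled URLs by the name they would get with the query removed and reports names claimed by more than one URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def stripped_path(url: str) -> str:
    """Local path with the query dropped (hypothetical strip-query option)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    # Approximates a recursive layout: host directory, then the URL path
    return parts.netloc + path


def find_collisions(urls):
    """Return {local path: [urls]} for paths that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[stripped_path(url)].append(url)
    return {path: us for path, us in groups.items() if len(us) > 1}


crawl = [
    "http://example.com/foo.html?page=1",
    "http://example.com/foo.html?page=2",
    "http://example.com/about.html",
]
print(find_collisions(crawl))
```

Both pagination URLs land on example.com/foo.html, so something has to decide which one wins or how to rename the other.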

However, if the potential issues can be resolved with sane defaults, I think this would be a great new switch to add.

Yes, exactly. I think the option would have to be ignored when doing recursion, or alternatively it could use the .1, .2, ... method, as in all cases where a file of that name already exists.
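The numbered-suffix idea could look roughly like this. It is a sketch under assumptions: the strip-query option is hypothetical, directories are ignored, and the suffix is appended to the whole name in the style of the file.1, file.2 names wget gives repeat downloads in some modes.

```python
from urllib.parse import urlsplit


def stripped_name(url: str) -> str:
    """Last path segment with the query dropped (hypothetical option)."""
    return urlsplit(url).path.rsplit("/", 1)[-1] or "index.html"


def assign_names(urls):
    """Map each URL to a unique local name, numbering repeats .1, .2, ..."""
    taken = set()
    result = {}
    for url in urls:
        base = stripped_name(url)
        name, n = base, 0
        # Keep counting until the candidate name is free
        while name in taken:
            n += 1
            name = f"{base}.{n}"
        taken.add(name)
        result[url] = name
    return result


pages = [f"http://example.com/foo.html?page={i}" for i in (1, 2, 3)]
for url, name in assign_names(pages).items():
    print(url, "->", name)
```

Two trade-offs show up immediately: the numbers depend on the order the URLs are fetched, so page=2 is not guaranteed to be foo.html.1 on the next run, and a name like foo.html.1 no longer ends in .html.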
So... you're adding more noise to the filename?

What?

It's a simple solution to give the file the right extension, and preserving query parameters can be the right thing to do if you hit the same path repeatedly e.g. for pagination.
> It's a simple solution to give the file the right extension,

Oh, I see now.

Do you work with many tools that can't work with files if they don't have the "right" extension? I thought that was mostly a Windows problem.