| As a person who runs a lot of ETL-like commands at work, I never find myself using pv(1). I love the idea of it, but the commands I most want to measure progress of always seem to be one of:

1. things where I'd be paranoid about pv(1) itself becoming the bottleneck in the pipeline, e.g. dd(1) of large disks where I've explicitly set a large blocksize and set iflag=direct/oflag=direct to optimize throughput.
2. things where the program has some useful cleverness I rely on that requires being fed by a named file argument, but behaves a lot less intelligently when being fed from stdin, e.g. feeding SQL files into psql(1).
3. things where the program, even while writing to stdout, also produces useful "sampled progress" informational messages on stderr, which I'd like to see; pv(1) and this output logging would fight each other if both were running.
4. things where there's no clean place to insert pv(1) anyway. Mostly, this comes up for any command that manages jobs itself in order to do things in parallel, e.g. any object-storage-client mass-copy, or any parallel-rsync script. (You'd think these programs would also report global progress, but they usually don't!)

I could see pv(1) being fixed to address case 3 (by e.g. drawing progress while streaming stderr-logged output below it, using a TUI); but the other cases seem to be fundamental limitations.

Personally, when I want to observe progress on some sort of operation that's creating files (rsync, tar/untar, etc), here's what I do instead: I run the command-line, and then, in a separate terminal connected to the machine the files are being written/unpacked onto, I run this:

# for files
watch -n 2 -- ls -lh $filepath
# for directories
watch -n 4 -- du -h -d 0 $dirpath
If I'm in a tmux(1) session, I usually run the file-copying command in one pane, and then create a little three-line-tall pane below it to run the observation command.

Doing things this way doesn't give you a percentage progress, but I find that with most operations I already know what the target's final size is going to be, so all I really need to know is the size-so-far. (And pv(1) can't tell you the target size in many cases anyway.) |
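If you do know the expected final size up front, the same polling idea can be stretched to print a rough percentage. The following is only a sketch under that assumption: the function names (`size_kib`, `percent_done`, `watch_progress`) are made up for illustration, and `du -sk` reports disk usage rather than apparent size, so the numbers can be off for sparse or compressed files.

```shell
#!/bin/sh
# Hypothetical helpers: poll a path's size and compare it to a known target.

# size_kib PATH: disk usage of a file or directory tree, in KiB (0 if missing).
size_kib() {
    kib=$(du -sk -- "$1" 2>/dev/null | awk '{print $1}')
    echo "${kib:-0}"
}

# percent_done CURRENT TARGET: integer percentage, capped at 100.
percent_done() {
    if [ "$2" -le 0 ]; then
        echo 0
        return
    fi
    pct=$(( $1 * 100 / $2 ))
    if [ "$pct" -gt 100 ]; then
        pct=100
    fi
    echo "$pct"
}

# watch_progress PATH TARGET_KIB [INTERVAL_SECONDS]
# Prints one line per poll until the path reaches the target size.
watch_progress() {
    path=$1
    target_kib=$2
    interval=${3:-2}
    while :; do
        now=$(size_kib "$path")
        printf '%s KiB of %s KiB (%s%%)\n' \
            "$now" "$target_kib" "$(percent_done "$now" "$target_kib")"
        if [ "$now" -ge "$target_kib" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

Run from the second terminal or tmux pane, something like `watch_progress /path/to/output 1048576 4` would poll every 4 seconds against an assumed 1 GiB target.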
1) This gets it out of the pipeline. 2) The program gets to have its named arguments. 3) pv's output is on a separate terminal. 4) Your job never needs to know.
Downside: it only sees the currently open files, so it doesn't work well for batch jobs. Still, it's handy to see which file it's on, and how fast the progress is.
Also, for rsync: "--info=progress2 --no-i-r" will show you the progress for a whole job.
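For reference, a complete invocation with those flags could be wrapped like this. It is a sketch, not a recommendation: the wrapper name `copy_with_total_progress` is invented here, `-a` is just a common default, and `--info=progress2` needs a reasonably recent rsync (3.1 or later, as far as I know). `--no-i-r` is short for `--no-inc-recursive`, which makes rsync build the whole file list up front so the overall percentage has a meaningful total.

```shell
#!/bin/sh
# Hypothetical wrapper: copy SRC to DEST with one whole-job progress line.
# Usage: copy_with_total_progress SRC DEST
copy_with_total_progress() {
    # --info=progress2  overall transfer progress instead of per-file progress
    # --no-i-r          scan the full file list first, so the total is known
    rsync -a --info=progress2 --no-i-r -- "$1" "$2"
}
```

Usage would be along the lines of `copy_with_total_progress /data/source/ /data/dest/`, with your own paths substituted.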