Hacker News new | ask | show | jobs
by carussell 3450 days ago
You don't have to wait for updates to newer versions to get dd to report its progress. The status line that dd prints when it finishes can also be forced at any point during dd's operation by sending the USR1 or INFO signals to the process. E.g.:

    ps a | grep "\<dd"
    # [...]
    kill -USR1 $YOUR_DD_PID
or

    pkill -USR1 ^dd
It also doesn't require you to get everything nailed down at the beginning. You've just spent the last 20 seconds waiting and realize you want a status update, but you didn't think to specify the option ahead of time? No problem.

I've thought that dd's behavior could serve as a model for a new standard of interaction. Persistent progress indicators are known to cause performance degradation unless implemented carefully. And reality is, you generally don't need something to constantly report its progress even while you're not looking, anyway.

To figure out the ideal interaction, try modeling it after the conversation you'd have if you were talking to a person instead of your shell:

"Hey, how much longer is it going to take to flash that image?"

The way dd works is close to this scenario.

4 comments

Yes this is true. Note BSD supports this better with Ctrl-T to generate SIGINFO, which one can send to any command even if not supported, in which case it's ignored. Using kill on linux, and having that kill processes by default is decidedly more awkward.

It's also worth noting the separate "progress" project which can be used to give the progress of running file based utilities.

We generally have pushed back on adding progress to each of the coreutils for these reasons, but the low overhead of implementation and high overlap with existing options was deemed enough to warrant adding this to dd

Also a gotcha on BSD (FBSD at least): SIGUSR1 kills dd.
Ctrl-T for SIGINFO is pretty useful, it would be good if Linux could pick this up.
Been waiting for years, frankly. I guess if I was motivated enough and had the time, I could do the research and submit patches..
They don't want it. Even if you got the patches accepted to the kernel they'd never accept patches to GNU Coreutils to support it.
The most recent comment I've seen on this is lukewarm at best: http://lkml.iu.edu/hypermail/linux/kernel/1411.0/03374.html
> in which case it's ignored

Ignored by the application, sure. But FreeBSD always prints useful stuff like load, current command, its pid and state:

$ dd if=/dev/random of=/dev/null

load: 0.72 cmd: dd 5820 [running] 0.70r 0.02u 0.68s 6% 2008k

263276+0 records in

263276+0 records out

134797312 bytes transferred in 0.708372 secs (190291809 bytes/sec)

(here, the "load:..." line is from the system, and the other 3 lines are from dd)

Yes! I'm thinking of building something like this for my neural net training (1-2 days on AWS, 16 GPUs/processes on the job). In this case the "state" that I'd like to access is all the parameters of the model and training history, so I'm thinking I'll probably store an mmapped file so I can use other processes to poke at it while it's running. That way I can decouple the write-test-debug loops for the training code and the viz code.
> I'm thinking I'll probably store an mmapped file so I can use other processes to poke at it while it's running.

That seems to run substantial risk of seeing it in an inconsistent state, yeah?

I generally use a semaphore when I'm reading and writing from my shm'd things. The data structure will also likely be append-only for the training process, as I want to see how things are changing over time.

Also I meant shm'd, not mmap'd.

I am knew to the shared memory concept. I am familiar with named pipes. Could you please elaborate a bit, I'm curios.

Are you passing the reference to an mmap adress, or using the shm systemcalls? In what language are you programming in? Does race conditions endanger the shared memory? If so, how does using semaphores help?

Sorry, if I asked a lot of questions, feel free to answer any/none of them :)

Sure! SHM is really cool, I just found out about it. It's an old Posix functionality, so people should use it more!

I'm using shm system calls in Python. Basically I get a buffer of raw bytes of a fixed size that is referred to by a key. When I have multiple processes running I just have to pass that key between them and they get access to that buffer of bytes to read and write.

On each iteration first I wait until the semaphore is free and then I lock it (P). That prevents anyone else from accessing the shared memory. I have the process read from the shared memory a set of variables - I have little helper functions that serialize and deserialize numpy arrays into raw bytes using fixed shapes and dtypes. Those arrays are then updated using some function combining the output of the process and the current value of the array. Then those arrays are reserialized and written back to the shm buffer as raw bytes again. Finally, the process releases the semaphore using P() so other processes can access it. The purpose of the semaphore is to prevent reading the arrays while another process is writing them - otherwise you might get interleaved old and new data from a given update. In a process-wise sense there is a race-condition, as each process can update at different times or in a different order, but for my purposes this is acceptable since neural net training is a stochastic sort of thing and it shouldn't care too much.

[0] http://nikitathespider.com/python/shm/ - original library which works fine for me

[1] http://semanchuk.com/philip/PythonIpc/ - updated version

>I've thought that dd's behavior could serve as a model for a new standard of interaction. Persistent progress indicators are known to cause performance degradation unless implemented carefully. And reality is, you generally don't need something to constantly report its progress even while you're not looking, anyway.

Progress bars by default are also garbage if you are scripting and want to just log results. ffmpeg is terrible for this.

> Persistent progress indicators are known to cause performance degradation unless implemented carefully.

Are you referring to that npm progress bar thing a few months back? I'm pretty sure the reason for that can be summed up as "javascript, and web developers".

Anyway, he's not proposing progress bars by default, he's proposing a method by which you can query a process to see how far it's come. I think there's even a key combination to do this on FreeBSD.

Or, for example, you could write a small program that sends a USR1 signal every 5 seconds, splitting out the responsibility of managing a progress bar:

% progress cp bigfile /tmp/

And then the 'progress' program would draw you a text progress bar, or even pop up an X window with a progress bar.

That's great! I think due to the way it's implemented it wouldn't be able to do progress reporting for e.g. "dd if=/dev/zero of=bigfile bs=1M count=2048", but that's a less common case than just cp'ing a big "regular" file.
Yes, C-t for SIGINFO, works on all BSDs (including macOS).
I've always used pv to get progress from dd, or other pipes:

  pv image.img | dd of=/dev/rdisk2 bs=1M
This adds another pipe though. I don't know the effect this has on performance.