Usually just specifying a reasonable blocksize works for me. bs=1m or so.
Without that it does literally take hours.
I suspect the default blocksize is really small (512 bytes, if I remember right) and combined with uncached/unbuffered writes to slower devices, it just kills all performance outright.
Per the sibling comments, you just need to specify a sane block size. dd's default is really low; if you experiment a bit around bs=2M you'll get near-theoretical throughput.
NB: Remember the unit suffix! A bare number is taken as bytes, so bs=2 means 2-byte blocks, not 2 MiB. I've made that mistake more than once!
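To make the block size point concrete, here is a rough sketch that writes the same 4 MiB twice, once in 512-byte blocks and once in 1 MiB blocks. The scratch files from mktemp and the /dev/zero source are my own stand-ins so nothing touches a real device; the variable names are made up for illustration, and the suffix spelling assumes GNU dd.

```shell
# Sketch only: scratch files instead of a real device, /dev/zero as input.
out_small=$(mktemp)
out_large=$(mktemp)

# 512-byte blocks: 8192 separate read/write pairs for 4 MiB.
dd if=/dev/zero of="$out_small" bs=512 count=8192 2>/dev/null

# 1 MiB blocks: the same 4 MiB in only 4 read/write pairs.
# GNU dd spells the suffix 1M; BSD/macOS dd also accepts 1m.
dd if=/dev/zero of="$out_large" bs=1M count=4 2>/dev/null

# Both files end up the same size; only the number of I/O calls differs.
size_small=$(wc -c < "$out_small" | tr -d ' ')
size_large=$(wc -c < "$out_large" | tr -d ' ')
echo "$size_small $size_large"

rm -f "$out_small" "$out_large"
```

This prints 4194304 4194304. On a fast local disk the two runs will probably feel about the same; the gap the comments above describe shows up when the target is a slow device with unbuffered writes, where every small block pays the full per-write cost.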
Edit: answered! https://news.ycombinator.com/item?id=13350002