by xearl 3198 days ago
nice collection! upvoted for "15. `namei -l`" alone, which is far too little known.

a better "20. Randomize lines in file":

  shuf file.txt
instead of

  cat file.txt | sort -R
(sort -R sorts by hash, which is not really randomisation.)

> (sort -R sorts by hash, which is not really randomisation.)

I looked at the source code for GNU sort and what they're doing is reading 16 bytes from the system CSPRNG and then initializing an MD5 digest object with those 16 bytes of input. Then the input lines are sorted according to the hash of each line with the 16 bytes prepended.
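A rough shell sketch of that idea, for illustration only. It is not GNU sort's actual code: the salt is passed to `md5sum` as hex text rather than raw bytes, and the variable names `salt` and `shuffled` are made up here.

```shell
# Approximation of "sort by salted hash"; NOT the real GNU sort implementation.
# Assumption: 16 random bytes, hex-encoded, stand in for the per-run salt.
salt=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')

shuffled=$(
  seq 6 | while IFS= read -r line; do
    # Hash salt+line, then emit "hash<TAB>line" so we can sort on the hash.
    hash=$(printf '%s%s' "$salt" "$line" | md5sum | cut -d' ' -f1)
    printf '%s\t%s\n' "$hash" "$line"
  done | sort | cut -f2-
)

printf '%s\n' "$shuffled"
```

A fresh salt on each run gives a different ordering, while two identical lines always get the same hash within one run.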

Although they should no longer use MD5 for this, I don't think we know anything about the structure of MD5 that would even allow an adversary to have any advantage above chance in distinguishing between output created this way and an output created via a different randomization method. (Edit: or distinguishing between the distribution of output created this way and the distribution of output created via another method!)

The output of sort -R is different on each run and ordinarily covers the whole range of possible permutations.

  $ for i in $(seq 10000); do seq 6 | sort -R | sha256sum; done | sort -u | wc -l
  720
Sorting by hash will always cluster lines with equal content together, whereas true randomisation won't.

Eg `(seq 3; seq 3; seq 3) | sort -R`.
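A small check of the clustering, assuming GNU coreutils (`clustered` is just a name picked for this sketch). If equal lines always end up adjacent, `uniq` collapses the nine lines down to three:

```shell
# With GNU sort, equal lines hash identically, so -R keeps them adjacent.
# uniq only merges adjacent duplicates, so this counts the clusters.
clustered=$( (seq 3; seq 3; seq 3) | sort -R | uniq | wc -l )
echo "runs of equal lines after sort -R: $clustered"

# shuf permutes lines without looking at their content,
# so the duplicates are usually scattered instead.
(seq 3; seq 3; seq 3) | shuf
```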

> namei -l /path/to/file

This is fantastic, a game changer for me. I often give people a command like ls -lad /path /path/to /path/to/file

Thanks!

And prefer redirection to cat.

    whatever < file.txt 
not

    cat file.txt | whatever

Also, there is no need for - or z on GNU tar in t or x modes:

    tar tf whatever.tgz
    tar tf whatever.tar.bz2
    tar tf whatever.txz
    ...

    tar xf whatever.tar.gz
    tar xf whatever.tar.Z
    ...
all work just fine
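A quick way to try this out, sketched with a throwaway archive (the names `work`, `demo.tgz` and `a.txt` are made up for the example). The `z` is still given when creating; only the listing relies on auto-detection:

```shell
# Build a small gzip-compressed archive in a temporary directory.
work=$(mktemp -d)
printf 'hi\n' > "$work/a.txt"
tar -C "$work" -czf "$work/demo.tgz" a.txt

# List it without -z; tar detects the compression on its own when reading.
listing=$(tar tf "$work/demo.tgz")
echo "$listing"

rm -rf "$work"
```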
I know whatever < file.txt is slightly more efficient, but there is value in keeping things going left to right with pipes in between. It makes it easy to insert a grep or sort, or swap out a part.

    <file.txt grep foo | sort
Ok, if that actually works, that's amazing. Wow.
It works because of the general principle that I mentioned here:

https://news.ycombinator.com/item?id=15249370

The shell (not the command) is the one expanding those metacharacters, so (within limits), in:

cmd < file

or

< file cmd

where you put that piece (< file) on the line does not matter, because the actual command cmd never sees the redirection at all - it is done [1] by the time the command starts. The cmd command will only see the command line arguments (other than redirections) and options (arguments that start with a dash). Even expansion of $ and other sigils (unless quoted) happens before the command starts.

[1] I.e. the shell and kernel set up the standard input of the command to be connected to the file opened for reading (in the case of < file).
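A minimal sketch to see this for yourself. `show_args` is a hypothetical helper that reports how many arguments it received and echoes the first line of its standard input; the redirection can sit before, after, or in the middle of the arguments and the helper sees the same thing each time:

```shell
# Hypothetical helper: prints its argument count, then the first line of stdin.
show_args() {
  printf 'argc=%d\n' "$#"
  head -n 1
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"

# Same command, redirection in three different positions.
a=$(show_args one two < "$tmp")
b=$(< "$tmp" show_args one two)
c=$(show_args one < "$tmp" two)

printf '%s\n' "$a"
rm -f "$tmp"
```

In all three cases the helper gets two arguments and never sees `<` or the file name.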

Oh cool. Didn't know this was possible.
I prefer cat because > vs < is an easy typo to make but one clobbers the file. Happy to pay the performance penalty as insurance against that.
You can use the bash setting called noclobber to prevent such accidental deletion.

https://en.wikipedia.org/wiki/Clobbering
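A short sketch of how that behaves, using a temporary file (the variable names are just for this example). With noclobber set, a plain `>` onto an existing file fails, and `>|` is the explicit override:

```shell
tmp=$(mktemp)
printf 'important\n' > "$tmp"   # written before noclobber is turned on

set -o noclobber                # same as: set -C

# The redirection runs in a subshell so a failure cannot abort this script.
if ( echo oops > "$tmp" ) 2>/dev/null; then
  result="overwritten"
else
  result="refused"
fi
after_plain=$(cat "$tmp")

# >| deliberately bypasses noclobber.
echo replaced >| "$tmp"
after_force=$(cat "$tmp")

echo "$result / $after_plain / $after_force"

set +o noclobber
rm -f "$tmp"
```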

Nice tip! I updated the article.
Caching problems? :P