by markusbk 2945 days ago

If your input is already sorted (as this article assumes), you can use "sort -m", which is a lot faster because it only merges the already-sorted files instead of sorting everything again. Also, to print only lines with duplicates, use "uniq -d" instead of "uniq -c | grep '2 '".
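One caveat with the shortcut: "sort -m" trusts that every input is already sorted. If one isn't, the merged output isn't sorted either, and uniq (which only compares adjacent lines) can miss duplicates. The sketch below guards against that with "sort -C", which exits non-zero without printing anything when a file is out of order. The helper name merge_or_sort is hypothetical, the example files are made up, and LC_ALL=C is an assumption so that checking, sorting and merging all use the same collation.

```shell
#!/bin/sh
# Sketch: only take the sort -m shortcut when every input is already sorted.
# merge_or_sort is a hypothetical helper name, not a standard command.
export LC_ALL=C               # same collation for checking, sorting and merging
cd "$(mktemp -d)" || exit 1   # scratch directory, so real a_list/b_list are untouched

merge_or_sort() {
    for f in "$@"; do
        # sort -C exits non-zero, silently, if "$f" is not sorted
        if ! sort -C "$f"; then
            sort "$@"         # fall back to a full sort of all inputs
            return
        fi
    done
    sort -m "$@"              # every input is sorted: merging is enough
}

printf '%s\n' apple banana cherry > a_list   # sorted
printf '%s\n' date banana         > c_list   # deliberately unsorted

merge_or_sort a_list a_list   # takes the sort -m path
merge_or_sort a_list c_list   # falls back to a plain sort
```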

Union: Instead of

    cat a_list b_list | sort | uniq

use

    sort -m a_list b_list | uniq
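A quick way to check that both union pipelines agree is to run them on small example lists. The file contents below are made up, each list is assumed to be sorted and free of duplicate lines (a set), and LC_ALL=C is an assumption so that the pre-sorted files and "sort -m" use the same collation.

```shell
#!/bin/sh
# Sketch: compare the two union pipelines on small, made-up example files.
export LC_ALL=C               # fixed byte-order collation for sorting and merging
cd "$(mktemp -d)" || exit 1   # scratch directory for the example files

# Each list is already sorted and has no duplicate lines.
printf '%s\n' apple banana cherry > a_list
printf '%s\n' banana cherry date  > b_list

echo "cat | sort | uniq:"
cat a_list b_list | sort | uniq

echo "sort -m | uniq:"
sort -m a_list b_list | uniq
```

Both pipelines should print apple, banana, cherry and date; the second one just skips the full sort.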

Intersection: Instead of

    cat a_list b_list | sort | uniq -c | grep '2 '

use

    sort -m a_list b_list | uniq -d
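To see why "uniq -d" gives the intersection, it helps to look at the per-line counts after merging. With duplicate-free inputs, a count of 2 means the line is in both lists, and "uniq -d" prints one copy of each line that occurs more than once. The example files are made up and LC_ALL=C is an assumption, as before.

```shell
#!/bin/sh
# Sketch: intersection of two sorted, duplicate-free example lists.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' apple banana cherry > a_list
printf '%s\n' banana cherry date  > b_list

# Counts per line after merging: 1 = in one list only, 2 = in both.
sort -m a_list b_list | uniq -c

# uniq -d prints one copy of every repeated line: the common lines here.
sort -m a_list b_list | uniq -d
```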

Relative complement (lines in b_list that are not in a_list): Instead of

    cat a_list b_list b_list | sort | uniq -c | grep '2 '

use

    sort -m a_list a_list b_list | uniq -u
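The old and new relative-complement pipelines can be compared directly on made-up example files. The old one prints a count column, so the sketch strips it with awk; that step assumes the lines contain no spaces, which holds for this example data.

```shell
#!/bin/sh
# Sketch: check that the old and new relative-complement pipelines agree.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' apple banana cherry > a_list
printf '%s\n' banana cherry date  > b_list

# Old: b_list counted twice, keep count-2 lines, then drop the count column
# (the awk step assumes lines without spaces).
cat a_list b_list b_list | sort | uniq -c | grep '2 ' | awk '{ print $2 }'

# New: a_list counted twice, keep lines that occur exactly once.
sort -m a_list a_list b_list | uniq -u
```

Both should print only "date", the one line in b_list that is missing from a_list.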

Note the change of approach here: instead of listing b_list twice and grepping for a count of 2, list a_list twice and have "uniq -u" print only the lines that aren't repeated, i.e. the lines that appear in b_list but not in a_list.
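The counting argument behind that trick can be checked by printing the counts and then swapping which list is doubled. With a_list passed twice, a line only in a_list occurs 2 times, a line in both occurs 3 times, and a line only in b_list occurs once; doubling b_list instead should leave the lines that are in a_list but not in b_list. The files are made-up examples and assume duplicate-free, sorted lists.

```shell
#!/bin/sh
# Sketch: per-line counts behind the uniq -u trick, in both directions.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1

printf '%s\n' apple banana cherry > a_list
printf '%s\n' banana cherry date  > b_list

# With a_list passed twice: 2 = only in a_list, 3 = in both, 1 = only in b_list
sort -m a_list a_list b_list | uniq -c

# uniq -u keeps the count-1 lines: b_list minus a_list
sort -m a_list a_list b_list | uniq -u

# Doubling b_list instead flips the result: a_list minus b_list
sort -m a_list b_list b_list | uniq -u
```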