by markusbk 2945 days ago
If your input is already sorted (as this article assumes), you can use "sort -m", which merges the files instead of sorting them again and is a lot faster. Also, to print only lines with duplicates, use "uniq -d" instead of "uniq -c | grep 2\ ".
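
Since "sort -m" trusts that each input is already in order, it may be worth checking first. A rough sketch, assuming throwaway sample files (the file names and contents here are made up for illustration) and assuming you want byte-order collation via LC_ALL=C so the check and the later merge agree:

```shell
# Hypothetical sample files in a scratch directory; names are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a_list"
printf '%s\n' banana apple > "$tmpdir/unsorted_list"

# "sort -c" only checks order: it exits 0 if the file is sorted, non-zero if not.
if LC_ALL=C sort -c "$tmpdir/a_list" 2>/dev/null; then
    a_status=sorted
else
    a_status=unsorted
fi

if LC_ALL=C sort -c "$tmpdir/unsorted_list" 2>/dev/null; then
    u_status=sorted
else
    u_status=unsorted
fi

echo "a_list: $a_status"
echo "unsorted_list: $u_status"

rm -rf "$tmpdir"
```

If the check fails, fall back to a plain "sort" for that file before merging.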

Union: Instead of

    cat a_list b_list | sort | uniq
do

    sort -m a_list b_list | uniq
Intersection: Instead of

    cat a_list b_list | sort | uniq -c | grep 2\ 
do

    sort -m a_list b_list | uniq -d
Relative complement: Instead of

    cat a_list b_list b_list | sort | uniq -c | grep 2\ 
do

    sort -m a_list a_list b_list | uniq -u
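Note the change of approach here: instead of making lines from b_list appear twice and grepping for that count, make lines from a_list appear twice and have uniq only print lines that aren't repeated. Either way, the output is the lines of b_list that are not in a_list, and both versions assume neither file contains duplicate lines of its own.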
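
To make the three pipelines concrete, here is a small worked run. It is only a sketch: the file contents are invented for the example, both files are assumed to be sorted and free of internal duplicates, and LC_ALL=C is set as an assumption so that the merge uses plain byte order.

```shell
# Use byte-order collation so the pre-sorted inputs and the merge agree.
export LC_ALL=C

# Hypothetical sample data; each file is sorted and has no duplicate lines.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a_list"
printf '%s\n' banana cherry date > "$tmpdir/b_list"

# Union: merge, then collapse repeated lines.
union=$(sort -m "$tmpdir/a_list" "$tmpdir/b_list" | uniq)

# Intersection: keep only lines that occur more than once after merging.
intersection=$(sort -m "$tmpdir/a_list" "$tmpdir/b_list" | uniq -d)

# Relative complement (b_list minus a_list): a_list is fed in twice, so
# every line of a_list is repeated and "uniq -u" drops it.
b_minus_a=$(sort -m "$tmpdir/a_list" "$tmpdir/a_list" "$tmpdir/b_list" | uniq -u)

printf 'union:\n%s\n' "$union"                 # apple, banana, cherry, date
printf 'intersection:\n%s\n' "$intersection"   # banana, cherry
printf 'b_list minus a_list:\n%s\n' "$b_minus_a"   # date

rm -rf "$tmpdir"
```

Swapping the roles of the files (b_list twice, a_list once) would give a_list minus b_list instead.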