| HN Mirror


	by vlovich123 239 days ago
	Why does this test against sort \| uniq \| sort? It’s kind of weird to sort twice, no?

4 comments

gucci-on-fleek 239 days ago

The first "sort" sorts the input lines lexicographically (which is required for "uniq" to work); the second "sort" sorts the output of "uniq" numerically (so that lines are ordered from most-frequent to least-frequent):

  $ echo c a b c | tr ' ' '\n'
  c
  a
  b
  c
  
  $ echo c a b c | tr ' ' '\n' | sort
  a
  b
  c
  c
  
  $ echo c a b c | tr ' ' '\n' | sort | uniq -c
        1 a
        1 b
        2 c
  
  $ echo c a b c | tr ' ' '\n' | sort | uniq -c | sort -rn
        2 c
        1 b
        1 a

happysadpanda2 238 days ago

`uniq -c` adds a "count" at the beginning of each line, so the second sort orders the output by the frequency of the unique terms; it is not sorting the unique terms again (which would indeed be kind of nonsensical).

Aaron2222 239 days ago

  sort | uniq -c | sort -n

The second sort is sorting by frequency (the count output by `uniq -c`).

emmelaich 238 days ago

I often add `head` with `sort -rn` because I'm only interested in the largest.
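A minimal sketch of that pattern, assuming a POSIX shell. The `top_n` function name is made up for illustration and is not from the thread:

```shell
# top_n is a hypothetical helper name, used here only for illustration.
# Usage: top_n N < input
# Prints the N most frequent input lines, each prefixed with its count.
top_n() {
  sort | uniq -c | sort -rn | head -n "$1"
}

# Six input lines: "c" appears 3 times, "a" twice, "b" once.
# Only the two most frequent lines survive the final head.
printf '%s\n' c a b c a c | top_n 2
```

Because `sort -rn` puts the largest counts first, `head` just keeps the top of the list; with plain `sort -n` you would use `tail` instead.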

BuildTheRobots 239 days ago

It's something I've done myself in the past. The first sort is there because the input needs to be sorted for uniq -c to count it properly; the second sort is there because uniq doesn't always give the output in the right order.

evertedsphere 239 days ago

more precisely, uniq produces output in the same order as the input to it, just collapsing runs / run-length encoding it
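A quick way to see this is to skip the first sort entirely. This is just a sketch using made-up input, reusing the same letters as the example further up the thread:

```shell
# Unsorted input: the two "c" lines are not adjacent, so uniq treats
# them as separate runs and reports a count of 1 for each of the four lines.
printf '%s\n' c a b c | uniq -c

# Adjacent duplicates are collapsed into a single line with the run length,
# and the order of the runs follows the input: a (2), b (1), a (1).
printf '%s\n' a a b a | uniq -c
```

This is why the input has to be sorted first: sorting makes all equal lines adjacent, so each distinct line ends up as exactly one run.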
