by ggm 3283 days ago
This is awk that emits the stream of unique lines as they are seen. It doesn't require sorted input. It runs at the cost of building the obvious hash in memory, so it can drive you into swap on large inputs, but it's portable, needs none of the post-install software that typically isn't on small systems, and delivers results fast.

I use it all the time when I have some UNIX pipe emitting things and I want to "see" the uniques before I do sort | uniq -c type things.

#!/bin/sh
awk '{ if (!h[$0]) { print $0; h[$0]=1 } }'
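A quick way to see the behaviour, as a rough sketch: the function name uniq_stream and the sample input are made up for illustration, and the awk program is the one from the post.

```shell
#!/bin/sh
# uniq_stream is a hypothetical wrapper name; the awk body is the one-liner above.
uniq_stream() {
    # h[] remembers every line seen so far; a line is printed only the first time.
    awk '{ if (!h[$0]) { print $0; h[$0]=1 } }'
}

# Unsorted input with repeats: each distinct line comes out once, in first-seen order.
printf 'b\na\nb\nc\na\n' | uniq_stream
# prints:
# b
# a
# c
```

The commonly seen shorter spelling, awk '!h[$0]++', should behave the same way: the pattern is true only the first time a line is seen, and the default action prints it.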

1 comments

I do something similar with Perl, since I know the syntax a bit better. It also lets me (from memory) scrub out things like timestamps, which would otherwise make every line look unique.

So, without the scrubbing:

tail -f somefile | perl -ne '!$SEEN{$_}++ and print'

Scrubbing off leading timestamps:

tail -f somefile | perl -ne 's/^[0-9:]+\s*//;!$SEEN{$_}++ and print'