Hacker News
by chasil 1287 days ago
A minor optimization is collapsing the grep -v, from this:

  cat file | grep -v THING1 | grep -v THING2 | grep -v THING3 | grep -v THING4
to this:

  egrep -v 'THING1|THING2|THING3|THING4' file
That gets rid of the cat and three greps. Both POSIX and GNU encourage grep -E to be used in preference to egrep.
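
A quick way to convince yourself the two forms agree is to run both on the same input. This is only a sketch: the sample lines are made up, and THING1 through THING4 are the same placeholders as above.

```shell
# Made-up sample data; THING1..THING4 stand in for real patterns.
tmp=$(mktemp)
printf '%s\n' 'keep me' 'has THING1' 'THING2 here' 'also keep' 'THING3' 'x THING4 y' > "$tmp"

# Original form: one cat plus four grep processes.
piped=$(cat "$tmp" | grep -v THING1 | grep -v THING2 | grep -v THING3 | grep -v THING4)

# Collapsed form: a single grep with an extended-regex alternation.
single=$(grep -E -v 'THING1|THING2|THING3|THING4' "$tmp")

[ "$piped" = "$single" ] && echo "same output"
printf '%s\n' "$single"
rm -f "$tmp"
```

Note the equivalence only holds as long as the patterns contain no characters that mean something different in extended regex syntax than in basic syntax.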

A pcregrep utility also used to exist, if you want expansive perl-compatible regular expressions. This has been absorbed into GNU grep with the -P option.

4 comments

> A pcregrep utility also used to exist, if you want expansive perl-compatible regular expressions. This has been absorbed into GNU grep with the -P option.

'pcregrep' still exists. But with PCRE2 supplanting PCRE, it is now spelled 'pcre2grep'.

I don't know the precise history of 'grep -P' and whether 'pcregrep' was actually absorbed into it, but 'pcregrep' is its own thing with its own features. For example, it has a -M/--multiline flag that no standard grep (that I'm aware of) has. (Although there are some work-arounds, e.g., by treating NUL as the line terminator via the -z/--null-data flag in GNU grep.)
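
For what it's worth, the NUL-terminator workaround could look roughly like this. It assumes GNU grep built with PCRE support (for -P); the input text is made up.

```shell
# -z: treat NUL as the record terminator, so the whole input is one record.
# -P: Perl-compatible syntax, where \n in the pattern matches a newline.
# -o: print only the matched part.
# Under -z, GNU grep also terminates output records with NUL; tr turns
# those back into newlines.
match=$(printf 'alpha\nfoo\nbar\nomega\n' | grep -zoP 'foo\nbar' | tr '\0' '\n')
printf '%s\n' "$match"
```

This reads the entire input as a single record, so it is a workaround rather than a real multiline mode, and it breaks down if the data itself contains NUL bytes.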

Oddly, there are pcre2 packages in RedHat/Alma 9, but they do not include a pcre2grep.

GNU grep is also linked to pcre, not pcre2.

  # pcre2grep
  bash: pcre2grep: command not found...

  # yum install pcre2grep
  Last metadata expiration check: 1:58:58 ago on Tue 13 Dec 2022 11:45:44 AM CST.
  No match for argument: pcre2grep
  Error: Unable to find a match: pcre2grep

  # yum whatprovides pcre2grep
  Last metadata expiration check: 2:09:25 ago on Tue 13 Dec 2022 11:45:44 AM CST.
  Error: No matches found.

  # rpm -qa | grep pcre2 | sort
  pcre2-10.40-2.0.2.el9.x86_64
  pcre2-syntax-10.40-2.0.2.el9.noarch
  pcre2-utf32-10.40-2.0.2.el9.x86_64

  # which grep
  /usr/bin/grep
  # ldd /usr/bin/grep | grep pcre
   libpcre.so.1 => /lib64/libpcre.so.1 (0x00007efc473c4000)

GNU grep recently migrated to PCRE2. My GNU grep is linked to PCRE2:

    $ grep --version | head -n2
    grep (GNU grep) 3.8
    Copyright (C) 2022 Free Software Foundation, Inc.
    $ ldd /usr/bin/grep
            linux-vdso.so.1 (0x00007ffd2ddd5000)
            libpcre2-8.so.0 => /usr/lib/libpcre2-8.so.0 (0x00007f4f88b81000)
            libc.so.6 => /usr/lib/libc.so.6 (0x00007f4f8899a000)
            /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f4f88c6f000)

As for pcre2grep, Arch Linux includes it as part of the pcre2 package:

    $ pacman -Qo $(which pcre2grep)
    /usr/bin/pcre2grep is owned by pcre2 10.40-3
So this is a distro packaging thing. But what I said is true: pcregrep, pcre2grep and grep -P are all distinct things.

I once compared the speed of these two approaches, rather accidentally. I was colorizing output by adding ANSI escape sequences. I thought, of course one process must be more efficient than a pipe of processes. After the rewrite, I was disappointed by the slowdown and reverted to the pipe.

PS I checked back and I used sed rather than grep. I think the result would hold for grep, but the moral is that you should verify rather than assume.

I have around 50 seds in the pipe, running in parallel (which is what makes it faster); there would have been about half as many when I tried the rewrite.
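
In the spirit of verifying rather than assuming, here is a minimal harness for comparing the two shapes. Everything in it is a stand-in: the three substitutions, the function names pipe_variant and single_variant, and the seq-generated input are made up for illustration, not the commenter's actual script.

```shell
# Made-up workload: 1000 numbered lines and three substitutions.
tmp=$(mktemp)
seq 1 1000 > "$tmp"

# Variant 1: one sed process per substitution, connected by pipes.
pipe_variant() {
    sed 's/1/a/g' "$1" | sed 's/2/b/g' | sed 's/3/c/g'
}

# Variant 2: a single sed process applying the same substitutions in order.
single_variant() {
    sed -e 's/1/a/g' -e 's/2/b/g' -e 's/3/c/g' "$1"
}

# Confirm the two variants agree before comparing their speed.
pipe_out=$(pipe_variant "$tmp")
single_out=$(single_variant "$tmp")
[ "$pipe_out" = "$single_out" ] && echo "outputs identical"
rm -f "$tmp"
```

In bash, prefixing each call with time (for example, time pipe_variant bigfile > /dev/null) gives the comparison. Which one wins will depend on input size, the number of stages, and how many cores are free, so measure on the real workload.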

I usually prefer to pipe fgrep -v into fgrep -v. With egrep you need to escape brackets and other characters.
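
A small illustration of the escaping difference, with made-up log lines (written with grep -F and grep -E, which behave like fgrep and egrep):

```shell
# Made-up log lines containing regex metacharacters.
tmp=$(mktemp)
printf '%s\n' 'sshd[123]: accepted' 'cron(daily) ran' 'kernel: ok' > "$tmp"

# Fixed-string matching: brackets and parentheses are literal, no escaping.
fixed=$(grep -F -v 'sshd[123]' "$tmp" | grep -F -v 'cron(daily)')

# Extended-regex matching: the same exclusions need backslash escapes.
regex=$(grep -E -v 'sshd\[123\]|cron\(daily\)' "$tmp")

printf '%s\n' "$fixed"
rm -f "$tmp"
```

Both forms leave only the kernel line. Unescaped, sshd[123] as a regex would be a bracket expression matching sshd1, sshd2 or sshd3 rather than the literal text.
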

If you haven't seen it, there's been some noise recently about fgrep and its egrep friend:

  $ fgrep -h
  fgrep: warning: fgrep is obsolescent; using grep -F
  $ egrep -h
  egrep: warning: egrep is obsolescent; using grep -E

True, for absolute fixed strings, the author's approach is superior.

Off the cuff, another way to do it is with awk's index function. I don't know what speed penalty this might impose.

  $ cat ~/fmgrep
  #!/bin/awk -f

  BEGIN { split(ARGV[1], s, ","); ARGV[1]="" }

  { m = 1; for(c in s) if(index($0, s[c])) m = 0; if(m) print }

  $ ~/fmgrep sshd,systemd,polkitd /var/log/secure

This also works:

  fgrep -v -e THING1 -e THING2 -e THING3 -e THING4

Or the following if pipes or egrep make you nervous:

  grep -v -e THING1 -e THING2 -e THING3 -e THING4 file