by jjoonathan 1599 days ago
It's weirdly difficult to get grep to search for fixed binary strings, with lots of gotchas if you don't understand grep internals. I still don't, but this is the best I have been able to do after knocking my forehead on three or four of said gotchas:

    LC_ALL=C grep -larP '\x1A\x2B\x3C\xFF'
1 comments

I presume -P is a hack so that grep does the job of decoding the escape sequences? It seems your struggle here is mostly with the shell, not grep; specifically, with the fact that standard shell quoting does not decode escape sequences like \x1A: single quotes preserve everything literally, and inside double quotes the only recognized escapes are \$, \`, \", \\, and \<newline>. Try this, which uses printf to process the escape sequences.

    grep -larF "$(printf '\032\053\074\377')"

The -F flag should also make this faster as it doesn't actually need to use a regular expression engine, let alone a Perl-compatible one.
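
To check that the printf approach really hands grep the intended bytes, here is a minimal sketch. It assumes GNU grep (for -a and -r) and a throwaway directory from mktemp; the file names hit.bin and miss.bin are made up for illustration.

```shell
# Work in the C locale so bytes above 0x7F are not interpreted as text.
export LC_ALL=C

# Build the 4-byte pattern 0x1A 0x2B 0x3C 0xFF from octal escapes.
pattern="$(printf '\032\053\074\377')"

# Inspect what the shell actually stored, as hex with whitespace removed.
pattern_hex="$(printf '%s' "$pattern" | od -An -tx1 | tr -d ' \n')"
echo "$pattern_hex"    # prints 1a2b3cff

# Hypothetical test data: one file containing the bytes, one without.
dir="$(mktemp -d)"
printf 'head\032\053\074\377tail\n' > "$dir/hit.bin"
printf 'nothing to see\n' > "$dir/miss.bin"

# -l: list file names, -a: treat binary as text, -r: recurse, -F: fixed string.
matches="$(grep -larF -- "$pattern" "$dir")"
echo "$matches"
```

Dumping the pattern through od before searching is a cheap way to tell a shell quoting problem apart from a grep problem.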

Caveats:

1) POSIX only requires printf to recognize octal escapes (\nnn, or \0nnn if using %b specifier), not hexadecimal escapes. Many implementations recognize the latter, but not Debian dash.

2) Shell command substitution strips trailing newlines from the output, so if your binary string ends in a newline you'll need to use extra tricks. E.g. S="$(printf '\032\053\074\nX')"; grep -larF "${S%X}"

3) It's probably a good idea to still specify LC_ALL=C, but because the binary string is now being passed through the shell's innards it might need to be set in the environment of the shell itself, not simply the environments of the printf and grep subcommands. (Also, technically I'm not sure whether the C/POSIX locale is required to be 8-bit clean yet, but in practice it will be.)
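
The first two caveats can be seen on the shell side with a short sketch. hex_to_bytes is a hypothetical helper name, not a standard utility; it assumes printf accepts a 0x-prefixed argument for %o, which POSIX allows for numeric arguments.

```shell
# Caveat 3: set the locale for the shell itself, not just the subcommands.
export LC_ALL=C

# Caveat 1: turn hex byte values into octal escapes, which POSIX printf must accept.
# hex_to_bytes is a hypothetical helper for illustration.
hex_to_bytes() {
  for h in "$@"; do
    # The inner printf yields e.g. 032; the outer one emits the byte for \032.
    printf "\\$(printf '%03o' "0x$h")"
  done
}
hexed="$(hex_to_bytes 1A 2B 3C FF | od -An -tx1 | tr -d ' \n')"
echo "$hexed"    # prints 1a2b3cff

# Caveat 2: command substitution drops the trailing newline...
naive="$(printf '\032\053\074\n')"
naive_len=${#naive}

# ...unless a sentinel X follows it and is stripped afterwards.
S="$(printf '\032\053\074\nX')"
kept="${S%X}"
kept_len=${#kept}

echo "naive=$naive_len kept=$kept_len"    # prints naive=3 kept=4
```

This only checks what the shell stores. POSIX grep treats newlines in its pattern argument as separators between patterns, so a pattern that really contains a newline may still not behave as a single fixed string.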

Bash and some other shells support an extension ($'...') for expanding escape sequences inline:

    grep -larF $'\x1A\x2B\x3C\xFF'

If you do any amount of shell programming--even if you only stick with Bash--it's worth spending 30 minutes reading the "Shell Command Language" chapter of the POSIX specification: https://pubs.opengroup.org/onlinepubs/9699919799/ The first few sections are the most concise resource available for explaining, step-by-step, shell parsing rules.