by GuB-42 743 days ago
I would use a better compressor than gzip, but I have used this trick several times.

xz or zstd may be better choices, or you can look at the Hutter Prize [1] winners for best compression and therefore best entropy estimate.

[1] http://prize.hutter1.net/

1 comment

> best compression and therefore best entropy estimate

That's a good point. But the Hutter Prize is for compressing a 1 GB file. On inputs as short as a line of code, gzip doesn't do so badly. For a longer line:

  $ INPUT='    bool isRegPair() const { return kind() == RegisterPair || kind() == LateRegisterPair || kind() == SomeLateRegisterPair; }'
  $ echo "$INPUT" | gzip | wc -c
  95
  $ echo "$INPUT" | bzip2 | wc -c
  118
  $ echo "$INPUT" | xz -F xz | wc -c
  140
  $ echo "$INPUT" | xz -F lzma | wc -c
  97
  $ echo "$INPUT" | zstd | wc -c
  92

For a shorter line:

  $ INPUT='        ASSERT(regHi().isGPR());'
  $ echo "$INPUT" | gzip | wc -c
  48
  $ echo "$INPUT" | bzip2 | wc -c
  73
  $ echo "$INPUT" | xz -F xz | wc -c
  92
  $ echo "$INPUT" | xz -F lzma | wc -c
  51
  $ echo "$INPUT" | zstd | wc -c
  46
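
The two comparisons above can be wrapped in a small loop. This is only a sketch and not part of the original comment: the `sizes` function name is made up for illustration, and it assumes gzip, bzip2, xz and zstd are on the PATH (any that are missing get skipped). Besides the compressed size, it also prints what each tool emits for empty input, as a rough gauge of fixed header/trailer overhead, which plausibly accounts for much of the gap between `xz -F xz` and `xz -F lzma` on inputs this short.

```shell
#!/bin/sh
# sizes: hypothetical helper that prints compressed byte counts for one string.
sizes() {
    input=$1
    for cmd in "gzip" "bzip2" "xz -F xz" "xz -F lzma" "zstd"; do
        tool=${cmd%% *}
        # Skip compressors that are not installed.
        command -v "$tool" >/dev/null 2>&1 || continue
        # Same pipeline as in the transcripts: the string plus a newline,
        # compressed to stdout, then counted. $cmd is deliberately unquoted
        # so that "xz -F lzma" splits into a command and its arguments.
        n=$(printf '%s\n' "$input" | $cmd | wc -c | tr -d ' ')
        # Compressing empty input approximates the format's fixed overhead.
        base=$(printf '' | $cmd | wc -c | tr -d ' ')
        printf '%-10s %4d bytes (empty input: %d)\n' "$cmd" "$n" "$base"
    done
}

sizes '        ASSERT(regHi().isGPR());'
```

Exact counts may differ slightly from the numbers quoted above depending on tool versions and default compression levels.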