by linsomniac 1507 days ago
One more note:

"bzip3 -e -j 6 -b 50": 25 seconds, 125MB

So nearly as good as the best of xz, but in a 20th of the time.

However, do note that any unexpected use is met with a SIGSEGV: using it as a filter, using "-j6" instead of "-j 6", not specifying "-e"...

lies. not specifying -e displays an error message:

  % bzip3 -e -j 6 -b 50 corpus/calgary.tar
  % bzip3 -j 6 -b 50 corpus/calgary.tar
  bzip3 - A better and stronger spiritual successor to bzip2.
  Copyright (C) by Kamila Szewczyk, 2022. Licensed under the terms of GPLv3.
  Usage: bzip3 [-e/-d/-t/-c] [-b block_size] input output
  Operations:
    -e: encode
    -d: decode
    -t: test
  Extra flags:
    -c: force reading/writing from standard streams
    -b N: set block size in MiB
    -j N: set the amount of parallel threads
you can use bzip3 as a filter:

  % cat corpus/calgary.tar | bzip3 -b 10 -e -c | wc -c
  807959
and using "-j6" is simply being unable to read the help page.
The code has, at https://github.com/kspalaiologos/bzip3/blob/bf2f0e02fd59f4c4... :

            } else if (argv[i][1] == 'j') {
                workers = atoi(argv[i + 1]);
                i++;
If the last argument is "-j6" then argv[i + 1] is argv[argc], which is NULL, so this ends up calling atoi(NULL):

  % ./bzip3 -j3 < README.md
  Segmentation fault
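
For illustration, here is a minimal sketch of how the quoted branch could accept both "-j6" and "-j 6" without reading argv[argc]. The helper name parse_workers and its return conventions are assumptions made for this example, not bzip3's actual code, and it swaps atoi for strtol so that a non-numeric value is rejected instead of silently becoming 0.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of bzip3. Returns the worker count given
 * as "-jN" or "-j N", 0 if no -j option is present, and -1 if the value
 * is missing or is not a positive integer. */
int parse_workers(int argc, char **argv) {
    int workers = 0;
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] != '-' || argv[i][1] != 'j')
            continue;                     /* not a -j option, skip it */

        const char *value;
        if (argv[i][2] != '\0') {
            value = argv[i] + 2;          /* attached form: -j6 */
        } else if (i + 1 < argc) {
            value = argv[++i];            /* separate form: -j 6 */
        } else {
            return -1;                    /* -j was the last argument */
        }

        char *end;
        errno = 0;
        long n = strtol(value, &end, 10);
        if (errno != 0 || end == value || *end != '\0' || n < 1 || n > INT_MAX)
            return -1;                    /* empty, non-numeric or out of range */
        workers = (int)n;
    }
    return workers;
}
```

With this shape, a trailing "-j" is reported as an error by the caller instead of being passed to atoi as NULL, and "-j3" is read as 3 workers.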
"-j6" is standard getopt() behavior, and the default expected behavior from Unix/POSIX systems.
They are not disputing that - they actually acknowledged it. Instead they are disputing that omitting -e will lead to a crash.
You're right, what I thought was encoding/filter SEGVs were the bug in handling "-j6".