Hacker News new | ask | show | jobs
by sdkgames 911 days ago
>The program is smaller than even the 'compressed' form of its output,

>and thus represents a new departure in text compression standards.

7z and gzip disagree with this statement.

7 comments

It's only "compressed" if it was made by compress(1) in France. Otherwise it's just sparkling entropy coding.

For reference:

    1490 xmas-with-leading-comment.c
     913 xmas-without-leading-comment.c
    2357 xmas.out
    1297 xmas.out.9.Z
    1038 xmas.out.10.Z # actually better than with more bits!
    1048 xmas.out.11.Z # compression with 11..16 bits have the same size
     319 xmas.out.1.gz # compression levels 1..2 has same size
     317 xmas.out.3.gz
     307 xmas.out.4.gz # compression levels 4..9 have same size
Note that despite looking hard I haven't found a version of `compress` that supports `-H`, which is referenced and decompressable by gzip. I'm not sure how common it was in the wild.

  % gcc xmas.c
  xmas.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
      2 | main(t,_,a)
        | ^~~~
  xmas.c: In function 'main':
  xmas.c:2:1: warning: type of 't' defaults to 'int' [-Wimplicit-int]
  xmas.c:2:1: warning: type of '_' defaults to 'int' [-Wimplicit-int]


  % wc -c <xmas.c
  913
  % ./a.out | wc -c
  2359
  % ./a.out | compress | wc -c
  1048
zstd -19 compresses the text of the song to 309 bytes.

To be a fair comparison, though, you'd have to write zstd -d in 604 bytes. I suppose to be REALLY fair, though, you have to count the bytes of code in the compiler itself. A convenient enough implementation of compression could index into the C compiler binary to find the bytes it needs. (For example, my GCC contains "first", "second", and "third" in the binary, which a sufficiently clever implementation could make use of. "Exit on the first error occurred.", "Append a second underscore if the name already contains an underscore.", "Warn about suspicious calls to memset where the third argument is constant literal zero and the second is not.", etc. I didn't check but I doubt turtle doves or maids-a-milking come up that often in the description of warning flags.)

zstd didn't exist in 1988.
This article was posted to HN today, in 2023!
And that comment was clearly written in 1988. I fail to see what's so hard to understand about that and why people feel the need to "prove" it's "wrong".
I don't think people are trying to prove it's wrong. Most of the comments are saying that general purpose technology in 2023 doesn't reduce the file size by as much as the manual job in 1988 did.
This particular entry to the IOCCC was for 1988, those two algorithms didn’t come about for a few more years after that, gzip in the early nineties and 7z later that decade. The note is probably correct when comparing against the state of the art at the time.
You are right. Here is the source of "compress" that existed at that time. It compresses the produced song to 1048 bytes, not less.

https://www.nic.funet.fi/index/minix/compress.c

The program that produces the song, without the introductory comment, is 913 bytes, as presented in the article. Removing whitespaces it uses just 800 bytes and produces the song which is 2359 chars here. The whole C is:

    main(t,_,a)char*a;{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
    main(-86,0,a+1)+a)):1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==
    2?_<13?main(2,_+1,"%s%d%d\n"):9:16:t<0?t<-72?main(_,t,
    "@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+,/+#n+,/#;\
    #q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l q#'+d'K#!/+k#;\
    q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# ){nl]!/n{n#'; \
    r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#\
    \
    n'wk nw' iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c ;;\
    {nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;\
    #'rdq#w! nr'/ ') }+}{rl#'{n' ')# }'+}##(!!/")
    :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1):
    0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
    "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}
It compiles and links even without #include.
gzip is based on the DEFLATE algorithm, which is a combination of LZ77(1977) and Huffman coding(1952).( https://en.wikipedia.org/wiki/Gzip )
DEFLATE was created / implemented / speced / whatever word you want to apply by Phil Katz no earlier than 1989
"Based on" is not the same as "identical".
Since the word 'compressed' is in quotes they are probably suggesting that they mean when processed by the 'compress' command as available on UNIX at the time, as opposed to some other compression available at the time.
I just checked, gzip is from 1992, 7z from 1999.
The program was written in 1988. I ran the text through LZSS which was published in 1982, so was available before 1988. I used a 1989 public domain version by Haruhiko Okumura, which is after 1988, but I don't believe it is optimized to improve upon the compression level of the 1982 algorithm.

It took it from 2357 bytes to 534 bytes, which is smaller than the Xmas.c program which I counted as 917 bytes, but another poster counted 913 bytes.

"New departure" simply doesn't mean "record-breaking compression ratio".