| HN Mirror


by sdkgames 911 days ago

>The program is smaller than even the 'compressed' form of its output,

>and thus represents a new departure in text compression standards.

7z and gzip disagree with this statement.

7 comments

o11c 911 days ago

It's only "compressed" if it was made by compress(1) in France. Otherwise it's just sparkling entropy coding.

For reference:

    1490 xmas-with-leading-comment.c
     913 xmas-without-leading-comment.c
    2357 xmas.out
    1297 xmas.out.9.Z
    1038 xmas.out.10.Z # actually better than with more bits!
    1048 xmas.out.11.Z # compressing with 11..16 bits gives the same size
     319 xmas.out.1.gz # compression levels 1..2 give the same size
     317 xmas.out.3.gz
     307 xmas.out.4.gz # compression levels 4..9 give the same size

Note that despite looking hard, I haven't found a version of `compress` that supports `-H`, which gzip refers to and can decompress. I'm not sure how common it was in the wild.


arp242 911 days ago

  % gcc xmas.c
  xmas.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
      2 | main(t,_,a)
        | ^~~~
  xmas.c: In function 'main':
  xmas.c:2:1: warning: type of 't' defaults to 'int' [-Wimplicit-int]
  xmas.c:2:1: warning: type of '_' defaults to 'int' [-Wimplicit-int]


  % wc -c <xmas.c
  913
  % ./a.out | wc -c
  2359
  % ./a.out | compress | wc -c
  1048


jrockway 911 days ago

zstd -19 compresses the text of the song to 309 bytes.

For a fair comparison, though, you'd have to write zstd -d in 604 bytes. To be REALLY fair, you'd also have to count the bytes of code in the compiler itself. A convenient enough compression scheme could index into the C compiler binary to find the bytes it needs. (For example, my GCC binary contains "first", "second", and "third", which a sufficiently clever implementation could make use of: "Exit on the first error occurred.", "Append a second underscore if the name already contains an underscore.", "Warn about suspicious calls to memset where the third argument is constant literal zero and the second is not.", etc. I didn't check, but I doubt turtle doves or maids-a-milking come up that often in the descriptions of warning flags.)


arp242 911 days ago

zstd didn't exist in 1988.


jrockway 911 days ago

This article was posted to HN today, in 2023!


arp242 911 days ago

And that comment was clearly written in 1988. I fail to see what's so hard to understand about that and why people feel the need to "prove" it's "wrong".


jrockway 909 days ago

I don't think people are trying to prove it's wrong. Most of the comments are saying that general-purpose technology in 2023 doesn't reduce the file size by as much as the manual job in 1988 did.


theideaofcoffee 911 days ago

This particular IOCCC entry was from 1988. Those two tools didn't come about until a few years later: gzip in the early nineties and 7z later that decade. The note is probably correct when compared against the state of the art at the time.


acqq 911 days ago

You are right. Here is the source of "compress" that existed at that time. It compresses the produced song to 1048 bytes, not less.

https://www.nic.funet.fi/index/minix/compress.c

The program that produces the song, without the introductory comment, is 913 bytes, as presented in the article. With whitespace removed it is just 800 bytes, and it produces the song, which is 2359 characters here. The whole C source is:

    main(t,_,a)char*a;{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
    main(-86,0,a+1)+a)):1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==
    2?_<13?main(2,_+1,"%s%d%d\n"):9:16:t<0?t<-72?main(_,t,
    "@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+,/+#n+,/#;\
    #q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l q#'+d'K#!/+k#;\
    q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# ){nl]!/n{n#'; \
    r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#\
    n'wk nw' iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c ;;\
    {nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;\
    #'rdq#w! nr'/ ') }+}{rl#'{n' ')# }'+}##(!!/")
    :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1):
    0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
    "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}

It compiles and links even without #include.


sdkgames 911 days ago

gzip is based on the DEFLATE algorithm, which combines LZ77 (1977) and Huffman coding (1952). (https://en.wikipedia.org/wiki/Gzip)


1986 911 days ago

DEFLATE was created / implemented / specced / whatever word you want to apply by Phil Katz no earlier than 1989.


arp242 911 days ago

"Based on" is not the same as "identical".


natch 911 days ago

Since the word 'compressed' is in quotes, they probably mean the output of the 'compress' command as available on UNIX at the time, as opposed to some other compression method available then.


ecesena 911 days ago

I just checked: gzip is from 1992, 7z from 1999.


edub 911 days ago

The program was written in 1988. I ran the text through LZSS, which was published in 1982 and so was available before 1988. I used a 1989 public-domain implementation by Haruhiko Okumura, which postdates 1988, but I don't believe it was optimized to improve on the compression ratio of the 1982 algorithm.

It took the text from 2357 bytes to 534 bytes, which is smaller than the Xmas.c program. I counted that at 917 bytes, though another poster counted 913.


adrianmonk 911 days ago

"New departure" simply doesn't mean "record-breaking compression ratio".
