Hacker News
by niftich 3356 days ago
There are some design decisions in Brotli I just don't quite understand [1][2][3], like what's going on with its dictionary [2]. One of the Brotli authors is active in this thread, so perhaps they can talk about this.

Zstandard is pretty solid, but lacks deployment in general-purpose web browsers. Firefox and Edge have followed Google's lead and have added, or are about to add, support for Brotli. Both Brotli and Zstandard see usage in behind-the-scenes situations, on-the-wire in custom protocols, and the like.

As for widespread use for files sitting on disk, perhaps on average people's computers, I think we're quite a few years away from replacing containers and compressors that have been around for a long time and are still being used because of compatibility and a lack of pressure to switch to a non-backwards-compatible alternative [4].

[1] https://news.ycombinator.com/item?id=12010313
[2] https://news.ycombinator.com/item?id=12003131
[3] https://news.ycombinator.com/item?id=12400379
[4] https://news.ycombinator.com/item?id=13171374

1 comment

> https://news.ycombinator.com/item?id=12003131

This is some sort of misunderstanding. If one replaces the static dictionary with zeros, one can easily benchmark brotli without the static dictionary. If one actually benchmarks it, one learns two things:

1) With short (~50 kB) documents there is about a 7 % saving because of the static dictionary. Without it, there is still a 14 % win over gzip.

2) There is no compression density advantage for long documents (1+ MB).

Brotli's savings come to a large degree from algorithmic improvements, not from the static dictionary.
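
The short-versus-long effect is not specific to brotli, and it can be illustrated with zlib's preset dictionary from the Python standard library. This is only a minimal sketch of the general idea, not a brotli benchmark: the dictionary `ZDICT`, the helper names, the `page` wrapper, and the synthetic documents are all assumptions made up for illustration, and the numbers it prints say nothing about brotli's actual 7 % figure.

```python
import random
import zlib

def deflate_size(data: bytes, zdict: bytes = b"") -> int:
    """Size of the zlib stream for data, optionally primed with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())

def dictionary_saving(data: bytes, zdict: bytes) -> float:
    """Fraction of compressed bytes saved by using the preset dictionary."""
    plain = deflate_size(data)
    primed = deflate_size(data, zdict)
    return (plain - primed) / plain

# Hypothetical dictionary of common web boilerplate (an assumption, not brotli's).
ZDICT = (b'<!DOCTYPE html><html><head><meta charset="utf-8"><title></title></head>'
         b'<body><div class="content"><p></p></div></body></html>')

def page(body: bytes) -> bytes:
    """Wrap a body in the same boilerplate the dictionary was built from."""
    return (b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi</title></head>'
            b'<body><div class="content"><p>' + body + b'</p></div></body></html>')

short_doc = page(b"Hello, world.")

# About 1 MB of seeded pseudo-random letters stands in for a long document.
rng = random.Random(0)
long_doc = page(bytes(rng.choices(b"abcdefghijklmnopqrstuvwxyz ", k=1_000_000)))

saving_short = dictionary_saving(short_doc, ZDICT)
saving_long = dictionary_saving(long_doc, ZDICT)

print(f"short document: {saving_short:.1%} saved by the dictionary")
print(f"long document:  {saving_long:.4%} saved by the dictionary")
```

The dictionary only helps with the first occurrence of each boilerplate string, so its contribution is a roughly fixed number of bytes. That is a large share of a tiny document and a negligible share of a 1 MB one.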

> https://news.ycombinator.com/item?id=12010313

The transformations make the dictionary a bit more efficient without increasing the size of the dictionary. Consider that out of the 7 % savings that the dictionary brings, about 1.5 percentage points (~20 %) are because of the transformations. However, the dictionary is 120 kB and the transformations less than 1 kB. So the transformations are more cost efficient than the basic form of the dictionary.
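
To make the cost-efficiency comparison concrete, here is the arithmetic as a small sketch. It uses only the figures quoted above, and it makes two assumptions that are not stated in the comment: the savings are treated as additive (so the plain dictionary accounts for 7 − 1.5 = 5.5 points), and the transformations are charged a full 1 kB, which is an upper bound on "less than 1 kB". The variable names are mine.

```python
# Figures quoted in the comment above (short documents).
TOTAL_SAVING_PTS = 7.0       # saving from dictionary plus transformations
TRANSFORM_SAVING_PTS = 1.5   # part of that saving due to transformations
DICT_SIZE_KB = 120.0         # size of the static dictionary
TRANSFORM_SIZE_KB = 1.0      # assumption: upper bound for "less than 1 kB"

# Share of the dictionary-related saving that comes from the transformations.
transform_share = TRANSFORM_SAVING_PTS / TOTAL_SAVING_PTS

# Assumption: savings are additive, so the plain dictionary gets the remainder.
base_saving_pts = TOTAL_SAVING_PTS - TRANSFORM_SAVING_PTS

# Saving per kB of shipped data for each component.
base_pts_per_kb = base_saving_pts / DICT_SIZE_KB
transform_pts_per_kb = TRANSFORM_SAVING_PTS / TRANSFORM_SIZE_KB

# How many times more saving per kB the transformations deliver (a lower bound).
efficiency_ratio = transform_pts_per_kb / base_pts_per_kb

print(f"transformations' share of the saving: {transform_share:.0%}")
print(f"plain dictionary: {base_pts_per_kb:.3f} points per kB")
print(f"transformations:  {transform_pts_per_kb:.3f} points per kB")
print(f"efficiency ratio: at least {efficiency_ratio:.0f}x")
```

Under these assumptions the transformations deliver at least about 30 times more saving per kB than the plain dictionary entries do.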

> https://news.ycombinator.com/item?id=12400379

Brotli's dictionary was generated with a process that leads to the largest gain in entropy, i.e., every term and its place in the ordering was chosen for the smallest size -- considering how many bits it would have cost to express those terms using other features of brotli. Even if the results look disgusting or are difficult to understand, the process to generate them was quite delicate.

The same goes for the transforms, but there it was mostly the ordering that we iterated on; we generated the candidate transforms using a large variety of tools.