| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by JyrkiAlakuijala 3354 days ago

> https://news.ycombinator.com/item?id=12003131

This is some sort of misunderstanding. If one replaces the static dictionary with zeros, one can easily benchmark brotli without the static dictionary. If one actually benchmarks it, one can learn the two things:

1) With the short (~50 kB) documents there is about an 7 % saving because of the static dictionary. There is still a 14 % win over gzip.

2) There is no compression density advantage for long documents (1+ MB).

Brotli's savings come to a large degree from algorithmic improvements, not from the static dictionary.

> https://news.ycombinator.com/item?id=12010313

The transformations make the dictionary a small bit more efficient without increasing the size of the dictionary. Think that out of the 7 % savings that the dictionary brings, about 1.5 % units (~20 %) are because of the transformations. However, the dictionary is 120 kB and the transformations less than 1 kB. So, transformations are more cost efficient than basic form of the dictionary.

> https://news.ycombinator.com/item?id=12400379

Brotli's dictionary was generated with a process that leads to the largest gain in entropy, i.e., every term and their ordering was chosen for the smallest size -- considering how many bits it would have costs to express those terms using other features of brotli. Even if results looks disgusting or difficult to understand, the process to generate it was quite delicate.

The same for transforms, but there it was mostly the ordering that we iterated with and generated candidate transforms using a large variety of tools.