Hacker News new | ask | show | jobs
by compressedgas 1894 days ago
That's what Re-Pair does. It repeatedly replaces the most frequent substring with a symbol until the input has no repeating substrings.

See https://en.wikipedia.org/wiki/Grammar-based_code

1 comments

I had to do this by hand in the 1980s when I ran out of space in the ROM of the embedded controller I was working on for messages to be printed. There was no way to increase the size of ROM (2716 EPROM) at that stage in the development. The text was all 7 bit ASCII so I used the MSB as a flag to say that the remaining bits were a pointer into a table of frequently used fragments. This saved enough space to include the decompressor; it was recursive too so the fragments could also be compressed.