Basically,

`zstd --train -r <path/to/directory/of/many/small/example/files/>`

will output a dictionary file (named `dictionary` by default; use `-o <file>` to pick another name). Passing `-D <path/to/dictionary/file>` then makes zstd use that dictionary, and you need to pass the same dictionary for both compression and decompression.

See `man zstd` or search for "zstd --train" for more details. The training directory must contain many small files, each of which is an example artifact. If you want to split, say, a single log file into one file per line, you can use a bash script like this (note that I just created this with ChatGPT and eyeballed it; it looks correct but I haven't run it yet!): https://gist.github.com/pmarreck/91124e761e45d6860834eb046d6... (Also, don't forget to make it executable with `chmod +x split_file.bash` before you try to run it directly.)
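
For reference, here is a minimal sketch of the split-by-line idea. It is not the script from the gist, just an illustration of the same approach. The function name `split_lines`, the `line_NNNNNN.txt` naming scheme, and the file names in the usage comments are all made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the gist): write each non-empty line of
# an input file to its own numbered file, so the output directory can be
# used as a zstd training set.
split_lines() {
  local input outdir n line
  input=$1
  outdir=$2
  n=0
  mkdir -p "$outdir"
  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip empty lines; they make useless training samples.
    [ -z "$line" ] && continue
    n=$((n + 1))
    printf '%s\n' "$line" > "$outdir/$(printf 'line_%06d.txt' "$n")"
  done < "$input"
  # Report how many sample files were written.
  echo "$n"
}

# Example usage (app.log, samples/ and app.dict are placeholder names):
#   split_lines app.log samples
#   zstd --train -r samples -o app.dict
#   zstd -D app.dict new_line.txt             # writes new_line.txt.zst
#   zstd -D app.dict -dc new_line.txt.zst     # decompress to stdout
```

One file per line is only worth it if individual lines are what you will actually be compressing later; the training samples should look like the real payloads.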