| HN Mirror


by pmarreck 1131 days ago

Basically,

`zstd --train <path/to/directory/of/many/small/example/files/>`

will output a dictionary file. Passing `-D <path/to/dictionary/file>` when compressing or decompressing then makes zstd use that dictionary. The same dictionary is needed for both compression and decompression.

For more details, see `man zstd` or search for "zstd --train". The training directory must contain many small files, each of which is one example artifact. If you want to split, say, a single log file into one file per line, you can use a bash script like this one (note that I just created it with ChatGPT and eyeballed it; it looks correct, but I haven't run it yet!): https://gist.github.com/pmarreck/91124e761e45d6860834eb046d6... (Also, don't forget to make it executable with `chmod +x split_file.bash` before you try to run it directly.)
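For anyone who can't open the gist, here is a minimal sketch of the splitting step. It is not the gist's contents, just one way to do it: the function name `split_lines` and the `line_000001.txt` naming scheme are made up for illustration, and it assumes blank lines are not useful training samples and skips them.

```shell
#!/bin/sh
# Sketch: split a text file into one small file per non-empty line,
# so the output directory can serve as a zstd training set.
# split_lines and the line_NNNNNN.txt names are hypothetical.
set -eu

# Usage: split_lines INPUT_FILE OUTPUT_DIR
# Prints the number of sample files written.
split_lines() {
  sl_input=$1
  sl_outdir=$2
  sl_count=0
  mkdir -p "$sl_outdir"
  # The "|| [ -n ... ]" part keeps a final line that has no trailing newline.
  while IFS= read -r sl_line || [ -n "$sl_line" ]; do
    # Assumption: empty lines are skipped rather than written as samples.
    if [ -z "$sl_line" ]; then
      continue
    fi
    sl_count=$((sl_count + 1))
    sl_name=$(printf 'line_%06d.txt' "$sl_count")
    printf '%s\n' "$sl_line" > "$sl_outdir/$sl_name"
  done < "$sl_input"
  echo "$sl_count"
}

# Run only when called with exactly two arguments.
if [ "$#" -eq 2 ]; then
  split_lines "$1" "$2"
fi
```

Saved as, say, `split_lines.sh`, it would be run as `sh split_lines.sh app.log samples/` (both names are placeholders). The resulting directory can then be passed to `zstd --train`. Depending on your zstd version you may need `-r` to make it recurse into a directory, and `-o <file>` sets the dictionary's output name.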

1 comment

muragekibicho 1131 days ago

Thank you so much. I was trying to create a dictionary last night and your comment was sent by God. You're doing the Lord's work frfr! I followed you on GitHub!


pmarreck 1131 days ago

Remember that if you don't understand a particular line of code, you can have ChatGPT explain it. Have fun!
