by andreygrehov 240 days ago
I don’t know what I’m talking about (pure fantasy), but what if you train a model on compressed data and then perform inference on compressed data as well? Could this work? With the output also being compressed and then decompressed by the client?
2 comments

The tokenizer is already a form of (somewhat lossy) compression: it maps a string of plaintext to a stream of token identifiers. You can think of a tokenizer's vocabulary (and the embedding space built on top of it) as a sort of massive dictionary table, like the one you might use in a zip/gzip stream.
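To make the dictionary-table analogy concrete, here is a toy sketch in Python. Everything in it is hypothetical: the `PIECES` list, the greedy longest-match rule, and the helper names are made up for illustration and are not how any particular production tokenizer works. This toy round-trips exactly; whether a real tokenizer loses information depends on things like its text normalization.

```python
def build_table(pieces):
    """Map each text piece to an integer id (the 'dictionary table')."""
    return {piece: idx for idx, piece in enumerate(pieces)}


def encode(text, table):
    """Greedy longest-match lookup: replace text pieces with their ids."""
    ids = []
    pos = 0
    max_len = max(len(piece) for piece in table)
    while pos < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in table:
                ids.append(table[piece])
                pos += size
                break
        else:
            raise ValueError(f"no table entry covers {text[pos]!r}")
    return ids


def decode(ids, table):
    """Invert the table and concatenate the pieces back into text."""
    pieces = {idx: piece for piece, idx in table.items()}
    return "".join(pieces[i] for i in ids)


# Hypothetical toy vocabulary; real vocabularies are learned and far larger.
PIECES = ["compressed", "compress", "data", "token", " ", "s", "e", "d"]
TABLE = build_table(PIECES)

text = "compressed data tokens"
ids = encode(text, TABLE)
print(ids, "->", repr(decode(ids, TABLE)))
```

The 22-character string becomes 6 ids, which is the same trade a zip dictionary makes: frequent substrings get short references.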

Starting with already-compressed data doesn't necessarily mean fewer tokens. You can probably assume similar (or probably worse) entropy when expanding "dictionary words" from a compressed stream as when expanding "tokens" from a plaintext stream.
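A rough way to see the entropy point is to compare the per-byte Shannon entropy of some text before and after zlib compression. This is only a sketch under stated assumptions: the sample text is synthetic (random picks from a made-up `WORDS` list), and byte-level entropy is a stand-in for what a tokenizer would see, not a measurement of actual token counts.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy, in bits per byte, of the empirical byte distribution."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sample: 5000 words drawn from a small made-up vocabulary.
WORDS = ["model", "token", "stream", "data", "train", "infer", "client",
         "output", "input", "table", "entropy", "plain", "text", "zip"]
rng = random.Random(0)  # fixed seed so the sample is reproducible
plain = " ".join(rng.choice(WORDS) for _ in range(5000)).encode("utf-8")

packed = zlib.compress(plain, 9)

plain_entropy = byte_entropy(plain)
packed_entropy = byte_entropy(packed)

print(f"plain:  {len(plain)} bytes, {plain_entropy:.2f} bits/byte")
print(f"packed: {len(packed)} bytes, {packed_entropy:.2f} bits/byte")
```

The compressed stream is shorter, but each of its bytes is much closer to random, so there are few repeated patterns left for a tokenizer's vocabulary to exploit.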

Since all input is run through a tokenizer anyway, I would expect the token space not to differ much between a tokenizer trained on uncompressed data and one trained on compressed data.