| HN Mirror



	by ComplexSystems 333 days ago
	Good stuff. You could get much better bandwidth than this by tokenizing and using something like a Huffman or arithmetic code on token frequencies. As a simple example, if you set your tokens to be all English words (let's say there are between 500k and 1 million), a fixed-length index would cost about 19-20 bits per word, but a frequency-based code should average roughly 9-10 bits per word. I am sure you could do much better than this as well.
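To make the word-level idea concrete, here is a minimal sketch that builds Huffman code lengths from word counts and compares the average cost with a fixed-length index. The sample sentence and the helper name `huffman_code_lengths` are made up for illustration; a real setup would count frequencies over a large corpus.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {token: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {tok: 1 for tok in freqs}
    # Heap entries: (weight, unique tie-breaker, {token: depth so far}).
    heap = [(w, i, {tok: 0}) for i, (tok, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every token in them one level deeper.
        merged = {tok: depth + 1 for tok, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Toy sample text (an assumption; not data from the thread).
text = "the cat sat on the mat and the dog sat on the cat"
freqs = Counter(text.split())
lengths = huffman_code_lengths(freqs)

total = sum(freqs.values())
avg_bits = sum(freqs[t] * lengths[t] for t in freqs) / total
fixed_bits = math.ceil(math.log2(len(freqs)))

print(f"fixed-length index: {fixed_bits} bits/word")
print(f"Huffman average:    {avg_bits:.2f} bits/word")
# Fixed-length cost for the vocabulary sizes mentioned above.
print(f"500k words: {math.log2(500_000):.1f} bits, 1M words: {math.log2(1_000_000):.1f} bits")
```

Frequent words get short codes, so the average drops below the fixed-length cost. An arithmetic coder would get closer to the entropy still, since it is not limited to whole-bit code lengths.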

2 comments

avidiax 333 days ago

You can get much better than that by taking a well-known LLM and encoding a series of offsets from the most likely sequence of tokens, especially if you are OK with the message being slightly different.

https://arxiv.org/abs/2306.04050

https://bellard.org/ts_zip/
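A rough sketch of the "offsets from the most likely sequence" idea, with a toy bigram table standing in for the LLM. This is only an illustration under those assumptions, not the method used by either link above; the corpus, the `<s>` start marker, and the function names are all hypothetical. Both sides need the same model, and every message token must be in the model's vocabulary.

```python
from collections import Counter, defaultdict

def build_model(corpus_tokens):
    """Toy stand-in for an LLM: rank every vocabulary token given the previous token."""
    vocab = sorted(set(corpus_tokens))
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        follows[prev][nxt] += 1

    def ranked(prev):
        counts = follows.get(prev, Counter())
        # Most frequent continuation first; ties and unseen tokens sort alphabetically.
        return sorted(vocab, key=lambda t: (-counts[t], t))

    return ranked

def encode(tokens, ranked, start="<s>"):
    """Replace each token by its rank in the model's prediction list."""
    ranks, prev = [], start
    for tok in tokens:
        ranks.append(ranked(prev).index(tok))
        prev = tok
    return ranks

def decode(ranks, ranked, start="<s>"):
    """Invert encode() by replaying the same predictions."""
    tokens, prev = [], start
    for r in ranks:
        tok = ranked(prev)[r]
        tokens.append(tok)
        prev = tok
    return tokens

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
message = "the dog sat on the mat .".split()

ranked = build_model(corpus)
ranks = encode(message, ranked)
print(ranks)
print(decode(ranks, ranked) == message)
```

The better the model predicts the text, the more of the ranks are 0 or close to it, and a Huffman or arithmetic code over the ranks then needs very few bits per token. Forcing some ranks to 0 would give the lossy "slightly different message" variant.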


ashfn 333 days ago

That sounds very interesting. I'll look into it, thanks :)
