by PaulHoule 644 days ago
The token number is the index of an embedding. If there are 50,000 distinct tokens, then there are 50,000 different embeddings that could be presented to the input of the neural network. I suppose you could compress a list of tokens with Huffman coding, gzip, or some similar algorithm, but as far as the neural network is concerned, one token is one slot of input, so you wouldn’t save anything once you got into the network.
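
To make that concrete, here is a rough sketch in plain Python. Everything beyond the 50,000 figure is made up for illustration: the 4-wide embeddings, the random table, the sample token IDs, and the `embed` helper are assumptions, not any real model's setup. It uses zlib as a stand-in for gzip. The point is only that compressing the ID list shrinks the bytes you store, while the network still gets one embedding per token.

```python
import random
import struct
import zlib

VOCAB_SIZE = 50_000  # vocabulary size from the comment above
EMBED_DIM = 4        # assumed toy width; real models are far wider

rng = random.Random(0)
# One row per token ID: the token number is just an index into this table.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Hypothetical helper: look up one embedding vector per token."""
    return [embedding_table[t] for t in token_ids]

# Made-up, repetitive token sequence so that it compresses well.
token_ids = [101, 7, 7, 42, 49_999] * 20

# Pack the IDs as 16-bit unsigned ints (50,000 fits below 65,536), then compress.
packed = struct.pack(f"<{len(token_ids)}H", *token_ids)
compressed = zlib.compress(packed)
print(len(packed), "bytes packed,", len(compressed), "bytes compressed")

# To feed the network, the IDs have to be decompressed and looked up again,
# which puts us back at one input slot per token.
restored = list(struct.unpack(f"<{len(token_ids)}H", zlib.decompress(compressed)))
network_input = embed(restored)
print(len(network_input), "input slots for", len(token_ids), "tokens")
```

So the compression helps on disk or over the wire, but the sequence length the network sees is unchanged.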