by sva_ 1046 days ago
The tokens are in this case actually the individual characters:

    vocab = sorted(list(set(lines)))
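A minimal sketch of what character-level tokens look like in practice. It assumes `lines` holds the raw training text as one string, so `set()` yields individual characters. If `lines` were instead a list of strings, the same line would produce unique lines, not characters. The `stoi`/`itos` tables and the `encode`/`decode` helpers are hypothetical names used only for illustration.

```python
# Assumption: `lines` is the whole training text as a single string,
# so set() collects its distinct characters.
lines = "hello world"

# Sorting makes the character -> id assignment deterministic between runs.
vocab = sorted(set(lines))

# Hypothetical lookup tables between characters and integer token ids.
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}


def encode(s: str) -> list[int]:
    """Map each character of s to its token id."""
    return [stoi[ch] for ch in s]


def decode(ids: list[int]) -> str:
    """Map token ids back to characters and join them."""
    return "".join(itos[i] for i in ids)


print(vocab)                   # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(encode("hello"))         # [3, 2, 4, 4, 5]
print(decode(encode("hello")))  # hello
```

Each token is one character, so the vocabulary is tiny (here 8 entries), at the cost of longer token sequences than a subword tokenizer would produce for the same text.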