|
|
|
|
|
by dchichkov
472 days ago
|
|
0 1
00 01 10 11
000 001 010 011 100 101 110 111
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 And no, I don't think the knowledge of language is necessary. To give a concrete example, tokens from TinyStories dataset (the dataset size is ~1GB) are known to be sufficient to bootstrap basic language. |
|