by karpathy
775 days ago
The paper mentions some reasons why these quick-fix ideas are not as simple as they sound. For example, many rare tokens are “intermediate” merges inside the BPE algorithm: shorter prefixes of longer words. The long word is common, but its earlier, intermediate merge rarely appears by itself.
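To make the intermediate-merge point concrete, here is a minimal toy sketch of BPE training in Python. It is not the paper's code or any real tokenizer's implementation; the corpus, the function names `train_bpe` and `token_usage`, and the first-seen tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter

def train_bpe(word_counts, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Each word starts out as a tuple of single characters.
    words = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Assumption: ties go to the pair seen first (max returns the first maximum).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_words = {}
        for symbols, count in words.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_words[key] = new_words.get(key, 0) + count
        words = new_words
    return merges, words

def token_usage(merges, words):
    """Count how often each merged token appears in the final segmentation."""
    seen = Counter()
    for symbols, count in words.items():
        for s in symbols:
            seen[s] += count
    return {a + b: seen[a + b] for a, b in merges}

# Hypothetical corpus: one common long word, one rare short word.
merges, words = train_bpe({"tokens": 50, "to": 3}, num_merges=5)
usage = token_usage(merges, words)
print(usage)
```

In this toy run the vocabulary ends up with "to", "tok", "toke", "token", and "tokens", but only "tokens" (50) and "to" (3) ever show up in the tokenized corpus. The merges in between occupy vocabulary slots while essentially never being emitted, and you can't simply delete them, because the later merges are built on top of them.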