by yorwba
848 days ago
> Replacing common sentences with simple strings

This is what byte-pair encoding does. It doesn't go quite so far as to allocate a single token to "Once upon a time", because that string isn't actually that common, but in principle it could. Trying to get humans to produce content directly in such a concise representation is a waste of time, though: LLMs rely heavily on being able to use whatever content is already available on the internet, which drastically reduces the labor cost of acquiring training data.
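
To make the "in principle it could" part concrete, here is a rough sketch of the byte-pair encoding idea: start from single characters and repeatedly merge the most frequent adjacent pair into a new token. The names `train_bpe` and `merge_pair` are made up for illustration, and the sketch makes simplifying assumptions: it works on characters rather than bytes, skips the word-level pre-splitting that real tokenizers usually do, and counts overlapping pairs naively.

```python
from collections import Counter


def merge_pair(tokens, a, b):
    """Replace each non-overlapping adjacent occurrence of (a, b) with a + b."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to num_merges merge rules, starting from single characters.

    Returns the list of learned merges and the text re-tokenized with them.
    """
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair in the current tokenization.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            # Nothing repeats any more, so further merges save nothing.
            break
        merges.append((a, b))
        tokens = merge_pair(tokens, a, b)
    return merges, tokens


if __name__ == "__main__":
    corpus = "once upon a time " * 20 + "some other text"
    merges, tokens = train_bpe(corpus, num_merges=30)
    print(len(corpus), "characters ->", len(tokens), "tokens")
    print(sorted(set(tokens), key=len, reverse=True)[:5])
```

The more often a phrase repeats in the training text, the more merges it attracts and the longer its tokens get. A phrase would only end up as a single token if it were frequent enough relative to everything else competing for the merge budget, which is the point about "Once upon a time" above.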