by rhdunn
1125 days ago
The problem with GPT and other LLMs is that they don't tokenize text at the word or morpheme level. The tokens are frequency-based subword chunks (averaging roughly 4 characters in English text), so you get tokens like `!"` instead of two separate tokens. That makes it harder to write custom tools on top of them, unlike e.g. the output/model of things like the universaldependencies project.
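To make the `!"` example concrete, here is a toy byte-pair-encoding (BPE) sketch. It is a deliberate simplification: it assumes plain character-level merges over raw text, with no regex pre-splitting and no byte-level alphabet, so it is not GPT's actual tokenizer. The names `learn_bpe`, `apply_merge` and `tokenize` are hypothetical. Merges are chosen purely by how often two adjacent symbols co-occur, which is why punctuation and whitespace get glued together instead of being split along word or morpheme boundaries.

```python
from collections import Counter


def apply_merge(seq: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` in `seq` with one joined symbol."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Toy BPE training: repeatedly merge the most frequent adjacent pair.

    Assumption: works on raw characters with no pre-tokenization, unlike real
    GPT tokenizers. Stops early when no pair occurs at least twice.
    """
    seq = list(corpus)  # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break
        merges.append(best)
        seq = apply_merge(seq, best)
    return merges


def tokenize(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges in the order they were learned."""
    seq = list(text)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


merges = learn_bpe('Stop!" Go!" Run!" Hi!"', num_merges=10)
print(merges)                          # [('!', '"'), ('!"', ' ')]
print(tokenize('Wow!" Yes!"', merges))  # ['W', 'o', 'w', '!" ', 'Y', 'e', 's', '!"']
```

On this tiny corpus the only merges learned are `!` + `"` and then `!"` + space, so the resulting tokens cross the boundary between a word's punctuation and the following whitespace rather than following any linguistic unit.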