by nvader
841 days ago
I think what of is getting at is that, given a vocabulary like {the: 1, t: 2, h: 3, e: 4}, there should be something somewhere in the corpus such as "the is spelled t h e" that this system can use to pull this out. We could ask GPT to spell out individual words in the NATO phonetic alphabet and see how it does.
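As a rough illustration of the kind of text the comment imagines, here is a minimal sketch assuming the toy vocabulary above. The `spell_out` and `nato_spelling` helpers are hypothetical and not part of any model or library; they only generate the "X is spelled ..." sentences and NATO-alphabet spellings one might look for in (or add to) a corpus.

```python
# Toy vocabulary from the comment: token string -> token id (illustrative only).
VOCAB = {"the": 1, "t": 2, "h": 3, "e": 4}

# NATO/ICAO phonetic alphabet for the letters a-z.
NATO = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    ["Alfa", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
     "India", "Juliett", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa",
     "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
     "X-ray", "Yankee", "Zulu"],
))


def spell_out(word: str) -> str:
    """Return a corpus-style sentence such as 'the is spelled t h e'."""
    return f"{word} is spelled {' '.join(word)}"


def nato_spelling(word: str) -> str:
    """Spell a word letter by letter in the NATO alphabet (letters a-z only)."""
    return " ".join(NATO[c] for c in word.lower())


if __name__ == "__main__":
    # Only multi-letter tokens need a spelling sentence.
    for token in VOCAB:
        if len(token) > 1:
            print(spell_out(token))
    print(nato_spelling("the"))
```

Running it prints "the is spelled t h e" followed by "Tango Hotel Echo". Comparing a model's answers against `nato_spelling` output is one simple way to score the experiment the comment suggests.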
Such an approach would require an enormous table containing all written words, including first and last names, and would still fail for made-up words.
A more tractable approach would be to give it the map between the individual tokens and their letter components, but then this mapping depends on the specific encoding (the tokenizer vocabulary) used by the model, which varies between models. You could, however, give it to the model during fine-tuning.
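To make the "depends on the encoding" point concrete, here is a minimal sketch with two hypothetical vocabularies. `VOCAB_A`, `VOCAB_B`, `letter_map`, `tokenize` and `spell` are illustrative names, not any real tokenizer's API, and the greedy longest-match `tokenize` is a simplification (GPT-style BPE tokenizers merge differently). The letter table is keyed by token ids, so the same word lands on different ids, and needs a different table, under each vocabulary.

```python
# Two hypothetical tokenizer vocabularies (token string -> id). Real models
# have tens of thousands of entries and assign ids in their own way.
VOCAB_A = {"the": 1, "t": 2, "h": 3, "e": 4}
VOCAB_B = {"th": 7, "e": 8, "t": 9, "h": 10}


def letter_map(vocab: dict[str, int]) -> dict[int, list[str]]:
    """Map each token id to the letters of the token it stands for."""
    return {token_id: list(token) for token, token_id in vocab.items()}


def tokenize(word: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization (a simplification, not real BPE)."""
    ids: list[int] = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return ids


def spell(word: str, vocab: dict[str, int]) -> list[str]:
    """Recover a word's letters from its token ids via the per-vocab letter map."""
    letters = letter_map(vocab)
    return [ch for token_id in tokenize(word, vocab) for ch in letters[token_id]]
```

With these toy tables, `tokenize("the", VOCAB_A)` gives `[1]` while `tokenize("the", VOCAB_B)` gives `[7, 8]`, yet `spell` recovers `['t', 'h', 'e']` from both. Because single letters are in both vocabularies, a made-up word such as "hte" still tokenizes (to `[3, 2, 4]` under `VOCAB_A`), which is why a token-level letter map scales better than a table of whole words.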