|
|
|
|
|
by huac
1022 days ago
|
|
where is that reported? in Table 14 I see PSM performing much better than SPM. I also see a note about the SPM performance which attributes the degradation to the tokenizer edge cases > As an example, our model would complete the string 'enu' with 'emrate' instead of 'merate' which shows awareness of the logical situation of the code but incomplete understanding of how tokens map to character-level spelling. that doesn't really feel like a failure of language modeling to me |
|
> Note, however, that the results in random span infilling are significantly worse in suffix-prefix-middle (SPM) format than in prefix-suffix-middle (PSM) format as it would require token healing (Microsoft, 2023),