by huac 1022 days ago
where is that reported? in Table 14 I see PSM performing much better than SPM. I also see a note about the SPM performance which attributes the degradation to the tokenizer edge cases

> As an example, our model would complete the string 'enu' with 'emrate' instead of 'merate' which shows awareness of the logical situation of the code but incomplete understanding of how tokens map to character-level spelling.

that doesn't really feel like a failure of language modeling to me

i flipped the results, my bad.

> Note, however, that the results in random span infilling are significantly worse in suffix-prefix-middle (SPM) format than in prefix-suffix-middle (PSM) format as it would require token healing (Microsoft, 2023),

yeah, I hear you that the decoder-only infilling approach is 'weird' -- I just don't know if I agree that it's manifestly worse at language understanding / performance than the BERT approach
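
for anyone following along, here's a rough sketch of the two prompt layouts being compared. the sentinel strings and helper names are placeholders I made up for illustration, not the exact tokens or API of any particular model, and the ordering is my reading of how PSM and SPM are usually described:

```python
# Sentinel strings are hypothetical placeholders, not real model tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def psm_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle: the prompt ends with the MID sentinel."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


def spm_prompt(prefix: str, suffix: str) -> str:
    """Suffix-prefix-middle: the prompt ends with the raw prefix text."""
    return f"{PRE}{SUF}{suffix}{MID}{prefix}"


# A span cut in the middle of the word "enumerate".
prefix = "for i, x in enu"
suffix = "(items):\n    print(i, x)"

print(psm_prompt(prefix, suffix))
print(spm_prompt(prefix, suffix))
```

in the SPM layout the prompt ends mid-word ('enu'), so the model has to continue straight from a character boundary that may not line up with how the tokenizer would normally split 'enumerate'. as far as I can tell that is the situation token healing is meant to patch up, and it looks like a tokenizer issue rather than a modeling one.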