| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by swyx 1022 days ago
	yes but code llama also found that the PSM format was inferior to the SPM format presumably because those hard cuts lose context. the "real" fill-in-the-middle of BERT is i think more likely to model language compared to the "faux" F-i-t-m of flinging prefixes and suffixes around

1 comments

huac 1022 days ago

where is that reported? in Table 14 I see PSM performing much better than SPM. I also see a note about the SPM performance which attributes the degradation to the tokenizer edge cases

> As an example, our model would complete the string 'enu' with 'emrate' instead of 'merate' which shows awareness of the logical situation of the code but incomplete understanding of how tokens map to character-level spelling.

that doesn't really feel like a failure of language modeling to me

link

swyx 1022 days ago

i flipped the results, my bad.

> Note, however, that the results in random span infilling are significantly worse in suffix-prefix-middle (SPM) format than in prefix-suffix-middle (PSM) format as it would require token healing (Microsoft, 2023),

link

huac 1020 days ago

yeah, I hear you that the decoder-only infilling approach is 'weird' -- I just don't know if I agree that it's manifestly worse at language understanding / performance than the BERT appraoch

link