by ppod 1275 days ago
I'd love to hear the opinion of someone who has really good knowledge and experience of how byte-pair encoding works in models like these. I think I agree with you that in theory it should be able to build a phonology from the amount of explicitly rhyming material in its training corpus, but for whatever reason it doesn't do this or at least doesn't do it consistently.

I've spent a long time testing this in ChatGPT, and no matter what I do it still gives results like this (paraphrasing here because it's down right now):

> What words rhyme with coffee?

> doff happy toffee snuff duff

> Does "snuff" rhyme with "coffee"?

> Yes because they both share the 'o' vowel sound.