Hacker News
by ricardobeat 46 days ago
Love this kind of experiment. Would the model perform better with word tokens?
1 comment

A friend of mine forked the repo and tried it with BPE (byte-pair encoding), and it noticeably improved performance.
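
For anyone unfamiliar with BPE, here is a rough sketch of the idea in plain Python. To be clear, this is not the code from the fork. It is a toy version that starts from characters rather than bytes and splits on whitespace, and the names `train_bpe`, `merge_pair`, and `encode` are made up for illustration. The core loop just keeps merging the most frequent adjacent pair of symbols into a new token.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules from whitespace-split `text` (toy, character-level)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    words = Counter(tuple(w) for w in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to the whole vocabulary.
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Usage would look something like `merges = train_bpe(corpus, 1000)` followed by `encode("hello", merges)`. The resulting tokens sit somewhere between single characters and whole words, which is roughly the middle ground the parent comment is asking about.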