Hacker News
by jy14898 25 days ago
> LLMs are good at predicting words, since each word in the id is ~1 BPE token. But uuids are random hex characters, this is where LLMs struggle to output the right ids.

If true, that does seem like an improvement, but I'd want to see measurements of actual hallucination rates before believing it. Calling hex "random" but a selection of words not seems like a human bias? If anything, being random is good, because it means there's no semantic influence on what gets generated. I'd expect words to be more likely to be hallucinated, since certain words only follow certain contexts, which is less true for numbers.
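
For what it's worth, the measurement I have in mind could be as simple as the sketch below. It assumes you already have the list of ids that were in the prompt and the ids the model emitted for each scheme; `hallucination_rate` and the sample values are made up for illustration, not real results.

```python
def hallucination_rate(valid_ids, model_outputs):
    """Fraction of emitted ids that do not exactly match any id given to the model."""
    valid = set(valid_ids)
    if not model_outputs:
        return 0.0
    misses = sum(1 for out in model_outputs if out not in valid)
    return misses / len(model_outputs)


# Hypothetical toy data: ids shown to the model vs. ids it wrote back.
word_ids = ["amber-birch-cedar", "delta-ember-fjord"]
word_outputs = ["amber-birch-cedar", "delta-ember-grove"]  # second one is wrong

hex_ids = ["3f2a9c1e", "b07d44aa"]
hex_outputs = ["3f2a9c1e", "b07d44aa"]

print("word ids:", hallucination_rate(word_ids, word_outputs))
print("hex ids: ", hallucination_rate(hex_ids, hex_outputs))
```

Run that over a few thousand real completions per scheme, with the same prompts, and you'd have an actual number to compare instead of an intuition about tokenization.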