Hacker News
by londons_explore 988 days ago
The embedding method that nearly all LLMs use puts them at a severe disadvantage: words are mapped to opaque subword tokens, so the model can't 'see' how common words are spelled. That makes it hard to infer things like 'past-tense words end in -ed'.
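
To make the point concrete, here is a toy sketch. It assumes a greedy longest-match subword tokenizer (a simplification of BPE/WordPiece) and a made-up vocabulary; the token ids are hypothetical and not taken from any real model. Common words like "walked" and "jumped" each become a single id, and nothing about those ids tells the model that both end in "ed".

```python
from typing import Dict, List

# HYPOTHETICAL vocabulary: ids are invented for illustration only.
VOCAB: Dict[str, int] = {"walked": 1012, "jumped": 4471, "walk": 388, "jump": 902, "ed": 17}


def greedy_tokenize(word: str, vocab: Dict[str, int]) -> List[int]:
    """Greedy longest-match subword tokenization (simplified stand-in for BPE)."""
    ids: List[int] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[i:]!r}")
    return ids


for w in ("walked", "jumped"):
    # The ids are unrelated integers; only the raw characters reveal the shared suffix.
    print(w, "->", greedy_tokenize(w, VOCAB), "| last two chars:", w[-2:])
```

The model only ever receives 1012 and 4471, so any "ends in -ed" regularity has to be memorized per token rather than read off the spelling.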

With small modifications, the exact characters could be exposed to the model, in addition to the current tokens, but it would require a full retraining, which would cost $$$$$$$$.
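
One way to picture "exposing the characters in addition to the tokens" is a hybrid input embedding: keep the per-token vector and add a vector built from the token's characters. The sketch below is an assumption, not the commenter's design; it swaps in character trigrams with boundary markers, loosely in the spirit of fastText's subword n-grams, so that suffixes like "ed>" become explicit features. The class name, `DIM`, and the seeded random tables are hypothetical stand-ins for learned parameters, which is exactly why such a change would mean retraining.

```python
import random
from typing import Dict, List

DIM = 8  # hypothetical embedding width


def char_ngrams(token: str, n: int = 3) -> List[str]:
    """Character n-grams with '<' and '>' boundary markers, so suffixes are explicit."""
    padded = f"<{token}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class HybridEmbedding:
    """Token embedding plus the mean of its character n-gram embeddings.

    Tables hold seeded random vectors purely for illustration; in a real
    model both tables would be learned during (re)training.
    """

    def __init__(self, dim: int = DIM, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.tokens: Dict[str, List[float]] = {}
        self.ngrams: Dict[str, List[float]] = {}

    def _lookup(self, table: Dict[str, List[float]], key: str) -> List[float]:
        # Lazily create a vector the first time a key is seen.
        if key not in table:
            table[key] = [self.rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return table[key]

    def embed(self, token: str) -> List[float]:
        if not token:
            raise ValueError("token must be non-empty")
        tok_vec = self._lookup(self.tokens, token)
        grams = [self._lookup(self.ngrams, g) for g in char_ngrams(token)]
        # Average the n-gram vectors dimension by dimension.
        char_vec = [sum(col) / len(grams) for col in zip(*grams)]
        return [t + c for t, c in zip(tok_vec, char_vec)]


emb = HybridEmbedding()
print(len(emb.embed("walked")))  # 8
print(set(char_ngrams("walked")) & set(char_ngrams("jumped")))  # {'ed>'}
```

Because "walked" and "jumped" now share the "ed>" n-gram vector, the spelling regularity is visible in the input rather than hidden behind unrelated token ids.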

This reminds me of the ELMo architecture, which builds its input word representations from characters.

https://paperswithcode.com/method/elmo

So, next week on HF?