| HN Mirror



	by londons_explore 988 days ago
	The token-based embedding method that nearly all LLMs use puts them at a severe disadvantage because they can't 'see' the spelling of common words. That makes it hard to infer things like 'regular past-tense verbs end in "ed"'. With small modifications, the exact characters could be exposed to the model in addition to the current tokens, but that would require a full retraining, which would cost $$$$$$$$.
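
	To make that concrete, here is a rough toy sketch, not how any real LLM is built. The vocabulary, the vector size, and the helper names (`token_embedding`, `char_embedding`, `combined_embedding`) are all made up for illustration. It shows that a per-token lookup gives "walked" and "jumped" unrelated vectors, while a crude character-level feature appended to it is shared by both.

```python
import random

random.seed(0)
DIM = 4  # assumed toy embedding size

# Hypothetical toy vocabulary: each word maps to one opaque token ID.
VOCAB = {"walk": 0, "walked": 1, "jump": 2, "jumped": 3}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def random_vector(dim):
    """Stand-in for a learned embedding row."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


token_table = {tid: random_vector(DIM) for tid in VOCAB.values()}
char_table = {ch: random_vector(DIM) for ch in ALPHABET}


def token_embedding(word):
    """Lookup by token ID only; the spelling never enters the computation."""
    return token_table[VOCAB[word]]


def char_embedding(word):
    """Crude suffix feature: mean of the vectors for the last two characters."""
    vectors = [char_table[ch] for ch in word[-2:]]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combined_embedding(word):
    """Token vector with the character-level feature concatenated onto it."""
    return token_embedding(word) + char_embedding(word)


# The token halves of "walked" and "jumped" are unrelated,
# but the character halves are identical because both end in "ed".
walked = combined_embedding("walked")
jumped = combined_embedding("jumped")
print(walked[:DIM] == jumped[:DIM], walked[DIM:] == jumped[DIM:])
```

	In a real model the character part would be learned jointly with everything else, which is where the retraining cost comes from.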

2 comments

omneity 988 days ago

You remind me of the ELMo architecture.

https://paperswithcode.com/method/elmo

arthurcolle 988 days ago

So, next week on HF?