by bravura 4337 days ago

Yes, the method you propose for inducing a representation for unseen words is sound.

However, once you can train on almost one trillion tokens, unknown words are not going to come up very often. What's really left to work on is inducing higher-quality representations of observed words. The goal would be that a simple model could plug in these representations and perform well on, say, the word analogies task (or any other pure lexical semantics task).
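
For anyone who hasn't seen the word analogies task: one common way to score it is the vector-offset trick, where "a is to b as c is to ?" is answered by the word whose vector is closest (by cosine) to b - a + c. Here's a minimal sketch of that idea. The `solve_analogy` helper and the toy vectors are made up for illustration; real embeddings have hundreds of dimensions and are learned from a corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(embeddings, a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # Vector-offset target: b - a + c
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three query words are excluded from the candidates, as is usual
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# Hypothetical hand-made vectors, NOT taken from any trained model
toy = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
    "apple": [0.9, 0.1, -0.8],
}

answer = solve_analogy(toy, "man", "woman", "king")
print(answer)
```

The point of the task is that the model doing the lookup is trivial, so the score mostly reflects the quality of the representations themselves.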

What's interesting to me about Pennington et al.'s work is that they found a really fast training method, and thus could train on 840B tokens from Common Crawl. I've spent a lot of time thinking about this problem, and their approach is quite elegant.