by nl 3005 days ago
Interesting. There's a big need for better vector representations of things in between words (for which Word2Vec/GloVe/fastText work well) and documents (which seems impossible to me. Yes, I know about Doc2Vec etc., but really, it only works OK for paragraphs).

Facebook's InferSent[1] has worked reasonably well for me for a variety of sentence-level tasks, but I can't point to anything showing that it is substantially better than averaging word embeddings.
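
For anyone unfamiliar with the "averaging word embeddings" baseline, here is a minimal sketch of the idea. The 3-dimensional vectors in `TOY_VECTORS` are made up for illustration (real Word2Vec/GloVe/fastText vectors are learned and much larger), and `sentence_vector` and `cosine` are hypothetical helper names, not part of any library.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
TOY_VECTORS = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "chase": [0.1, 0.9, 0.0],
    "mice": [0.7, 0.0, 0.3],
    "stocks": [0.0, 0.1, 0.9],
    "fell": [0.0, 0.3, 0.7],
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Average the vectors of the known words; unknown words are skipped."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


a = sentence_vector("cats chase mice")
b = sentence_vector("dogs chase mice")
c = sentence_vector("stocks fell")
print(cosine(a, b), cosine(a, c))
```

With these toy numbers the two animal sentences come out far closer to each other than to the finance one, which is all the baseline promises: topical similarity, with word order thrown away.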

More options are good.

(Also, is Kurzweil part of Google Brain or separate? He doesn't really have any background in NLP, does he?)

[1] https://github.com/facebookresearch/InferSent

2 comments

"Also, is Kurzweil part of Google Brain or separate? He doesn't really have any background in NLP, does he?"

From Wikipedia: "Raymond "Ray" Kurzweil (/ˈkɜːrzwaɪl/ KURZ-wyl; born February 12, 1948) is an American author, computer scientist, inventor and futurist. Aside from futurism, he is involved in fields such as optical character recognition (OCR), text-to-speech synthesis, speech recognition technology, and electronic keyboard instruments.... Kurzweil was the principal inventor of... the first print-to-speech reading machine for the blind,[3] the first commercial text-to-speech synthesizer,[4]... and the first commercially marketed large-vocabulary speech recognition."

He's been in the general space of NLP for quite a while.

For the record, good old fashioned bag of words representations (tf-idf, LDA, LSA) still provide useful representations for documents. Obviously we hope to do better, but recently people act like there's no way of turning a document into a vector.
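
To make the "document into a vector" point concrete, here is a rough tf-idf sketch in plain Python. It assumes whitespace tokenization and one common smoothed idf variant; `tfidf_vectors` is a hypothetical helper written for this comment, and a real pipeline would normally use a library implementation instead.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): one tf-idf vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf (an assumption; several variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero on empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


docs = ["the cat sat", "the dog sat", "stocks fell sharply"]
vocab, vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(x, 3) for x in vec])
```

Each document becomes a fixed-length vector over the vocabulary, with rarer terms such as "cat" weighted above terms shared across documents such as "the".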
Bag-of-words representations work fine for some applications.

The reason people want better representations is for the applications where they don't. For example, bag of words doesn't capture agreement or disagreement well, whereas better representations can.
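
A tiny illustration of that limitation, assuming whitespace tokenization (the two sentences and the `bag_of_words` helper are made up for this example): two statements that say opposite things about who is being agreed with end up with exactly the same bag of words.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts only; word order is discarded."""
    return Counter(text.lower().split())


s1 = bag_of_words("I agree with Alice and not with Bob")
s2 = bag_of_words("I agree with Bob and not with Alice")
print(s1 == s2)  # True: the two representations are indistinguishable
```

Any model built purely on these counts has to treat the two sentences identically, which is the gap order-aware sentence representations are meant to close.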