No training required: Exploring random encoders for sentence classification

	No training required: Exploring random encoders for sentence classification (code.fb.com)
	40 points by jimarcey 2698 days ago

4 comments

jeromebaek 2698 days ago

Interesting paper. I'd like to know how this compares with even more naive methods like simple summation. If this method is an application of Cover's theorem it should handily beat summation or any other simple method that places the sentence embedding in the same dimension as the word embeddings.
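To make the comparison concrete, here is a minimal sketch of the two options being contrasted: summing word vectors (the sentence vector keeps the word-embedding dimension) versus pushing each word vector through a fixed, untrained random projection into a higher dimension and then pooling. The function names, the 4096 output size, the uniform initialization bound, and the random stand-in for pretrained vectors are all assumptions made for illustration, not details confirmed by the thread.

```python
import numpy as np

def sum_embedding(word_vecs):
    """Naive baseline: sum word vectors; output keeps the word-embedding dimension."""
    return word_vecs.sum(axis=0)

def random_projection_embedding(word_vecs, out_dim=4096, seed=0, pooling="max"):
    """Project each word vector with a fixed random matrix, then pool over words.

    The matrix is never trained. The uniform bound 1/sqrt(in_dim) is an
    assumed initialization chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    in_dim = word_vecs.shape[1]
    bound = 1.0 / np.sqrt(in_dim)
    W = rng.uniform(-bound, bound, size=(in_dim, out_dim))
    projected = word_vecs @ W  # shape: (num_words, out_dim)
    if pooling == "max":
        return projected.max(axis=0)
    return projected.mean(axis=0)

# Hypothetical 7-word sentence with 300-dimensional vectors
# (random numbers standing in for pretrained embeddings).
rng = np.random.default_rng(42)
sentence = rng.normal(size=(7, 300))

summed = sum_embedding(sentence)
projected = random_projection_embedding(sentence)
print(summed.shape, projected.shape)
```

One caveat on this sketch: with mean pooling, a purely linear projection of the words equals the same projection applied to the mean word vector, so any gain in separability has to come from a nonlinearity such as max pooling.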

yorwba 2698 days ago

From the "related work" section of the paper:

The nowadays surprisingly poor performance of the models in Hill et al. (2016) can at least partly be explained because 1) they use poorer (older) word embeddings; and 2) FastSent sentence representations are of the same dimensionality as the input word embeddings, while they are compared in the same table to much higher-dimensional representations.

See also figure 1 for the increase in performance across tasks when the embedding dimension is increased.

zuzun 2698 days ago

How does SentEval work? As I understand it, it trains a model on top of the sentence embeddings for almost all tasks. Could the baseline BOE be worse because its 300-dimensional input gives that model fewer trainable parameters than the 4096 dimensions of all the other embeddings?
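The arithmetic behind that question can be sketched as follows, assuming the model trained on top is a plain linear (logistic-regression) classifier with one weight per input dimension per class plus a bias. The helper name and the two-class setting are hypothetical, chosen only to illustrate the gap.

```python
def linear_classifier_params(input_dim, num_classes):
    """Trainable parameters of a linear classifier: weights plus one bias per class."""
    return (input_dim + 1) * num_classes

# Compare a 300-dimensional input with a 4096-dimensional one
# on a hypothetical two-class task.
for dim in (300, 4096):
    print(dim, linear_classifier_params(dim, num_classes=2))
# prints:
# 300 602
# 4096 8194
```

Under these assumptions the 4096-dimensional input gives the classifier roughly 13.6 times as many trainable parameters, which is the confound the comment is asking about.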

anon1253 2698 days ago

Love it, especially the echo state network trick. I wonder how much of BERT/ELMo performance is simply due to their high dimensionality. Not that there is anything wrong with that; it just makes them a tad less practical for some applications.
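For readers unfamiliar with the trick, a minimal echo state network sketch looks roughly like this: the input and recurrent weights are random and stay fixed, the recurrent matrix is rescaled so its spectral radius is below 1, and the hidden states are pooled into a sentence vector. This is a simplified, unidirectional illustration and not necessarily the paper's exact configuration; the function name, hidden size, input weight range, spectral radius, and max pooling are all assumptions.

```python
import numpy as np

def esn_sentence_embedding(word_vecs, hidden_dim=512, spectral_radius=0.9, seed=0):
    """Encode a sentence with a fixed random reservoir; no weights are trained."""
    rng = np.random.default_rng(seed)
    in_dim = word_vecs.shape[1]

    # Fixed random input weights (the range is an assumed choice).
    W_in = rng.uniform(-0.1, 0.1, size=(hidden_dim, in_dim))

    # Fixed random recurrent weights, rescaled so the largest eigenvalue
    # magnitude equals spectral_radius (< 1), which makes the influence of
    # earlier inputs fade over time instead of blowing up.
    W_h = rng.normal(size=(hidden_dim, hidden_dim))
    radius = np.max(np.abs(np.linalg.eigvals(W_h)))
    W_h *= spectral_radius / radius

    h = np.zeros(hidden_dim)
    states = []
    for x in word_vecs:
        h = np.tanh(W_in @ x + W_h @ h)
        states.append(h)

    # Max-pool the hidden states over time to get one vector per sentence.
    return np.max(np.stack(states), axis=0)

# Hypothetical 5-word sentence with 300-dimensional vectors.
sentence = np.random.default_rng(1).normal(size=(5, 300))
embedding = esn_sentence_embedding(sentence)
print(embedding.shape)
```

Only a classifier trained on top of `embedding` would have learnable parameters; the reservoir itself is never updated.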

moneil971 2698 days ago

“A strong, novel baseline for sentence embeddings that requires no training whatsoever.”
