Universal Sentence Encoder | HN Mirror


	Universal Sentence Encoder (arxiv.org)
	89 points by andrewg 3005 days ago

6 comments

nl 3005 days ago

Interesting. There's a big need for better vector representations of things in between words (for which Word2Vec/GloVe/FastText work well) and documents (which to me seems impossible; yes, I know about Doc2Vec etc., but really... it works OK for paragraphs).

Facebook's InferSent[1] has worked reasonably well for me for a variety of sentence-level tasks, but I don't have anything I can point to that shows it is substantially better than averaging word embeddings.

More options is good.

(Also, is Kurzweil part of Google Brain or separate? He doesn't really have any background in NLP, does he?)

[1] https://github.com/facebookresearch/InferSent
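For reference, the "averaging word embeddings" baseline can be sketched in a few lines. This is a minimal illustration assuming a toy, made-up 3-dimensional vocabulary (`WORD_VECTORS` is hypothetical; real vectors would come from Word2Vec, GloVe or FastText). Note that the result ignores word order entirely, which is one reason people want sentence-level encoders.

```python
# Hypothetical toy word vectors; real ones come from a pretrained model.
WORD_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.2, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "sat": [0.0, 0.7, 0.2],
    "ran": [0.1, 0.8, 0.3],
}


def average_embedding(sentence, vectors):
    """Mean of the vectors of known words; zero vector if no word is known."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column (one column per dimension).
    return [sum(column) / len(known) for column in zip(*known)]


a = average_embedding("the cat sat", WORD_VECTORS)
b = average_embedding("sat the cat", WORD_VECTORS)  # same words, different order
# a and b are equal up to float rounding: averaging discards word order.
```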

jerf 3004 days ago

"Also, is Kurzweil part of Google Brain or separate. He doesn't really have nay background in NLP does he?"

From Wikipedia: "Raymond "Ray" Kurzweil (/ˈkɜːrzwaɪl/ KURZ-wyl; born February 12, 1948) is an American author, computer scientist, inventor and futurist. Aside from futurism, he is involved in fields such as optical character recognition (OCR), text-to-speech synthesis, speech recognition technology, and electronic keyboard instruments.... Kurzweil was the principal inventor of... the first print-to-speech reading machine for the blind,[3] the first commercial text-to-speech synthesizer,[4]... and the first commercially marketed large-vocabulary speech recognition."

He's been in the general space of NLP for quite a while.

slashcom 3004 days ago

For the record, good old-fashioned bag-of-words representations (tf-idf, LDA, LSA) still provide useful representations for documents. Obviously we hope to do better, but lately people act as if there's no way of turning a document into a vector.
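As a concrete example of turning documents into vectors the old-fashioned way, here is a minimal tf-idf sketch. It assumes whitespace tokenization and uses the smoothed idf, ln((1 + N) / (1 + df)) + 1, that scikit-learn's `TfidfVectorizer` applies by default, but skips its final L2 normalization; other tf-idf variants weight terms differently.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (raw tf times smoothed idf)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors


vecs = tfidf_vectors(["the cat sat", "the dog sat", "the cat ran"])
# "the" occurs everywhere, so it gets the lowest weight; rarer words weigh more.
```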

nl 3004 days ago

Bag-of-words representations work fine for some applications.

The reason people want better representations is for the applications where they don't. For example, bag of words doesn't capture agreement or disagreement well, whereas better representations can.
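A quick way to see the problem: bag-of-words vectors only count words, so two sentences taking opposite stances can end up with identical or very similar vectors. A minimal sketch using raw counts and cosine similarity (the example sentences are made up):

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word counts; word order and negation scope are lost."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Opposite stances, identical bags: cosine similarity is 1.0 (up to rounding).
yes = bag_of_words("this is good not bad")
no = bag_of_words("this is bad not good")

# Agreement vs. disagreement still looks quite similar (about 0.71).
agree_vs_disagree = cosine(bag_of_words("i agree"), bag_of_words("i do not agree"))
```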

JustFinishedBSG 3005 days ago

1. This is more Technical Report worthy than paper worthy...

2. "by Ray Kurzweil's Team": although accurate, I find that fetishization of certain stars pretty insulting to the other authors. We already have a convention, and it's "Cer et al. (2018)".

PaulHoule 3004 days ago

At least Ray has the decency to be listed last on the author list!

Personally I think the idea of this paper is pretty good, but the evaluation is weak.

wolfgke 3004 days ago

> At least Ray has the decency to be listed last on the author list!

Just do it like in mathematics: Authors in alphabetical order.

josephjrobison 3004 days ago

Usually the actual lead author is first, the assistant authors follow, and the advisor is listed last.

At least that’s how it is in (psychology and other?) PhD programs.

So Ray may only be supervising or contributing a small portion and is likely listed on all papers his team publishes.

l1n 3004 days ago

Same in biology.

PaulHoule 3004 days ago

One senior physicist I worked with advocated alphabetical order whenever he would come first in it!

lobster_johnson 3004 days ago

Physics, too, which causes another interesting side effect: https://www.thetimes.co.uk/article/to-get-ahead-in-physics-y...

paradroid 3004 days ago

In psychology the senior author comes first. Here we have mixed paradigms in authorship. Putting Kurzweil last is definitely intentional.

igravious 3005 days ago

“We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.”

Awesome. Now what does all that mean in English?
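The abstract mentions Word Embedding Association Tests (WEAT) without defining them. As usually stated (Caliskan et al., 2017), WEAT compares two target sets X, Y against two attribute sets A, B: s(w, A, B) is w's mean cosine similarity to A minus its mean cosine similarity to B, and the effect size is (mean of s over X minus mean of s over Y) divided by the standard deviation of s over X ∪ Y. A minimal sketch under those assumptions, using the sample standard deviation (implementations vary) and made-up 2-dimensional vectors:

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): how much closer w is to attribute set A than to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Effect size d; positive means X leans toward A and Y toward B."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy vectors (hypothetical): X sits near A, Y sits near B.
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
X = [[1.0, 0.1], [1.0, 0.2]]
Y = [[0.1, 1.0], [0.2, 1.0]]
d = weat_effect_size(X, Y, A, B)
```

A large |d| would indicate the embeddings associate the target sets unevenly with the attributes, which is what the bias test looks for.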

rahimnathwani 3005 days ago

They made a way to take any sentence and output a small array of numbers that represents its essence. You can use their model to find the essence of your own sentences, and then either use it directly (e.g. compare the essence of two sentences to see if they're saying roughly the same thing) or use it as a starting point for the model you need. For example, if you're building a system to convert English sentences into French, your neural network might generate the essence of the English sentence as part of its work. By using the pre-trained model, you have a better starting point for that part of the network than just random numbers, so your training time will be greatly reduced.
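"Compare the essence of two sentences" usually means cosine similarity between their embedding vectors. A minimal sketch, assuming the vectors already exist: in practice they would come from the encoder itself (the TF Hub module produces 512-dimensional vectors), whereas the 4-dimensional vectors below are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Return the candidate sentence whose embedding is closest to the query."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


# Hypothetical embeddings standing in for real encoder output.
query = [0.9, 0.1, 0.0, 0.2]  # "How old are you?"
candidates = {
    "What is your age?": [0.8, 0.2, 0.1, 0.2],
    "The weather is nice today.": [0.0, 0.1, 0.9, 0.3],
}
best = most_similar(query, candidates)  # the paraphrase wins
```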

laboo 3004 days ago

What do you mean by "its essence"? Is this a semantic essence?

tree_of_item 3004 days ago

The array of numbers represents some opaque statistical property of the sentence with respect to the others in the corpus the model was trained from. The hope is that this property will correlate with what we believe to be the sentence's meaning.

ttul 3004 days ago

Yes

irontoby 3004 days ago

> Awesome. Now what does all that mean in English?

Well, simply put:

  [ccebb 677ce 28f77 86558 2d7cc d67b4 e8f31 8c393 ae867 13593 aa869 3c265],
  [c0021 72510 cee7a 31580 554d3 d49a6 306b9 c1f2c 60c1a 1157c f44c8 31273],
  [682f2 6a4df dc970 3c106 2107c 3dfd5 1506a 6f1b5 af428 829f8 11d06 797dc],
  [d6f84 25e73 76558 6feb0 c67d4 fcc73 b5c8d af4db 2f647 82247 852e7 fc010],
  [f08a8 2ed8f c71bb 12043 5f0f9 190c8 f2ae8 7b30a 4a574 269d0 03be0 a363c],
  [b38c2 10031 37ada 504a8 f2919 3b82b 258fc 5673f c939c a0ef1 46be5 a50d6],
  [93fcd e19f7 0558f e01a6 8beb1 d54b9 9ad20 d6185 adf9b 876a1 a1a94 c9197],
  [92b49 ed290 7a072 fdf1d a61a8 65124 a2025 27153 afa71 a27db 29a2a e5b47],
  [2793f 7171f b18c9 e1945 d31d5 edb66 a1ee0 d9982 e8442 7795d bd4e4 30b41]

tomku 3005 days ago

They have an algorithm that takes sentences in textual form and produces a different representation of each sentence that (they claim) is easier for certain language-oriented machine learning tasks to work with. Previous work focused on producing that different representation at the word level, but theirs works on complete sentences.
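One way to see why a fixed sentence representation makes downstream tasks easier: once every sentence is a vector, even a trivial classifier trained on a handful of labeled examples can work. The sketch below uses a nearest-centroid classifier, a deliberately simple stand-in rather than the setup used in the paper, over hypothetical 2-dimensional embeddings.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_nearest_centroid(examples):
    """examples: list of (embedding, label) pairs. Returns {label: centroid}."""
    by_label = {}
    for vector, label in examples:
        by_label.setdefault(label, []).append(vector)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, vector):
    """Pick the label whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(vector, model[label]))


# Hypothetical embeddings of four labeled sentences.
examples = [
    ([0.9, 0.1], "positive"),
    ([0.8, 0.2], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.7], "negative"),
]
model = train_nearest_centroid(examples)
```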

thaumaturgy 3004 days ago

I had been under the impression that you could just feed text into neural nets, and then ... magic!

But, no. As it turns out, the very first problem you encounter when trying to implement ML on text is that you need to transform the text into some set of numbers (the "vectors"), with the number of elements matching the number of nodes in your input layer.

This is a tricky thing to do. You're essentially trying to "hash" the text in a way which uniquely represents the text you're working with and also gives the neural net something it can operate on. Which is to say, you can't just use a common hashing algorithm, because the neural net won't be able to learn anything from the random output of the hashing algorithm.
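To illustrate the hashing point: hashing the whole text gives two nearly identical sentences unrelated outputs, while hashing individual words into a fixed number of buckets (the "hashing trick", a standard feature-hashing technique, swapped in here for illustration) keeps similar texts similar. A minimal sketch; the bucket count of 16 is an arbitrary assumption.

```python
import hashlib


def stable_hash(token):
    # Python's built-in hash() is randomized per process for str, so use a fixed digest.
    return int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:4], "big")


def hashed_bag_of_words(text, n_buckets=16):
    """Hashing trick: each word increments one of n_buckets counters."""
    vector = [0] * n_buckets
    for word in text.lower().split():
        vector[stable_hash(word) % n_buckets] += 1
    return vector


def whole_text_digest(text):
    """Hash of the entire text: a one-word change scrambles the whole output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = "the cat sat on the mat"
b = "the cat sat on the rug"
va, vb = hashed_bag_of_words(a), hashed_bag_of_words(b)
# The digests share no useful structure, but va and vb overlap on the 5 shared words.
```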

There are several different approaches being used for this. One of them, mentioned elsethread, is "bag-of-words", where you build a big dictionary of word-to-number associations and then do some variety of transformations on that. Another is "feature extraction", where you might try to input a value representing properties like the length of the sentence, the number of words, the vocabulary level of the words, and so on. (This would probably be a bad approach for most ML goals on long text.)
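The "feature extraction" approach can be sketched as a function that maps a sentence to a short vector of hand-picked numbers. The features below are hypothetical choices; in particular, the share of words longer than six characters is only a crude stand-in for "vocabulary level". The usage shows the weakness: sentences with opposite meanings can get identical features.

```python
def extract_features(sentence):
    """Hypothetical hand-picked features: [chars, words, avg word length, long-word ratio]."""
    words = sentence.split()
    n_words = len(words)
    n_chars = len(sentence)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    long_word_ratio = sum(1 for w in words if len(w) > 6) / n_words if n_words else 0.0
    return [float(n_chars), float(n_words), avg_word_len, long_word_ratio]


love = extract_features("I love it")
hate = extract_features("I hate it")  # same features as "I love it"
```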

This paper presents another approach.

saas_co_de 3004 days ago

> Awesome. Now what does all that mean in English?

Singularity any day now

mlevental 3005 days ago

>Our pre-trained sentence encoding models are made freely available for download and on TF Hub.

What is TF Hub? I assume it stands for TensorFlow Hub, but what is that?

eruditepanda 3005 days ago

It looks like an internal site; this is the link it is referring to: https://tfhub.dev/google/universal-sentence-encoder/1

sp821543 3004 days ago

https://www.tensorflow.org/hub/modules/google/universal-sent...

eruditepanda 3004 days ago

It looks like there is a link to a Colab notebook (Google's hosted JupyterHub environment, also called Datalab): https://colab.research.google.com/github/tensorflow/hub/blob...

mlevental 3001 days ago

So does this work? Am I getting redirected back to that page when I click the link because they're checking my user agent? I don't have TF installed on this machine to check, but does getting the model through the TF API work?

quizotic 3005 days ago

404?

ynniv 3005 days ago

Did you miss “internal”?

eruditepanda 3005 days ago

lol, although I might have to take some blame by putting a link in my comment to begin with.

Note: Keep in mind that some folks publish on arXiv because it is far easier than going through a traditional publication process. As a result, you sometimes get less polished work like this, although the authors might update the article to fix some of those references.

andrewg 3004 days ago

https://www.tensorflow.org/hub/

metalin1234 3005 days ago

Seems to be a model zoo with tight TF integration.

It seems to be getting announced this afternoon at the TF Dev Summit: https://www.tensorflow.org/dev-summit/schedule/

The pip/GitHub links are not yet active: https://pypi.python.org/pypi/tensorflow-hub/0.1.0

golergka 3004 days ago

As someone who has done an ML course and implemented a primitive Word2Vec, but doesn't really follow the field all that closely: how important is this, and how does it compare to what came before?

pcf 3005 days ago

"..transfer learning to other NLP tasks" – NLP as in neuro-linguistic programming?

If so, can someone explain how this project is related to NLP? Thanks!

girvo 3005 days ago

Natural language processing/parsing