by blackbear_
378 days ago
Note that the token embeddings are also trained, so their values do give some hints about how a model organizes information. They used token embeddings directly rather than intermediate representations because the latter depend on the specific sentence the model is processing. The human judgment data, however, was collected without any context around each word, so using the token embeddings seems to be the fairest comparison. Otherwise, what sentence(s) would you have used to compute the intermediate representations? And how would you make sure the results aren't biased by those sentences?
Though it sounds odd, there is no problem with it: it would indeed return the model's representation of that single word, as seen by the model without any additional context.
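
To make the distinction concrete, here is a toy sketch in plain Python. It is not the paper's code or any real model: the vocabulary, the random "trained" embedding table, and the single parameter-free attention step are all assumptions made up for illustration. It shows that a token embedding lookup ignores the sentence, that an intermediate (contextual) representation changes with it, and that a one-word input still goes through fine.

```python
import math
import random

random.seed(0)
DIM = 4
VOCAB = ["river", "money", "bank"]  # hypothetical toy vocabulary

# Stand-in for a trained embedding table: one fixed vector per token.
EMBEDDINGS = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}


def token_embedding(word):
    """Context-free lookup: the same vector whatever the sentence is."""
    return EMBEDDINGS[word]


def contextual_representation(sentence, position):
    """One parameter-free self-attention step (an assumption, not a real model).

    The output for `position` is a softmax-weighted average of the
    embeddings of every token in `sentence`, so it depends on the context.
    """
    vectors = [EMBEDDINGS[w] for w in sentence]
    query = vectors[position]
    scores = [
        sum(q * k for q, k in zip(query, v)) / math.sqrt(DIM) for v in vectors
    ]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(DIM)]


# Same word, two different contexts: the contextual vectors differ.
in_river = contextual_representation(["river", "bank"], 1)
in_money = contextual_representation(["money", "bank"], 1)

# A single word is a valid (if odd) input: there is nothing else to attend to.
alone = contextual_representation(["bank"], 0)

print("same lookup every time:", token_embedding("bank") is EMBEDDINGS["bank"])
print("context changes the representation:", in_river != in_money)
```

In this toy the one-word output happens to equal the embedding itself, because there are no learned weights in the attention step. In a real model the layers would still transform the vector, but the point stands: the result reflects only that word, with no surrounding sentence to bias it.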