| HN Mirror



	by james-revisoai 935 days ago
	I appreciate your and the other commenters' scepticism, and I agree as far as generating text goes. However, to play devil's advocate: can't inference of single layers sometimes be quite useful for embeddings? And couldn't the larger models embed capabilities that are only achievable at large parameter counts? That is, you don't need to finetune the last layer like in the BERT days, or necessarily pick a certain layer for relevant embeddings. You could take the first-layer outputs and finetune a custom matrix projection to make useful embeddings, which might capture properties of the input that the larger models show capabilities for and smaller models don't. (Of course, this assumes that the capabilities considered "emergent" at larger sizes do not require later layers, which they probably do.) There is a weird situation where we have the sentence-transformers, bge, e5, etc. kind of embeddings, and then a big jump up to the generative-model embeddings that the model providers offer, but not much widespread adoption of, e.g., GPT-J or GPT-NeoX-20B embeddings (even though at the time of release they had notable use cases over sentence transformers).
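
To make the "first-layer outputs plus a custom matrix projection" idea concrete, here is a rough pure-Python sketch. Everything in it is an assumption for illustration: the "first-layer states" are random toy vectors instead of real model activations, the training targets come from a made-up linear map, and the helper names (`mean_pool`, `project`, `train_projection`) are hypothetical. It only shows the shape of the approach: keep the big model frozen, pool one layer's per-token outputs, and fit a small projection on top.

```python
import random

def mean_pool(token_states):
    """Average per-token vectors (e.g. one layer's outputs) into one vector."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[i] for tok in token_states) / n for i in range(dim)]

def project(W, x):
    """Apply the projection matrix W (rows = output dims) to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def mse(W, data):
    """Mean squared error between projected inputs and target vectors."""
    total = 0.0
    for x, y in data:
        pred = project(W, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(data)

def train_projection(data, dim_in, dim_out, lr=0.05, epochs=200, seed=0):
    """Fit W with plain SGD on squared error; the source model stays frozen."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in data:
            pred = project(W, x)
            for i in range(dim_out):
                err = pred[i] - y[i]
                for j in range(dim_in):
                    W[i][j] -= lr * 2 * err * x[j]
    return W

# Toy stand-in data (assumption): 20 "inputs", each with 3 tokens of 4-dim
# first-layer states, and 2-dim targets produced by a hidden linear map.
rng = random.Random(42)
DIM_IN, DIM_OUT = 4, 2
true_W = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]
data = []
for _ in range(20):
    tokens = [[rng.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(3)]
    pooled = mean_pool(tokens)
    data.append((pooled, project(true_W, pooled)))

untrained = train_projection(data, DIM_IN, DIM_OUT, epochs=0)
trained = train_projection(data, DIM_IN, DIM_OUT)
before = mse(untrained, data)
after = mse(trained, data)
print("error before:", before, "after:", after)
```

With real activations you would swap the toy data for pooled hidden states from whatever layer you pick, and the squared-error objective for whatever your downstream task needs (a contrastive loss is the usual choice for retrieval-style embeddings). Whether layer-one states carry enough signal for that is exactly the open question above.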

2 comments

svnt 935 days ago

There must be some real use for it for sure. Maybe as a result of this effort someone else has their attention called to it and picks up the idea loosely and runs with it in an unexpected way.

One of the benefits of lack of contextual or historical understanding is the lack of barriers to action. One of the downsides is you’re likely to do a lot of things of questionable value.


sp332 935 days ago

You can save some space, but it still takes more than one layer to do an embedding.
