by james-revisoai
935 days ago
I appreciate your and the other commenters' scepticism, and I agree with it for the premise of generating text. However, to play devil's advocate: can't inference of single layers sometimes be quite useful for embeddings? And couldn't the larger models embed capabilities only achievable at large parameter counts? I.e. you don't need to finetune the last layer like in the BERT days, or necessarily pick a certain layer for relevant embeddings; you could take the first layer's outputs and finetune a custom projection matrix to get useful embeddings, which might capture properties of the input that larger models show capabilities for and smaller models don't. (Ofc this assumes that the capabilities considered "emergent" at larger sizes do not require later layers, which they probably do.) There is a weird situation where we have the sentence transformer, bge, e5, etc. kind of embeddings, and then a big jump up to the generative-model embeddings that the model providers offer, but not much widespread adoption of e.g. GPT-J or neo-20B embeddings (even though, at the time of release, they had notable use cases over sentence transformers).
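To make the "first layer outputs plus a custom projection" idea concrete, here is a rough toy sketch in plain Python. Everything in it is an assumption for illustration: `EMBED` and `LAYER` are small random matrices standing in for a large model's frozen token embeddings and first layer (no real model is loaded), the two token "classes" are synthetic, and the triplet loss is just one plausible way to finetune the projection, not something the comment above specifies.

```python
import math
import random

random.seed(0)

VOCAB, HIDDEN, OUT_DIM = 10, 16, 4

# Frozen stand-ins (assumption): token embeddings and one "first layer".
# In practice these would come from a large pretrained model and stay fixed.
EMBED = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
LAYER = [[random.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def first_layer_features(tokens):
    """Mean-pool the frozen first-layer outputs over the sequence."""
    hidden = [[math.tanh(h) for h in matvec(LAYER, EMBED[t])] for t in tokens]
    return [sum(col) / len(hidden) for col in zip(*hidden)]


def projected_sq_dist(proj, a, b):
    diff = [x - y for x, y in zip(a, b)]
    return sum(z * z for z in matvec(proj, diff))


def triplet_loss(proj, anchor, pos, neg, margin=1.0):
    gap = projected_sq_dist(proj, anchor, pos) - projected_sq_dist(proj, anchor, neg)
    return max(0.0, margin + gap)


def mean_loss(proj, triplets):
    return sum(triplet_loss(proj, *t) for t in triplets) / len(triplets)


def train(proj, triplets, lr=0.02, epochs=30):
    """Plain SGD on the projection only; the 'model' weights never change."""
    for _ in range(epochs):
        for anchor, pos, neg in triplets:
            if triplet_loss(proj, anchor, pos, neg) == 0.0:
                continue  # hinge inactive, zero gradient
            dp = [x - y for x, y in zip(anchor, pos)]
            dn = [x - y for x, y in zip(anchor, neg)]
            wp, wn = matvec(proj, dp), matvec(proj, dn)
            for i in range(len(proj)):
                for j in range(len(proj[i])):
                    # d/dW of ||W dp||^2 - ||W dn||^2
                    grad = 2 * (wp[i] * dp[j] - wn[i] * dn[j])
                    proj[i][j] -= lr * grad
    return proj


def sample_tokens(cls, length=20):
    """Synthetic data: class 0 uses tokens 0-4, class 1 uses tokens 5-9."""
    low = 0 if cls == 0 else 5
    return [random.randrange(low, low + 5) for _ in range(length)]


triplets = []
for k in range(40):
    cls = k % 2
    triplets.append((
        first_layer_features(sample_tokens(cls)),      # anchor
        first_layer_features(sample_tokens(cls)),      # same class
        first_layer_features(sample_tokens(1 - cls)),  # other class
    ))

# The only trainable parameters: a HIDDEN -> OUT_DIM projection matrix.
proj = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(OUT_DIM)]

loss_before = mean_loss(proj, triplets)
train(proj, triplets)
loss_after = mean_loss(proj, triplets)

embedding = matvec(proj, first_layer_features([0, 1, 2, 3]))
print(f"triplet loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

With a real model you would swap `first_layer_features` for the pooled hidden states of an early layer and keep the rest the same shape: frozen features in, small trained matrix out. Whether those early-layer features actually carry the "large model" properties is exactly the open question raised above.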
|
One of the benefits of lack of contextual or historical understanding is the lack of barriers to action. One of the downsides is you’re likely to do a lot of things of questionable value.