by nl
158 days ago
Isn't this just an awkward way of adding an extra layer to the NN, except without end-to-end training? Models like Stable Diffusion do something similar with CLIP embeddings. It works, and it's an easy way to benefit from CLIP's pre-training. But for a language model it would seemingly make more sense to just add the extra layer.
I'm just focusing on different parts