by startupsfail
1178 days ago
Yes, relatively old. The issue is that this approach was designed to work with “classical” language models, trained using “The Pile” methods. This particular one was PaLM 540B. So essentially you have an approach designed for models that are not really instruction-following models and that truly are stochastic parrots. The models have changed substantially since, but the approach of chaining them in this particular way stuck, and it is getting copied everywhere without much thought.