|
|
|
|
|
by gpm
102 days ago
|
|
We could argue about whether fine tuning is still about predicting a distribution or not, but really I feel like whether or not that word is accurate misses the point of why the description is useful. I like the phrasing because it distinguishes it from other things the generative model might be doing including: - Creating and then refining the whole response simultaneously, like diffusion models do. - Having hidden state, where it first forms an "opinion" and then outputs it e.g. seq2seq models. Previously output output tokens are treated differently from input tokens at an architectural level. - Having a hierarchical structure where you first decide what you're going to say, and then how you're going to say it, like wikipedia's hilarious description of how "sophisticated" natural language generation systems work (someone should really update this page): https://en.wikipedia.org/w/index.php?title=Natural_language_... |
|