Hacker News new | ask | show | jobs
by gpm 102 days ago
We could argue about whether fine tuning is still about predicting a distribution or not, but really I feel like whether or not that word is accurate misses the point of why the description is useful.

I like the phrasing because it distinguishes it from other things the generative model might be doing including:

- Creating and then refining the whole response simultaneously, like diffusion models do.

- Having hidden state, where it first forms an "opinion" and then outputs it e.g. seq2seq models. Previously output output tokens are treated differently from input tokens at an architectural level.

- Having a hierarchical structure where you first decide what you're going to say, and then how you're going to say it, like wikipedia's hilarious description of how "sophisticated" natural language generation systems work (someone should really update this page): https://en.wikipedia.org/w/index.php?title=Natural_language_...

1 comments

Welllll I'm not so sure that phrase is well-suited for your intended meaning, then. (Also, tangentially, I think could argue thinking models w/ the elided thought prelude satisfy "having hidden state where it first forms an opinion.")