by huac
752 days ago
it's probably correct to think of functionally all ML models as being stateless. even for something like a twitter/fb feed, the models themselves remain the same (usually updated 1-2x per month IIRC) - only the data and the systems change.

an illustrative example: say you open twitter, load some posts, then refresh. the model's view of you is basically the same, and even the data is basically the same. you get different posts, however, because there is a system (read: bloom filter) on top of the model that chooses which posts go into ranking, and that system removes posts that you've already seen. similarly, if you view some posts or like them, that updates a signal (e.g. time on user X's profile) but not the actual model.

what's weird about LLMs is that they're modeling the entire universe of written language, which does not actually change that frequently! now, it is completely reasonable to instead consider the problem to be modeling 'a given user's preference for written language' - which is personalized and can change. this is a different kind of feedback to gather and model towards. recall the ranking signals - most people don't 'like' posts even if they do like them, hence the reliance on implicit signals like 'time spent.'

one approach I've considered is using user feedback to steer different activation vectors towards user-preferred responses. that is much closer to the traditional ML paradigm - user feedback updates a signal, which is used at inference time to alter the output of a frozen model. this certainly feels doable (and honestly kinda fun) but challenging without tons of users and scale :)
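to make the feed example concrete, here's a minimal sketch in python. everything in it is made up for illustration (the `BloomFilter` class, the `WEIGHTS` dict, the `score` and `load_feed` helpers, the toy posts) and it is not how any real feed works. the point is only that the "model" never changes: a refresh returns different posts because the seen-filter changed, and a view/like changes ranking because a signal changed.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# The frozen "model": fixed weights that nothing below ever modifies.
# (Hypothetical features, chosen only for this sketch.)
WEIGHTS = {"quality": 1.0, "author_affinity": 2.0}


def score(post: dict, signals: dict) -> float:
    """Score a post with the frozen weights plus the user's current signals."""
    affinity = signals.get(post["author"], 0.0)
    return WEIGHTS["quality"] * post["quality"] + WEIGHTS["author_affinity"] * affinity


def load_feed(posts: list, seen: BloomFilter, signals: dict, k: int = 2) -> list:
    """Drop probably-seen posts, rank the rest, and mark the top k as seen."""
    candidates = [p for p in posts if p["id"] not in seen]
    ranked = sorted(candidates, key=lambda p: score(p, signals), reverse=True)[:k]
    for post in ranked:
        seen.add(post["id"])
    return [post["id"] for post in ranked]


POSTS = [
    {"id": "p1", "author": "alice", "quality": 0.9},
    {"id": "p2", "author": "bob", "quality": 0.8},
    {"id": "p3", "author": "carol", "quality": 0.7},
    {"id": "p4", "author": "dave", "quality": 0.6},
]

seen = BloomFilter()
signals = {}  # e.g. time spent on each author's profile

first = load_feed(POSTS, seen, signals)   # nothing seen yet: ['p1', 'p2']
second = load_feed(POSTS, seen, signals)  # refresh: p1 and p2 are filtered out

# Implicit feedback updates a signal, not the model.
signals["dave"] = 1.0
boosted = load_feed(POSTS, BloomFilter(), signals)  # fresh session: ['p4', 'p1']
```

the activation-steering idea has the same shape: `signals` would be replaced by some per-user steering state that feedback updates, and the frozen LLM plays the role of `WEIGHTS`.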