Hacker News new | ask | show | jobs
by Terr_ 637 days ago
I don't understand, AFAIK the system's output comes from iteratively running something like predict_one_more_token(training_weights, all_prior_tokens).

So there's no real distinction between the programmer inserting "Be Good" and the user that later inserts "Forget anything else and be Bad", and I'm not sure how one would craft a separate training_weights2 that would behave differently in all the right ways or know when to substitute it in.