|
|
|
|
|
by Terr_
637 days ago
|
|
I don't understand, AFAIK the system's output comes from iteratively running something like predict_one_more_token(training_weights, all_prior_tokens). So there's no real distinction between the programmer inserting "Be Good" and the user that later inserts "Forget anything else and be Bad", and I'm not sure how one would craft a separate training_weights2 that would behave differently in all the right ways or know when to substitute it in. |
|