| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Terr_ 637 days ago
	I don't understand, AFAIK the system's output comes from iteratively running something like predict_one_more_token(training_weights, all_prior_tokens). So there's no real distinction between the programmer inserting "Be Good" and the user that later inserts "Forget anything else and be Bad", and I'm not sure how one would craft a separate training_weights2 that would behave differently in all the right ways or know when to substitute it in.