Hacker News
by habitue 1053 days ago
> Perhaps more importantly, the editing is one-directional: the edit “The capital of France is Rome” does not modify “Paris is the capital of France.” So completely brainwashing the model would be complicated.

I would go so far as to say it's unclear whether it's possible at all; "complicated" is a very optimistic assessment.

1 comment

A good case that consistent brainwashing is likely laborious to do manually.

But why leave the job to humans?

I expect an effective approach is to have model A generate many different ways of testing model B on an altered fact, then update B wherever it hasn't fully incorporated the new "fact".

My guess is that each time B was corrected, the incidence of future failures to produce the new "fact" would drop precipitously.
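
To make the loop concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: `ToyModelB` is a plain lookup table standing in for model B, `generate_probes` is a hard-coded stand-in for model A, and `brainwash` is a hypothetical name for the test-then-edit loop. It says nothing about how a real model would respond to edits.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelB:
    """Hypothetical stand-in for model B: a lookup table from prompt to answer."""
    default_answer: str
    memory: dict = field(default_factory=dict)

    def answer(self, prompt: str) -> str:
        # Unedited prompts fall back to the original belief.
        return self.memory.get(prompt, self.default_answer)

    def edit(self, prompt: str, new_answer: str) -> None:
        # One-directional edit: only this exact phrasing changes.
        self.memory[prompt] = new_answer


def generate_probes(subject: str, relation: str) -> list[str]:
    """Hypothetical stand-in for model A: several phrasings of one question."""
    return [
        f"What is the {relation} of {subject}?",
        f"The {relation} of {subject} is",
        f"Name the {relation} of {subject}.",
        f"{subject}'s {relation} is which city?",
    ]


def brainwash(model: ToyModelB, probes: list[str], new_fact: str,
              max_rounds: int = 5) -> list[int]:
    """Test every probe, edit B where it fails, and record failures per round."""
    failures_per_round = []
    for _ in range(max_rounds):
        failing = [p for p in probes if model.answer(p) != new_fact]
        failures_per_round.append(len(failing))
        if not failing:
            break
        for p in failing:
            model.edit(p, new_fact)
    return failures_per_round


model_b = ToyModelB(default_answer="Paris")
probes = generate_probes("France", "capital")
history = brainwash(model_b, probes, "Rome")
print(history)
```

In this toy the failure count goes from 4 to 0 after one pass, but only because each edit fixes exactly one phrasing. A phrasing that model A never generated, such as "Which city is the capital of France?", still returns "Paris", which is the one-directional problem from the quote. Whether corrections generalize across phrasings in a real model is the open question.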