| HN Mirror

They might, but we may also find they don’t function as well or as predictably if increasing amounts of their weights are suppressed. Research has so far shown that knowledge is incredibly, vastly diffuse, as are causes of different behaviors. There was some research that came out of Anthropic where a model being taught number sequences by another model, and that second model had fine tuning which with a stated preference for owls. The student model, despite no overt exposure to anything of the sort, expressed the same preference. The subtlety of influence that even very minor things have on the vast network of weights is, at least at present, too poorly understood to know what we’re getting in the bargain when holes are poked.