|
|
|
|
|
by tinco
1011 days ago
|
|
We wouldn't know how to construct those matrices because we don't know where in the layers what knowledge is represented. One thing that helps a little bit is freezing the lower layers, so at least the model won't forget its most fundamental knowledge. Note that the only reason that things are catastrophically forgotten, is that the original examples are not shown again. If the model learns in a single shot, there might simply be no time to show both the old and the new examples. I don't think it would have a significant effect or else we'd know about this effect a lot sooner (i.e. the training of these LLM's would get less effective from a certain point) |
|