|
|
|
|
|
by odnes
1748 days ago
|
|
Both MAS and (the earlier) EWC facilitate continual learning through passing a bunch of samples through the network and collecting gradients to determine which weights are 'important'. Future weight changes are then regularised by these importance values so that the network retains its ability on past tasks. EWC uses square gradients as importance values, whereas MAS uses absolute gradients... Other than that they're the same lol (I think), how the MAS paper got so many citations I have no idea. |
|