| HN Mirror



	by heavymemory 198 days ago
	The idea is interesting, but I still don’t understand how this is supposed to solve continual learning in practice. You’ve got a frozen transformer and a second module still trained with SGD, so how exactly does that solve forgetting instead of just relocating it?
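	To make the concern concrete, here is a minimal toy sketch of what "relocating" forgetting could look like. It is not the method under discussion: `frozen_features`, the two linear tasks, and the hyperparameters are all hypothetical stand-ins. The feature map plays the role of the frozen transformer and is never updated; only a small linear module is trained with plain SGD, first on task A and then on a conflicting task B.

```python
import random

random.seed(0)

def frozen_features(x):
    # Hypothetical stand-in for the frozen transformer: a fixed feature map
    # that is never updated during training.
    return [x, 1.0]

def predict(w, x):
    # The small trainable module: a linear head over the frozen features.
    return sum(wi * fi for wi, fi in zip(w, frozen_features(x)))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, lr=0.05, epochs=200):
    # Plain per-sample SGD on squared error; only the head's weights change.
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            feats = frozen_features(x)
            w = [wi - lr * 2 * err * fi for wi, fi in zip(w, feats)]
    return w

xs = [random.uniform(-1, 1) for _ in range(50)]
task_a = [(x, 2.0 * x) for x in xs]    # toy task A (assumed for illustration)
task_b = [(x, -2.0 * x) for x in xs]   # toy task B, deliberately conflicts with A

w = sgd([0.0, 0.0], task_a)
loss_a_before = mse(w, task_a)   # error on A right after learning A

w = sgd(w, task_b)               # sequential training on B, no replay of A
loss_a_after = mse(w, task_a)    # error on A after learning B

print("task A loss after training on A:", loss_a_before)
print("task A loss after training on B:", loss_a_after)
```

	In this toy setup the frozen part is untouched throughout, yet the error on task A rises sharply once the trainable module has been fit to task B. The tasks here are chosen to conflict as much as possible, so this only illustrates the failure mode; it says nothing about how severe it would be in the actual system.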