Hacker News
by heavymemory 193 days ago
This still seems like gradient descent wrapped in new terminology. If all learning happens through weight updates, it's just rearranging where the forgetting happens.