Hacker News new | ask | show | jobs
by lewhoo 1038 days ago
So, the TLDR could be: they memorize at first and then generalize ?
1 comments

depends on the hyperparameters, and the architecture (and probably the task)