Hacker News
by locuscoeruleus 1441 days ago
Adam was very effective when it was introduced, so it was widely adopted. Since then, only models that work well with Adam have made it from the idea stage to actually working. I think there's reason to believe we have overfit our model architectures to our loss functions and optimizers.