Hacker News new | ask | show | jobs
by liuliu 4191 days ago
That's when learning rate changed to a smaller number. The graph mainly shows that with different initialization scheme, the network starts descending initially faster.