Hacker News new | ask | show | jobs
by akyu 3377 days ago
You can use gradient descent one example at a time. It still works just fine. The gradients are more unstable, but you will still converge eventually.
1 comments

"works just fine" is a relative term. I'm pretty sure when I'm learning a (new) alphabet, I don't need to see 1000 examples of each letter 100 times.