by warsheep
315 days ago
What you're describing is a simplified version of gradient descent (tweaking the weights) combined with online learning (working on one sample at a time). This version won't get you far: you'll just train a model that solves the last math problem you gave it and maybe a few others, but it will probably forget the first ones. There are similar procedures that train better than this, but they've been tried and are currently worse than classical SGD with large batches.
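To make the forgetting concrete, here's a rough toy sketch. It assumes the procedure is "nudge the weights until the current sample comes out right, then move on to the next one". The data (y = x^2), the linear model, the learning rate, the step counts and all function names are made up for illustration. The "batched" variant uses all five samples per step as a stand-in for large-batch SGD.

```python
# Toy data: y = x^2, which a linear model w*x + b cannot fit exactly.
# (Hypothetical example data, chosen so the samples compete for the weights.)
DATA = [(float(x), float(x * x)) for x in range(5)]


def grad_step(w, b, batch, lr):
    """One gradient-descent step on the mean squared error over `batch`."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * gw, b - lr * gb


def mse(w, b, data):
    """Mean squared error of the model w*x + b on `data`."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_one_at_a_time(data, lr=0.02, steps_per_sample=200):
    """Fit each sample in turn, never revisiting earlier ones."""
    w, b = 0.0, 0.0
    for sample in data:
        for _ in range(steps_per_sample):
            w, b = grad_step(w, b, [sample], lr)
    return w, b


def train_batched(data, lr=0.02, steps=2000):
    """Every step uses the gradient averaged over all samples."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = grad_step(w, b, data, lr)
    return w, b


online = train_one_at_a_time(DATA)
batched = train_batched(DATA)

print("one at a time: error on last sample  =", mse(*online, DATA[-1:]))
print("one at a time: error on all samples  =", mse(*online, DATA))
print("batched:       error on all samples  =", mse(*batched, DATA))
```

The one-at-a-time run ends up essentially perfect on the last sample and noticeably worse than the batched run across the whole set, because each new sample overwrites what the earlier ones taught it.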