Hacker News
by T_D_K 3165 days ago
Based on the limited information, I'm assuming that by "training explodes" you mean that your gradient descent diverges (the loss keeps growing) instead of settling into a local minimum. Try lowering your learning rate? If the step size is too large, each update can "step over" the minimum and land farther away than where it started.
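To make the "stepping over" idea concrete, here's a toy sketch. It is not your model: the function f(x) = x**2, the helper name `gradient_descent`, and the two learning rates are all assumptions picked purely for illustration. On this function each update multiplies x by (1 - 2*lr), so any learning rate above 1.0 makes the iterate grow instead of shrink.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize the toy function f(x) = x**2 (gradient 2*x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        # Plain gradient step; on this function it multiplies x by (1 - 2*lr).
        x = x - lr * 2 * x
    return x


# Hypothetical learning rates, chosen only to show both regimes.
diverged = gradient_descent(lr=1.1)   # |1 - 2*1.1| = 1.2 > 1, so |x| grows every step
converged = gradient_descent(lr=0.1)  # |1 - 2*0.1| = 0.8 < 1, so x shrinks toward 0

print(abs(diverged), abs(converged))
```

A real loss surface is messier than a parabola, but the same thing tends to happen: past some threshold the updates overshoot and the loss climbs, so dropping the learning rate by 10x is a cheap first experiment.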