Hacker News
by andyferris 300 days ago
I believe the reason it works in nonlinear cases is that the derivative is “naturally linear” (to calculate the derivative, you are considering ever smaller regions where the cost function is approximately linear - exactly “how nonlinear” the cost function is elsewhere doesn’t play a role).
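
A quick numerical sketch of that point, assuming a made-up cost function (the names `cost`, `cost_derivative` and `linearization_error` are illustrative, not from this thread). It compares the true cost near a point with its tangent-line estimate and prints how the gap behaves as the region shrinks:

```python
import math

def cost(x):
    # Hypothetical nonlinear cost function, chosen only for illustration.
    return math.exp(x) + x ** 3

def cost_derivative(x):
    # Analytic derivative of the cost above.
    return math.exp(x) + 3 * x ** 2

def linearization_error(x0, h):
    # Gap between the true cost at x0 + h and its tangent-line estimate.
    tangent_estimate = cost(x0) + cost_derivative(x0) * h
    return abs(cost(x0 + h) - tangent_estimate)

x0 = 1.0
steps = [1e-1, 1e-2, 1e-3]
errors = [linearization_error(x0, h) for h in steps]

for h, err in zip(steps, errors):
    # error / h is the gap relative to the size of the region considered.
    print(f"h={h:g}  error={err:.3e}  error/h={err / h:.3e}")
```

If the reasoning holds, the `error/h` column should shrink toward zero as `h` does: the gap falls roughly with `h` squared, so the nonlinearity matters less and less inside the small region, however curved the function is elsewhere.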

that makes a lot of sense actually. thank you.