HN Mirror (Hacker News)

	by colordrops 1400 days ago
	A class taught like this for me was what got me to quit physics and switch to CS.

punnerud 1400 days ago

And that's why it took a long time for back propagation to be introduced into machine learning.

Back propagation is (almost) just a fancy word for differentiation: taking derivatives of the error between your output and your training data.

dceddia 1400 days ago

As someone who's starting to learn a bit about machine learning, it feels like the whole field is full of fancy terms like this that seem to mostly map to simpler or more familiar ones. "linear regression" instead of fitting a line, "hyperparameter" instead of user-provided argument. Half the battle seems to be building this mental translation map.

Test0129 1400 days ago

You are looking at it from a programmer standpoint rather than a mathematical standpoint.

Linear regression isn't just fitting a line; it's a statistical technique for finding a line of best fit. Hyperparameters are a Bayesian term for parameters outside the system being tested, or "algorithm". Calling them user input misses the Bayesian aspect.

These terms actually have meaning, so I'd be careful about ascribing simpler definitions. The underlying meaning is important to the reason they work. If you don't have a really strong background in probability theory and statistics, digging into machine learning will take work. I'd recommend taking an MITx course or picking up a textbook on probability so the terminology feels more natural.
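
To make the "line" versus "statistics" distinction concrete, here is a minimal Python sketch of ordinary least squares with one predictor. The slope and intercept are the fitted line; the standard error of the slope is one small piece of the statistical machinery, and it is only meaningful under the usual textbook assumptions (independent errors with constant variance). The name `fit_line` is illustrative.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, slope_std_error). The standard error assumes
    independent errors with constant variance (the textbook OLS assumptions).
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual sum of squares, with n - 2 degrees of freedom (two fitted coefficients)
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, intercept, slope_se

slope, intercept, se = fit_line([0, 1, 2, 3], [1, 3, 4, 8])
# slope ≈ 2.2, intercept ≈ 0.7, se ≈ 0.424
```

The standard error is what lets you ask whether the slope is distinguishable from zero, which is a question "fitting a line" alone doesn't answer.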

Sharlin 1400 days ago

To be fair, "linear regression" is standard statistics 101 that much predates machine learning or computers.

meowkit 1400 days ago

A user-provided argument could also be an input parameter or a regular function parameter altogether.

Yes, hyperparameters are often set by the user of a model, but more specifically they are parameters that exist separately from the data put into a model (input parameters) or the structure inside of neural networks (hidden parameters). The prefix hyper-, meaning "above", helps conceptualize these parameters as existing outside the model.
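
One way to see the distinction, and the Bayesian angle raised earlier in the thread, is ridge regression through the origin (an illustrative choice, not something from the comment). The weight `w` is a parameter learned from the data, while `lam` is a hyperparameter fixed before fitting. In the Bayesian reading, with Gaussian noise of variance σ² and a Gaussian prior on `w` with mean 0 and variance τ², the ridge solution is the posterior mode (MAP estimate) when `lam = σ²/τ²`, so the hyperparameter describes the prior that sits "above" the model.

```python
def ridge_fit(xs, ys, lam):
    """Fit y ~ w * x by minimizing sum((y - w*x)^2) + lam * w^2.

    w is a model parameter (learned from data); lam is a hyperparameter
    (chosen beforehand, e.g. by cross-validation, and not updated by the fit).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # setting the derivative -2*sxy + 2*w*sxx + 2*lam*w to zero gives:
    return sxy / (sxx + lam)

def map_weight(xs, ys, noise_var, prior_var):
    """Posterior mode of w for y ~ N(w*x, noise_var) and prior w ~ N(0, prior_var)."""
    return ridge_fit(xs, ys, lam=noise_var / prior_var)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(ridge_fit(xs, ys, lam=0.0))   # plain least squares: 28 / 14 = 2.0
print(ridge_fit(xs, ys, lam=14.0))  # stronger pull toward 0: 28 / 28 = 1.0
```

The data are identical in both calls; only the hyperparameter changes, and with it the learned parameter.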

aaaaaaaaaaab 1400 days ago

Actually, backpropagation is more of a fancy word for the chain rule.

punnerud 1400 days ago

ALMOST like using the chain rule

Backpropagation ≠ Chain Rule: https://theorydish.blog/2021/12/16/backpropagation-≠-chain-r...

aaaaaaaaaaab 1400 days ago

That's just nitpicking, but ok: backpropagation is the application of the chain rule for total derivatives.

Look into forward- vs reverse-mode automatic differentiation, and you'll understand what I'm referring to.
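
A toy sketch of the two modes for f(x, y) = x·y + sin(x). The class and function names are illustrative, not from any library. Forward mode (dual numbers) pushes one derivative direction through the computation, so a full gradient needs one pass per input. Reverse mode records the computation, then applies the chain rule from the output backwards and gets every partial derivative from a single backward sweep; that second pattern is what backpropagation does.

```python
import math

class Dual:
    """Forward mode: a value paired with its derivative along one input direction."""
    def __init__(self, val, dot):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

def dual_sin(a):
    return Dual(math.sin(a.val), math.cos(a.val) * a.dot)

class Var:
    """Reverse mode: a value plus (parent, local derivative) links for the backward sweep."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.grad = val, parents, 0.0

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

def var_sin(a):
    return Var(math.sin(a.val), ((a, math.cos(a.val)),))

def backward(output):
    """Propagate d(output)/d(node) from the output back to every input."""
    order, seen = [], set()

    def visit(node):  # depth-first topological sort
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # each node's grad is complete before it is used
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule, summed over all paths

def f(x, y, sin):
    return x * y + sin(x)

x0, y0 = 2.0, 3.0
# forward mode: one pass per input direction
dfdx_fwd = f(Dual(x0, 1.0), Dual(y0, 0.0), dual_sin).dot
dfdy_fwd = f(Dual(x0, 0.0), Dual(y0, 1.0), dual_sin).dot
# reverse mode: one recorded evaluation, then one backward sweep for all inputs
x, y = Var(x0), Var(y0)
backward(f(x, y, var_sin))
print(dfdx_fwd, dfdy_fwd, x.grad, y.grad)  # both modes give (y0 + cos(x0), x0)
```

Note that `x` is used twice in f, so its reverse-mode gradient is the sum of two chain-rule paths; the `+=` in `backward` is what handles that.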

cyber_kinetist 1400 days ago

Yes, backpropagation isn't the chain rule itself, but just an efficient way to calculate the chain rule. (In this respect there are some connections to dynamic programming, where you find the most efficient order of recursive computations to arrive at the solution).
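
The "most efficient order" can be made concrete for a simple chain of layers with widths n_0 (input) through n_L (output): the derivative of the whole chain is the product J_L ⋯ J_1 of per-layer Jacobians, and its cost depends only on the multiplication order, as in the classic matrix-chain problem. This sketch counts multiply-adds for accumulating from the input side (forward mode), from the output side (reverse mode), and for the optimal order found by the textbook matrix-chain dynamic program. It assumes dense Jacobians and the naive a·b·c cost of an (a×b)(b×c) product; real frameworks never build these matrices, so read the numbers as an illustration of scaling only.

```python
from functools import lru_cache

def forward_cost(dims):
    """Accumulate J_L @ ... @ J_1 starting from the input side.

    dims = [n_0, ..., n_L]; layer k has Jacobian J_k of shape (n_k, n_{k-1}).
    An (a x b) @ (b x c) product is counted as a * b * c multiply-adds.
    """
    # running product has shape (n_{k-1}, n_0); multiply J_k on the left
    return sum(dims[k] * dims[k - 1] * dims[0] for k in range(2, len(dims)))

def reverse_cost(dims):
    """Accumulate the same product starting from the output side."""
    L = len(dims) - 1
    # running product has shape (n_L, n_k); multiply J_k on the right
    return sum(dims[L] * dims[k] * dims[k - 1] for k in range(L - 1, 0, -1))

def optimal_cost(dims):
    """Cheapest parenthesization via the matrix-chain dynamic program."""
    # in multiplication order J_L, ..., J_1, matrix i has shape p[i] x p[i+1]
    p = list(reversed(dims))

    @lru_cache(maxsize=None)
    def best(i, j):  # min cost to multiply matrices i..j (inclusive)
        if i == j:
            return 0
        return min(best(i, k) + best(k + 1, j) + p[i] * p[k + 1] * p[j + 1]
                   for k in range(i, j))

    return best(0, len(p) - 2)

# scalar loss on top of two 1000-wide layers and a 1000-dim input
dims = [1000, 1000, 1000, 1]
print(forward_cost(dims), reverse_cost(dims), optimal_cost(dims))
# → 1001000000 2000000 2000000
```

With a scalar output, starting from the output side is already optimal, which is why reverse mode is the default for training; with a narrow input and wide output the ranking flips.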

blt 1399 days ago

I think of it as: computing the chain rule in the order such that we never need to compute Jacobians explicitly; only Jacobian-vector products.

I also didn't totally grasp its significance until implementing neural networks from matrix/array operations in NumPy. I hope all deep learning courses include this exercise.
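
A minimal NumPy sketch of that exercise, assuming a two-layer tanh network with a mean-squared-error loss (both arbitrary choices for illustration). Each line of the backward pass is a vector-Jacobian product written as an ordinary array operation; no Jacobian matrix is ever built. The finite-difference helper is a common way to check a hand-written backward pass.

```python
import numpy as np

def loss_and_grads(params, X, y):
    """Two-layer tanh MLP with mean-squared error.

    X: (n, d_in), y: (n, d_out). Returns (loss, grads) with grads keyed like params.
    """
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    n = X.shape[0]

    # forward pass, keeping the intermediates the backward pass needs
    Z1 = X @ W1 + b1            # (n, h)
    H = np.tanh(Z1)             # (n, h)
    Y_hat = H @ W2 + b2         # (n, d_out)
    diff = Y_hat - y
    loss = np.sum(diff ** 2) / n

    # backward pass: each step maps dL/d(layer output) to dL/d(layer inputs)
    dY_hat = 2.0 * diff / n
    grads = {"W2": H.T @ dY_hat, "b2": dY_hat.sum(axis=0)}
    dH = dY_hat @ W2.T
    dZ1 = dH * (1.0 - H ** 2)   # tanh'(z) = 1 - tanh(z)^2, applied elementwise
    grads["W1"] = X.T @ dZ1
    grads["b1"] = dZ1.sum(axis=0)
    return loss, grads

def numerical_grad(params, X, y, name, index, eps=1e-6):
    """Central-difference estimate of dL/d params[name][index], for checking."""
    original = params[name][index]
    params[name][index] = original + eps
    up, _ = loss_and_grads(params, X, y)
    params[name][index] = original - eps
    down, _ = loss_and_grads(params, X, y)
    params[name][index] = original  # restore the weight
    return (up - down) / (2 * eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=(8, 2))
params = {"W1": rng.normal(size=(3, 4)), "b1": np.zeros(4),
          "W2": rng.normal(size=(4, 2)), "b2": np.zeros(2)}
loss, grads = loss_and_grads(params, X, y)
# the two numbers should agree closely
print(grads["W1"][0, 1], numerical_grad(params, X, y, "W1", (0, 1)))
```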

marcosdumay 1400 days ago

Yes, they are not the same. The chain rule is what solves the one non-trivial problem with backpropagation. Besides that, it's just the quite obvious idea of changing the weights in proportion to how impactful they are on the error.
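
A sketch of that "obvious idea" done without the chain rule, assuming a hypothetical two-weight linear model: estimate each weight's impact on the error by nudging it and re-evaluating the loss, then move every weight against its impact. It works, but central differences cost two loss evaluations per weight, which is impractical for networks with millions of weights; backpropagation uses the chain rule to get all of those sensitivities from one backward pass.

```python
def mse_loss(weights, data):
    """Mean squared error of a hypothetical model y_hat = w0 + w1 * x."""
    w0, w1 = weights
    return sum((w0 + w1 * x - y) ** 2 for x, y in data) / len(data)

def impact(weights, data, eps=1e-6):
    """Central-difference estimate of d(loss)/d(w_i) for every weight.

    Needs 2 * len(weights) loss evaluations per call.
    """
    grads = []
    for i in range(len(weights)):
        up, down = list(weights), list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((mse_loss(up, data) - mse_loss(down, data)) / (2 * eps))
    return grads

def train(weights, data, learning_rate=0.1, steps=500):
    for _ in range(steps):
        # change each weight in proportion to how much it affects the error
        weights = [w - learning_rate * g for w, g in zip(weights, impact(weights, data))]
    return weights

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated from y = 1 + 2x
print(train([0.0, 0.0], data))  # converges toward [1.0, 2.0]
```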

voqv 1400 days ago

Is that why it took long? I was under the impression it was because of diminishing gradients in backprop once you stack a huge amount of layers (the deep in deep neural networks).
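
The effect described here is usually called the vanishing gradient problem. A toy illustration, assuming a chain of scalar sigmoid layers with weight 1 (real networks use matrices and often other activations): the chain rule multiplies one local derivative per layer, and the sigmoid's derivative σ(z)(1 - σ(z)) never exceeds 0.25, so the gradient reaching the input is bounded by 0.25 raised to the depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, weight=1.0, x=0.0):
    """d(output)/d(input) for `depth` stacked scalar layers a <- sigmoid(weight * a)."""
    a, grad = x, 1.0
    for _ in range(depth):
        s = sigmoid(weight * a)
        grad *= weight * s * (1.0 - s)  # chain rule: this layer's local derivative
        a = s
    return grad

for depth in (1, 5, 10, 20):
    print(depth, input_gradient(depth), 0.25 ** depth)
```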

iamcreasy 1400 days ago

Could you please forward me to a resource that explains this connection?

FabHK 1400 days ago

The reverse mode has famously been rediscovered (or re-applied) many times, for example as backpropagation in ML, and as AAD in finance (to compute "Greeks", i.e. partial derivatives of the value of a product with respect to many inputs).

A few resources here:

An overview, with a bias towards finance: https://informaconnect.com/a-brief-introduction-to-automatic...

On the history: Andreas Griewank, Who Invented the Reverse Mode of Differentiation? https://ftp.gwdg.de/pub/misc/EMIS/journals/DMJDMV/vol-ismp/5...

On the history of back propagation: https://en.wikipedia.org/wiki/Backpropagation#History

The article that introduced it to finance: Michael Giles and Paul Glasserman, Smoking adjoints: fast Monte Carlo Greeks https://www0.gsb.columbia.edu/faculty/pglasserman/Other/Risk...

Survey of the application in finance: Cristian Homescu, Adjoints and Automatic (Algorithmic) Differentiation in Computational Finance https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1828503

punnerud 1400 days ago

It was in one of the fast.ai courses, I think the one where Jeremy did back propagation in Excel.

https://www.fast.ai/

Maybe someone else here remembers the exact video.

montebicyclelo 1400 days ago

Hope you don't mind me plugging my blog post, that covers chain rule -> autodiff -> training of nn. https://sidsite.com/posts/autodiff/

iamcreasy 1399 days ago

Absolutely not. Thank you for sharing.
