| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mcabbott 1512 days ago

> > Learning rate (η): Amount by which gradients are discounted before updating the weights.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

As far as I can tell it's a documentation complaint. He has to remember "η" from the line with the signature, past the line "Gradient descent optimizer with learning rate η ...", and a heading "Parameters" until the line quoted which explains this in full.

He says this is the API, but that's inaccurate. The API being explained is that the first positional argument is the learning rate. It's not a keyword argument, so you cannot supply it by name. What variable names are used in the code is private, and in fact the struct's field name is `eta` so that you can access it without typing greek.

If this makes the top 10 list (even the top 10 list of documentation complaints) then Flux is doing OK. Especially the top 10 list of a guy with a PhD in a mathematical field. (From the sort of university which used to require students to know latin & greek, too.)