| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tagrun 1510 days ago

You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

FYI, the documentation of the function https://fluxml.ai/Flux.jl/stable/training/optimisers/ explicitly says it is learning rate:

> Learning rate (η): Amount by which gradients are discounted before updating the weights.

so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

2 comments

jstx1 1510 days ago

> How does that work?

You can look up "learning rate" much easier than to look up "what is this Greek letter on my screen" followed by "what is the use of this Greek letter in my context" and only then followed by searching for "learning rate"

More importantly, it's possible to know what a learning rate is without knowing what Greek letter it's commonly denoted as. Especially since mathematical notation is so inconsistent across authors. I want less ambiguity in code, not more. Explicit is better than implicit.

Mathematical notation is notorious for being an absolute mess of inconsistencies. Who in their right mind looked at it and went "yep, I want more of this in my source code".

link

mattkrause 1509 days ago

This depends a lot on the target audience for your code.

For research-focused code, it is likely that whatever you're implementing was initially described in terms of mathematical notation (e.g., in a paper or book). It can be helpful to have variables that unambiguously match that canonical source. In fact, a lot of my Julia code has docstrings containing references/links to the original paper and a comment noting that it uses the notation therein.

This sidesteps the problem where textual descriptions like `learning_rate` can sometimes be ambiguous: is it the original learning rate, or perhaps the current rate after applying some sort of schedule or decay? I think the Flux documentation is pretty close to ideal, in that it's got a symbol you can match against equations (though no reference to them) as well as text that you can search to learn more.

link

tagrun 1510 days ago

You are not answering the (rhetorical) question that you quoted, and the answer to your response is already in the paragraph that followed it:

As I said, the necessary keywords for Googling it, along with a brief description is already present in the documentation.

The quibble here is about the necessity of reproducing the all the necessary keywords for an accurate Google search during every single function call.

> Mathematical notation is notorious for being an absolute mess of inconsistencies.

According to whom? What exactly is inconsistent?

link

nullstyle 1510 days ago

> The quibble here is about the necessity of reproducing the all the necessary keywords for an accurate Google search during every single function call.

No, that is not the quibble. My quibble is with choosing identifiers that make code less legible when taken by itself. The best code teaches future readers about how it works.

link

mcabbott 1510 days ago

The struct's field name is `eta`, but this is an internal detail. Its constructor takes a positional-only argument, no public name.

The greek letter is used in the documentation. And the reason is that every optimiser's documentation links to the original paper, and tries to follow that. If the Adam paper calls the two decay rates β1, β2, then staying close to that seems the least confusing option.

link

nullstyle 1509 days ago

Perhaps I'm missing your point, but I think you're focussing too much on the specific case that someone who isn't me came up with.

My most general point is that the identifiers we use in our code are almost never just convention or taste when we are sharing that code with anyone else (and for most, "anyone else" includes our future selves). Getting a little more specific, I'm specifically interested in Julia and look forward to working in it further, but I've personally felt pain around scientific/mathematical notation when trying to understand code I've found on github. tagrun dismissing my pain as nonsense and the people who argue for my ilk as perpetrators of bikesheddding is dipshitted. Yeah, I'm probably the asshole for being a college dropout trying to leverage modern scientific computing for my own ends (snoogins), but I'm also willing to bet tagrun is probably the member of a team that talks down to junior members and complains they haven't read enough papers or the right papers to see the magnificience of their code ;).

It's fine to write code that demands a domain expert to understand, but don't pretend like its good across all dimensions. There are tradeoffs involved.

Personally I find the preponderance of scientific/mathematical notation (whatever you want to call it) in Julia to be cute; It certainly does bind the code to linked papers in a pretty cool way when it all fits together properly. That said, its a pain in the ass when it doesn't fit together properly and I've personally had a journey into Julia spoiled due to frequency at which I had to figure out how to notate something or what word to use when regarding some squiggle I haven't encountered before. I look forward to having a better intuition for the greek alphabet but until then Julia will often be harder to read, let alone understand when compared to ruby or javascript or go or C# or any other of the roughly dozen programming languages I've worked with and feel comfortable translating between.

link

mcabbott 1510 days ago

> > Learning rate (η): Amount by which gradients are discounted before updating the weights.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

As far as I can tell it's a documentation complaint. He has to remember "η" from the line with the signature, past the line "Gradient descent optimizer with learning rate η ...", and a heading "Parameters" until the line quoted which explains this in full.

He says this is the API, but that's inaccurate. The API being explained is that the first positional argument is the learning rate. It's not a keyword argument, so you cannot supply it by name. What variable names are used in the code is private, and in fact the struct's field name is `eta` so that you can access it without typing greek.

If this makes the top 10 list (even the top 10 list of documentation complaints) then Flux is doing OK. Especially the top 10 list of a guy with a PhD in a mathematical field. (From the sort of university which used to require students to know latin & greek, too.)

link