|
|
|
|
|
by tagrun
1510 days ago
|
|
You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work? FYI, the documentation of the function https://fluxml.ai/Flux.jl/stable/training/optimisers/ explicitly says it is learning rate: > Learning rate (η): Amount by which gradients are discounted before updating the weights. so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter. |
|
You can look up "learning rate" much easier than to look up "what is this Greek letter on my screen" followed by "what is the use of this Greek letter in my context" and only then followed by searching for "learning rate"
More importantly, it's possible to know what a learning rate is without knowing what Greek letter it's commonly denoted as. Especially since mathematical notation is so inconsistent across authors. I want less ambiguity in code, not more. Explicit is better than implicit.
Mathematical notation is notorious for being an absolute mess of inconsistencies. Who in their right mind looked at it and went "yep, I want more of this in my source code".