by catgary 292 days ago
I’m going to push back on this a bit. I think a simpler explanation (or at least one that doesn’t involve projecting one’s own insecurities onto the authors) is that the people who write these papers are generally comfortable enough with mathematics that they don’t believe anything has been obfuscated. ML is a mathematical science and many people in ML were trained as physicists or mathematicians (I’m one of them). People write things this way because it makes symbolic manipulations easier and you can keep the full expression in your head; what you’re proposing would actually make it significantly harder to verify results in papers.

Maybe.

But my experience as a mathematician tells me another part of that story.

Certain fields are much more used to consuming (and producing) visual noise in their notation!

Some fields even have superfluous parts in their definitions and keep them around out of tradition.

It's just as with code: not everyone values writing readable code highly. Some are fine with 200-line function bodies.

And refactoring mathematics is even harder: There's no single codebase and the old papers don't disappear.

Maybe! I’ve found that people usually don’t do extra work if they don’t need to. The heavy notation in differential geometry, for example, can be awfully helpful when you’re actually trying to do Lagrangian mechanics on a Riemannian manifold. And superfluous bits of a definition might be kept around because going from the minimal definition to the one that is actually useful in practice can sometimes be non-trivial, so you’ll just keep the “superfluous” definition in your head.
To add to this, I'd even argue that the most "scary looking" parts of the GAN paper are where Goodfellow is just showing intermediate steps, like in (4) and (5). I guess one can argue that this is superfluous but that feels pretentious. I'd argue that the math here is helping communicate.

I think people forget why math is used. I'm always a little surprised that programmers don't see this because the languages are being used for the same reasons. Precision. They're terrible languages to communicate something like this conversation but then again English is a terrible way to communicate highly abstract concepts.

On the other hand, I've definitely seen people use math to make their work seem more important (definitely in some ML), but I think I more frequently see it just being copy-pasted (like every diffusion paper ever). I think that is probably superfluous, though it's definitely debatable and I'm absolutely certain these use cases aren't for flexing lol.

Agreed. Also, fwiw, the mathematics involved in the paper are pretty simple as far as mathematical sophistication goes. Spend two to three months on one "higher level" maths course of your choosing and you'll be able to fully understand every equation in this paper relatively easily. Even a basic course in information theory coupled with some discrete maths should give you essentially all you need to comprehend the math in this post. The concepts being presented here are not mysterious and much of this math is banal. Mathematical notation can seem forbidding, but once you grasp it, you'll see, like Von Neumann said, that life is complicated but math is simple.
> like Von Neumann said, that life is complicated but math is simple

Maybe for Von Neumann math was simple...