by lxe 846 days ago
I'm fairly sure I could actually understand the terminology used in discussions of machine learning models if it were presented in a way that explained the first principles a little better, instead of diving directly into high-level abstract equations and symbols.

I'd like a way to learn this stuff as a computer engineer, in the same spirit as "big scary math symbols are just for-loops".

5 comments

Ironically, you can probably just ask a Transformer model to explain it to you.

I'm the same as you: I have no problem grasping complex concepts, I just always struggled with the mathematical notation. I did pass linear algebra in university, but was glad I could go back to programming after that. Even then, I mostly passed linear algebra because I wrote functions that solve linear algebra equations until I fully grasped the concept.

I've found that GPT-4 is very good at taking a math-notation-rich document and just describing it in terms a math-notation-averse engineer would understand.

I was a data engineer for about 6-7 years at various companies, always working together with data scientists who insist that `x_` or `_phi` are proper variable names. Man am I glad to be working with engineers now.

This is very effective.

Also, just try really hard. Repeat. It's a new language for explaining concepts you likely already know. You don't remember Spanish by looking at the translations once.

That's a heuristic that's usually true. You can definitely understand convolution or attention better with a "big scary math symbols are just for-loops" explanation, but there are also things like dopri45 or elliptic curve crypto where we just have to accept that Weird Math Shit is happening and the symbols are inevitable. It looks to me like Mamba has dragged a part of LLM research into the latter camp.
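
To make the "just for-loops" reading of convolution concrete, here is a minimal sketch in plain Python. The function name `conv1d` and the choice of a "valid" (no padding) output are assumptions made for illustration, not anything taken from the article. The sum over j in the usual formula becomes the inner loop, and the output index i becomes the outer loop.

```python
def conv1d(signal, kernel):
    """'Valid' 1D convolution written as two nested for-loops.

    Hypothetical helper for illustration: no padding, no stride.
    """
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n - k + 1):        # one output per position where the kernel fits
        acc = 0.0
        for j in range(k):            # this loop is the "big scary" sum over j
            # The kernel is read backwards (flipped); that is what makes this
            # a convolution rather than a cross-correlation.
            acc += signal[i + j] * kernel[k - 1 - j]
        out.append(acc)
    return out


print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [2.0, 2.0]
```

Dropping the flip (using `kernel[j]` instead) gives cross-correlation, which is what deep learning libraries usually compute under the name "convolution".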

It is unclear to me whether you're praising the article as particularly easy to understand or complaining that it contains equations like

  h_t = A h_{t-1} + B x_t
  y_t = C h_t

(which the author attempts to illustrate in the "My name is Jack" figure below).

If you want to learn this stuff as a computer engineer, you can read the code here [0]. I find the math quite helpful.

[0]: https://github.com/state-spaces/mamba
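
For what it's worth, the recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t also reads as a plain for-loop over time steps. The sketch below is pure Python and is not the code from [0]: the names `matvec` and `ssm_scan` are hypothetical, the initial state is assumed to be zero, and A, B, C are assumed to be fixed matrices given as nested lists. As far as I understand it, Mamba itself makes some of these parameters depend on the input and computes the recurrence with a much faster scan, so treat this only as the naive sequential version of the two equations.

```python
def matvec(M, v):
    """Matrix-vector product M @ v for nested lists (hypothetical helper)."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def ssm_scan(A, B, C, xs):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a sequence of inputs.

    A: N x N, B: N x D, C: P x N, xs: list of length-D input vectors.
    Assumption: the initial hidden state h_0 is all zeros.
    """
    h = [0.0] * len(A)
    ys = []
    for x_t in xs:                          # one iteration per time step t
        Ah = matvec(A, h)                   # carry the old state forward
        Bx = matvec(B, x_t)                 # mix in the new input
        h = [a + b for a, b in zip(Ah, Bx)]
        ys.append(matvec(C, h))             # read the output off the state
    return ys


# Scalar example: the state decays by half each step and accumulates the input.
print(ssm_scan([[0.5]], [[1.0]], [[2.0]], [[1.0], [1.0], [1.0]]))
# [[2.0], [3.0], [3.5]]
```

The only thing carried between iterations is `h`, which is why the hidden state works as the model's memory of everything seen so far.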

Ask an LLM to translate it into terms you understand. This is something they excel at.