by ssivark 2879 days ago
> I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.

To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and the recent breakthroughs with deep neural networks (tremendously exciting, but modest success, in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.

Further, for a lot of people doing "data science" (and not using neural networks out the wazoo) I think that they can abstract away several linear algebra based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".

3 comments

> To play devil's advocate, probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

And of course, you're not going to get very far with probability theory and stochastic processes unless you have a mature understanding of analysis and measure theory :)

This comment exchange neatly demonstrates the intrinsic problem. Most of these articles start off much like this one does: by assuming "basic comfortability with linear algebra." That sounds straightforward, but most software engineers don't have it. They haven't needed it, so they haven't retained it even if they learned it in college. It takes a good student a semester in a classroom to achieve that "comfortability", and for most it doesn't come until a second course or after revisiting the material.

If you don't already have it, you can't just use StackExchange to fill in the blanks. The random walk method to learning math doesn't really pan out for advanced material because it all builds on prior definitions. Then people like you make a comment to point out (correctly) that probability theory is just as important for all the machine learning that isn't just numerical optimization. But unless you want to restrict yourself to basic statistics and discrete probability, you're going to have a bad time working on probability without analysis. And analysis is going to be a pain without calculus, and so on and so forth.

There are certain things you need to spend a lot of time learning. Engineering and mathematics are both like that. But I think many of these articles do a disservice by implying that you can cut down on the learning time for the math if you have engineering experience. That's really not the case. If you're working in machine learning and you need to know linear algebra (i.e. you can't just let the underlying library handle that for you), you can't just pick and choose what you need. You need to have a robust understanding of the material. There isn't a royal road.

I think it's really great people like the author (who is presumably also the submitter) want to write these kinds of introductions. But at the same time, the author is a research assistant in the Stanford AI Lab. I think it's fair to say he may not have a firm awareness of how far most software engineers are from the prerequisites he outlined. And by extension, I don't think most people know what "comfortability with linear algebra" means if they don't already have it. It's very hard to enumerate your unknown unknowns in this territory.

I get what you are saying, but is the right way to learn math with a "connected path"? I've heard "The Art of Problem Solving" series works through math in the correct way, but I'm not sure how far I would get on that alone. Right now I'm trying to gain intuition in linear algebra via OCW with Strang, but I would like to truly understand. Is the only way just to do a second bachelor's in math?
You don't need to do a second bachelors - you really need four or so courses. If you have the patience and dedication you can sit down with the textbooks and work through them on your own.
This.

There's always more you might want to learn, but when people talk about these basics, it's really just being super focused in 4 or so classes, not a whole ivy league undergrad curriculum in math.

probability & stats, multivariable calculus, and linear algebra will take you a long way.

Cool. I will look into those, but I was asking as a general interest in math question. I actually have no interest in machine learning. I'm bored of chasing money. Interested in 3D computer graphics and math for math's sake.
> They haven't needed it, so they haven't retained it even if they learned it in college.

True for me. I knew all of these from my course work when I graduated with my CS degree in 1996. I haven't used them at all in my career, and so I'd be starting basically from scratch re-learning them.

Can you recommend books and online courses to hammer these concepts down? I used PCA and k-means for my masters thesis but didn’t really know how well they work under the covers.
As is mentioned in this thread, Linear Algebra Done Right is a solid textbook for learning linear algebra. I might start there =).
> to achieve that "comfortability"

"comfort" is a perfectly cromulent word for this.

I was quoting the article; but thank you, I didn't know that. Good to know.
Ah, my mistake. I must have edited it right out when I read the thing and took the quotes for 'I know I'm making up a word but can't think of anything better right this second'.
> To play devil's advocate, probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

For intuition, particularly if you care about vision applications, I think one field of math which is severely underrated by the community is group theory. Trying to understand methods which largely proceed by divining structure without first trying to understand symmetry has to be a challenge.

I'm biased; my training was as a mineralogist and crystallographer! But the serious point here is that much of the value of math is as a source of intuition and useful metaphor. Facility with notation is pretty secondary.

Can you talk about the use of group theory for computer vision or crystallography a bit? I'm familiar with the math but I'm not familiar with group theory's applications in those areas. That sounds pretty interesting. Is it primarily group theory, or does it also venture into abstract algebra more generally?
For crystallography, the use of group theory in part originates in X-ray crystallography [1], where the goal is to take 2D projections of a repeating 3D structure (crystal), and use that along with other rules that you know to re-infer what the 3D structure is.

Repeating structures have symmetries, so seeing the symmetries in your diffraction pattern informs you of the possible symmetries (and hence possible arrangements) in your crystal. Group theory is the study of symmetry.

By the way, this is also how the structure of DNA was inferred [2], although not from a crystal.

[1] https://en.wikipedia.org/wiki/X-ray_crystallography#Crystal_...

[2] https://www.dnalc.org/view/15014-Franklin-s-X-ray-diffractio...

> use that along with other rules that you know to re-infer what the 3D structure is

Great answer, thank you :-) Saved me a bunch of typing to explain it less well than you just did.

It's worth adding, for this crowd, that another way of thinking about the "other rules" you allude to is as a system of constraints; you can then set this up as an optimization problem (find the set of atomic positions minimizing reconstruction error under the set of symmetry constraints implied by the space group). That means that solving crystal structures and machine learning are functionally isomorphic problems.

I thought the work on the structure of DNA used Fourier analysis more than group theory.

I know harmonic analysis in general combines the two, but I'm sure Crick and Watson could have done their work without knowing the definition of a group.

And by Crick and Watson you mean Crick, Watson, Franklin and Wilkins, right? It's fairly clear all four deserve at least partial authorship by modern standards. James Watson was a piece of work.

(https://www.theguardian.com/science/2015/jun/23/sexism-in-sc...)

Crick was absolutely certainly familiar with the crystallographic space groups; he was the student of Lawrence Bragg (https://en.wikipedia.org/wiki/Lawrence_Bragg), who is the youngest ever Nobel laureate in physics – winning it with his father for more or less inventing X-ray crystallography. It's mostly 19th-century mathematics, after all.

For ML, you need both—probability to justify the setup of the problem, and linear algebra and calculus to optimize for a solution.

A simple example is with linear regression: find w such that the squared l2 norm of (Xw - y) is minimized.

Linear algebra will help with generalizing to n data points; and calculus will help with taking the gradient and setting equal to 0.
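
Spelled out, that calculus step might look like the sketch below. It uses the same X, w and y as above and assumes X has full column rank, so that X^T X is invertible (an assumption not stated in the comment):

```latex
\begin{aligned}
L(w) &= \lVert Xw - y \rVert_2^2 = (Xw - y)^\top (Xw - y) \\
\nabla_w L(w) &= 2\, X^\top (Xw - y) \\
\nabla_w L(w) = 0 \;&\Longrightarrow\; X^\top X\, w = X^\top y
\;\Longrightarrow\; w = (X^\top X)^{-1} X^\top y
\end{aligned}
```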

Probability will help with understanding why the squared l2 norm is an appropriate cost function; we assumed y = Xw + z, where z is Gaussian, and tried to maximize the likelihood of seeing y given x.
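
To make the link concrete, here is a minimal sketch in plain Python. It assumes a one-feature model with an intercept (y = w0 + w1*x + z), synthetic data, and a fixed noise level sigma; the function names and the specific numbers are made up for illustration. It solves the 2x2 normal equations directly and evaluates the Gaussian negative log-likelihood, which differs from the squared error only by a constant and a positive scale factor, so both are minimized by the same w.

```python
import math
import random


def fit_least_squares(xs, ys):
    """Fit y ~ w0 + w1 * x by solving the 2x2 normal equations X^T X w = X^T y."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of X^T X = [[n, sx], [sx, sxx]]; nonzero unless all xs are equal.
    det = n * sxx - sx * sx
    w0 = (sxx * sy - sx * sxy) / det
    w1 = (n * sxy - sx * sy) / det
    return w0, w1


def neg_log_likelihood(w, xs, ys, sigma=1.0):
    """Negative log-likelihood of ys under y = w0 + w1*x + z, with z ~ N(0, sigma^2)."""
    w0, w1 = w
    sq_err = sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys))
    n = len(xs)
    # Constant term plus the squared error scaled by 1 / (2 sigma^2).
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sq_err / (2 * sigma ** 2)


# Synthetic data: true intercept 0.5, true slope 2.0, Gaussian noise (illustrative values).
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [0.5 + 2.0 * x + random.gauss(0.0, 0.1) for x in xs]

w_hat = fit_least_squares(xs, ys)
print("least-squares estimate:", w_hat)
print("NLL at estimate:", neg_log_likelihood(w_hat, xs, ys))
print("NLL at a nudged w:", neg_log_likelihood((w_hat[0] + 0.01, w_hat[1]), xs, ys))
```

Nudging w away from the least-squares estimate in any direction should increase the negative log-likelihood, which is the probabilistic justification for the squared l2 cost in miniature.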

I’m sure there are more examples of this duality, since linear regression is one of the more basic topics in ML.