Hacker News
by Hotple 1912 days ago
People who just use library functions without having any understanding of how they work are not going to do as well as the people who actually understand the math. The people who have real understanding will always be called in to figure out the hard stuff.
6 comments

>People who just use library functions without having any understanding of how they work are not going to do as well as the people who actually understand the math.

...who won't do as well as people who understand the business domain and that "good enough" isn't that hard to achieve with some pretty elementary stuff (regression, xgboost).

PhDs have been trying to act as gatekeepers of "Data Science" for the past decade. It's only getting easier for people to apply this stuff. Unless you're doing actual research in these algorithms, there is little need to "understand the math" beyond an undergraduate level.

+1 to this from someone who learned the math behind ML in a PhD and was looking forward to being a gatekeeper :)

My favorite academic paper ever [0] was a comparison of a bunch of dimensionality reduction algorithms, and 100-year-old PCA was tough to beat!

Glad I was able to pivot my career out of AI and ML. My PhD wasn't at Stanford, MIT, et al so I couldn't find any jobs doing the "actual research" - if they existed at all outside academia.

EDIT to add another funny "frustration" paper more directly related to ML [1]. I consider DR more of a data analysis thing.

[0]: van der Maaten, et al. Dimensionality Reduction: A Comparative Review https://members.loria.fr/moberger/Enseignement/AVR/Exposes/T...

[1]: Dacrema, et al. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches https://arxiv.org/pdf/1907.06902.pdf

What did you pivot to?
Regular old engineering - modeling, controls, signal processing. My background in AI and ML helped me develop some great transferable skills (technical programming, mainly Python) and added some very attractive buzzwords to my resume! 4 years out of graduate school and I don't regret studying AI/ML. It was fun and made me more ambitious about my research and career than something more traditional would have.
> modeling, controls, signal processing

I wish AI/ML was added to Pure Data [0] or Max [1]. This would require all your skills if you could help out :)

[0] https://puredata.info/ [1] https://cycling74.com/products/max

Great links, thanks.
We don't expect the people who build our bridges or design our cars to be physicists. Physicists (and other scientists) expand science. Engineers use established science-based methods to design stuff. Technicians (also called technologists in Canada) carry out more established and routine aspects of design such as creating CAD models. Each group has their role.

For some reason, in the computer world, scientist, engineer, and technician have been merged into a single role. Full-stack, if you will. But as the success of the separation of roles in other domains shows, it does not have to be like this.

>For some reason, in the computer world, scientist, engineer, and technician have been merged into a single role. Full-stack

I mean in software the theory is so close to the practice that there's little purpose to create a role of software technician for the software engineer.

You could say developer or programmer means technician, but it's not really the case in most companies, because developers do engineering work, too.

A lot of projects don't even have a proper spec, because the program can serve as a de facto spec. In a company where you need a working program, why would you cause translation errors to slow you down if you don't need to?

Separating spec and implementation is possible, but it's an inefficiency that needs a good rationale in a profit driven business. Which is why you see proper specs with later implementation only there where these costs are deemed appropriate. Like safety critical systems in aerospace and automotive.

It does not have to be like this, but it's an efficiency optimization that's just not possible in other engineering domains.

In classical engineering the closest we've come to this is rapid prototyping with CAD tools that allow simulation. Because it's much more efficient when the engineer can do the implementation and find design errors before they're mass produced.

> Technicians (also called technologists in Canada)

Nitpick: technicians and technologists are distinct in Canada. They operate at different levels in the stack as shown in [1].

[2] has some requirements for becoming certified as a technologist or a technician.

I agree with the rest of what you’re saying. However, this system has limitations (moving up in levels) that should be improved on before this model is adopted or imitated.

[1] https://asttbc.org/wp-content/uploads/2019/02/Level-of-Work-...

[2] https://asttbc.org/how_to_apply/

Misleading. The vast majority of professional scientists don't do any science as science is defined when compared to engineering and analytics (and technical work, loosely defined). For example, if I use a well-established mathematical or statistical model to predict the range of Chamois in the Alps under the hypothesis of +2 degree Celsius during summer, am I doing science? Am I doing engineering? Am I doing analytical work? You are certainly not drawing a straight line between my work and Max Planck's.
Yes, more people who want to, can do the job today without needing a PhD.

However, most of the people who want to do the job want to do it because they heard Google pays six-figure salaries for it. But Google pays six-figure salaries for the job because it takes a PhD to do it right. When the job becomes a job that everyone can do, Google won't pay six-figure salaries for it any more - to PhDs, or anyone else.

At that point, once more, you'll need a PhD to do a job that Google pays a six-figure salary for.

The goose that lays the golden eggs gets killed over and over again. Some people get in early and get a golden egg. The rest are left to suck ordinary eggs and accuse the others of cheating.

But, who was it who killed the goose in the first place?

Nobody killed the goose, people bred them and once you have 10 geese laying golden eggs the benefit of being smart enough to "invent" the proto goose has diminished.
Going into Academia sounds like a lousy strategy to get a six figure salary.
I think you meant "seven-figure" because "six-figure" is a salary that Google pays to fresh college grads.
Probably. I'm a PhD student but I don't study the trendy stuff so I have no idea what money Google pays.

Some of us are actually in it to scratch an intellectual itch, not for the money. I gave up a lucrative career as a software dev to study the subject I'm really interested in and took a huge pay cut for the privilege of working in the only environment that offers anything near the freedom to research whatever satisfies the little gray cells. The OP's turn of phrase about gatekeeping sounds like a bitter joke to me. My field gets maybe one or two new entrants a year, folks like me who don't know what's good for them and can't tame their intellectual curiosity enough to get a real job. Most PhD students, most researchers of any level, don't want to gatekeep anything, we want to tell the world about our work. But most people don't give a shit, unless Google is interested. Unless you can grab my research and be an instant millionaire nobody cares.

That will be more true as ML gets more reliable, but we aren’t quite there yet. As ML gets more useful, people are going to use it more. As people use it more, people served by it will be more frustrated by unreliable/uncontrollable aspects of it.

A bug species detector that works almost all the time will get commoditized. A bug species detector that fails gracefully won’t get commoditized until later.

An insurance risk predictor will get commoditized. An insurance risk predictor that you’re confident is fair and unbiased won’t get commoditized until later.

A chat bot that gives your customers useful information about your topic will be commoditized. A chat bot that also will never say anything to tarnish your brand (e.g. sexist/racist/wrong) won’t be commoditized until later.

A last example illustrates it perfectly. The majority of automated driving was solved pretty quickly. That last mile of reliability has taken forever.

I’m convinced that knowing what’s going on under the hood will still be valuable for a while yet, because we are just now starting to really face ML failure modes at large scale.

...why do you think it's easier for people to apply this stuff? Who do you think builds and maintains tools like umap-learn, sklearn etc.?
> Who do you think builds and maintains tools like umap-learn, sklearn etc.?

A smaller set of people than those using them.

In a previous job my team had to do a lot of ML work with a team full of gatekeepers. I was surprised how easy it was to understand the work even though we were ML newbies - we even implemented some simpler ML algorithms ourselves (long story) which wasn't too hard.

When talking to the PhD boss of the gatekeeper group, he told me that in practice a lot of ML work is done with a handful of algorithms and most of what he learned during his PhD is not useful in the industry. And a lot of complexity now sits behind tools (what TFA says).

He was suggesting that, in most applications, an ML expert could be useful as a consultant for a brief period early in the project, but 1) the amateurs wouldn't do too badly by picking algorithms themselves and using the tools for tuning after some study of the fundamentals and 2) in case an expert is consulted, the rest of the team can just run with their suggestions for 90%+ of the duration of the project.

I think there's a category error here. The PhD and the tinkerer are not doing the same thing when engaged in the same problem.
AI is more art than science, but it still requires experience. The same is true of SW engineering in general: without that experience, we all know how spectacular the failures can be, regardless of how much domain knowledge you have.

Unsurprisingly extraordinary results in AI require extraordinary commitment. It may or may not make sense from a business perspective.

The author's thesis is that the "hard stuff" decreases over time, and gets commoditized as the field matures. My experience aligns with this view.

For example: you can theoretically do better than a seasonal ARIMA model for time series analysis. But in practice it's very difficult, and you probably don't have the amount of data you need or even an economic justification. The improvements will usually be marginal, expensive and not worth it.

Most teams developing a new product or doing research - even within large and ostensibly sophisticated tech companies - won't usefully or economically outperform something like Box-Jenkins.
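To make the "simple baseline" point concrete, here is a minimal sketch in plain Python. It is not a seasonal ARIMA or a Box-Jenkins fit; it only shows the seasonal differencing step those models build on, plus a seasonal-naive forecast as the kind of cheap baseline a fancier model would have to beat. The function names and the toy series are made up for illustration.

```python
from typing import List


def seasonal_difference(series: List[float], period: int) -> List[float]:
    """Subtract the value one season earlier from each observation."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


def seasonal_naive_forecast(series: List[float], period: int, horizon: int) -> List[float]:
    """Forecast by repeating the last observed season (a baseline, not ARIMA)."""
    if len(series) < period:
        raise ValueError("need at least one full season of data")
    last_season = series[-period:]
    return [last_season[i % period] for i in range(horizon)]


if __name__ == "__main__":
    # Hypothetical quarterly data: two years with a steady +2 year-over-year change.
    quarterly = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0]
    print(seasonal_difference(quarterly, period=4))
    print(seasonal_naive_forecast(quarterly, period=4, horizon=6))
```

If a more elaborate model cannot clearly beat the seasonal-naive forecast on held-out data, that is usually a sign the extra effort is not paying for itself.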

I interpret TFA as saying that the space of "the hard stuff" is shrinking over time. So while I agree with you, if the space of the hard stuff is shrinking the demand for people who can figure it out will also be shrinking. A reduction in demand results in a lower valuation.
> the space of "the hard stuff" is shrinking over time

That's theoretically true of programming in general - and has been true of programming since programming started. Network programming used to be super specialized, but now the standard for applications is distributed over the web. Graphics programming ability used to be rare, but now it's strange to see an app without a GUI. Yet programming takes longer to learn than it used to - precisely because the state of the art has advanced so much.

I've worked with a handful of people who thought they could just use the libraries but didn't understand what they were doing or why they worked - people who could put together fancy UIs in, say, jQuery and such - and inevitably, they would find themselves hopelessly lost because they didn't actually understand what asynchronous callbacks were and couldn't figure out why, when they stepped through their programs with a debugger, the debugger kept "skipping over" their callback function.
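For anyone who hasn't hit this, a rough Python analog of the callback confusion (asyncio standing in for jQuery, so treat it as a sketch of the idea rather than the actual code those people wrote): registering a callback does not run it. It only runs once control returns to the event loop, which is why stepping line by line appears to "skip over" it.

```python
import asyncio
from typing import List


def run_demo() -> List[str]:
    events: List[str] = []

    def on_done() -> None:
        # Runs only when the event loop gets control back, not at registration time.
        events.append("callback ran")

    async def main() -> None:
        loop = asyncio.get_running_loop()
        events.append("before scheduling")
        loop.call_soon(on_done)  # registers the callback; does NOT call it here
        events.append("after scheduling")
        await asyncio.sleep(0)  # yield to the loop so the queued callback can run
        events.append("after yielding")

    asyncio.run(main())
    return events


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```

Note that "after scheduling" is recorded before "callback ran": the line that registers the callback finishes long before the callback body executes.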

This is roughly what leads me to agree w/ the GP. The people who have the training to understand AI at a fundamental level will have transferable skills that will give them a leg up on whatever the next "hard stuff" might be, even if it's not in the area we currently refer to as AI.
I remember when the database guys were the high priests of software development and did their performance rituals in secret. Nowadays 99% of developers use databases without having a clue how they work and they are fine.

I expect the same for AI. 99% of use cases will be commoditized and easily accessible for devs and only a very small number of people who understand it in depth will be needed. You already can do a lot of cool stuff by copying code with some tweaks and I see this trend only continuing.

This is true but I don't like that the math understanding is often treated like some magic secret that only a select few have access to. As long as you have a decent foundation you can research stuff and get deeper into the theory as necessary. And this inevitably happens when you switch problem domains even if you've gone really deep in a certain type of algorithms.

It's not too different from a software developer figuring out a new API/system/technique etc. but for some reason the attitude is that you either know it or you don't.

And in the context of this article, I think that the point is that even the deeper understanding isn't that valuable to many organisations.

I've been realising this recently. While I'm a professional programmer, I only ever really learned the maths I needed for my degree, and even then most of that got forgotten after I graduated beyond what's necessary for my day-to-day work. I did a bit of ML at university and I've been meaning to pick it up again but wanted to avoid half-arsing it by just learning the libraries rather than the underlying mathematical principles as well. One of the mental hurdles has been getting over this idea of "ML maths" as this black box. I've started with some linear algebra courses and while it's very interesting in its own right, it's also shown me I have some pretty enormous gaps in my knowledge!

Next time I'm between jobs (hopefully won't be for a long time) I'm going to revisit maths as its own thing, I really want to get my calculus and trigonometry up to scratch as well as things like linear algebra and statistics. It's interesting how quickly it leaves your head too, I did pretty well at university with ML but having not exercised those muscles so much fell out the instant that exam timer hit zero.

> having not exercised those muscles so much fell out

I had the same experience - I learned enough to pass the tests and then forgot everything as soon as the semester ended. I picked it back up out of genuine interest years later and it was amazing how much I retained now that I was actually studying because I wanted to rather than because I had to. You might also be surprised how much you actually do remember, hidden just under the surface of your consciousness, if you do go back and try to remediate on your own.

I think it could be more valuable to find a problem you find interesting and see if there is a model in the literature that you can use / implement starting with the most general/available and specializing if necessary. Most of ML work is, as the article alludes to, collecting and managing data.
What math, anyway? Knowing what a Gaussian is? There's really surprisingly little math in the whole field, and people even refuse to put things in mathematical terms if at all possible.
Agreed. It's really more like playing with Lego. Take residual connections, for instance. The insight was that information wasn't traveling far enough when the networks got too deep. So they just.... Plugged the earlier layers into the later layers. And this has been a very important development.
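A toy sketch of what "plugging the earlier layers into the later layers" means, in plain Python with no framework. The `shrink` function is a hypothetical stand-in for a learned layer that attenuates its input; real residual blocks wrap convolutions or dense layers, so this only illustrates the wiring, not a trainable network.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, layer: Callable[[Vector], Vector]) -> Vector:
    """Return layer(x) + x: the layer's output plus the untouched input."""
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("skip connection needs matching input and output sizes")
    return [a + b for a, b in zip(fx, x)]


def shrink(x: Vector) -> Vector:
    # Hypothetical layer: scales the signal down by a factor of 10.
    return [0.1 * v for v in x]


def run_plain(x: Vector, depth: int) -> Vector:
    """Stack `depth` layers with no skip connections."""
    for _ in range(depth):
        x = shrink(x)
    return x


def run_residual(x: Vector, depth: int) -> Vector:
    """Stack `depth` layers, each wrapped in a skip connection."""
    for _ in range(depth):
        x = residual_block(x, shrink)
    return x


if __name__ == "__main__":
    print(run_plain([1.0], depth=20))     # signal has all but vanished
    print(run_residual([1.0], depth=20))  # signal survives the same depth
```

In the plain stack the input is multiplied by 0.1 at every layer and effectively disappears after 20 layers; with the skip connection the original signal is carried along at every step.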

Or things like batch norm. We don't know why it's important. People do math to try to explain what's going on, not so much to figure out where we should go next.

Related: Understanding is a poor substitute for convexity https://www.edge.org/conversation/nassim_nicholas_taleb-unde...

> things like batch norm. We don't know why it's important

It is pretty well understood.

Posts like this really piss me off, because you can make anything sound small. So don't take the next few paragraphs as me attacking you, I'm just venting a general sentiment I've had for a while. (Looking back after typing it out, I might actually flesh it out and post it on my blog, I hope it's somewhat thought-stimulating for others as well).

It's like someone saying "well, electrical engineering is really like playing with building blocks. Take zener diodes for example, the problem was that you can have a lot of power in a circuit, but if there's a power spike it might break. So they just...plugged in a piece that breaks by shunting the power spike into the ground, then reset. And this is now a major piece of electronics." Or "so, everyone is always going off about arabian mathematicians, but one "big development" they did was to invent the zero - just make up a symbol where previously you'd leave a space. It's basically just a change of notation!".

Deep learning theory (statistical, information theoretical and optimisation wise) is our process of understanding how to design systems that adapt themselves to feedback, and how to encode tasks in them. Batchnorm was inspired by one thing (internal covariate shift), and that thing was plausible, but as it turns out, in systems as complex (not complicated, complex as in interactions) as universal function approximators, adding one thing can radically change things. As it turns out, batchnorm smooths the function, it decouples parameter magnitude and directions and it positively improves signal propagation. How else would you have figured this out without having systems like neural networks with batchnorm already in place that you can study? And now there are lines of work emerging that do away with batchnorm, but have distilled the positive properties into smaller techniques (https://arxiv.org/pdf/2102.06171.pdf, Soham De gave a lecture at our lab recently).
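For readers who haven't looked inside it, a minimal sketch of the batchnorm forward pass for a single feature, in plain Python. It assumes training-mode batch statistics only (no running averages, no backward pass), and `gamma`, `beta` and `eps` are the usual scale, shift and stability parameters with illustrative defaults. The demo hints at the magnitude/direction decoupling mentioned above: scaling everything upstream by 10 leaves the normalized output essentially unchanged.

```python
import math
from typing import List


def batch_norm(batch: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize one feature across a batch, then apply scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)  # biased batch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


if __name__ == "__main__":
    activations = [1.0, 2.0, 3.0]
    scaled = [10.0 * v for v in activations]  # as if upstream weights were 10x larger
    print(batch_norm(activations))
    print(batch_norm(scaled))  # nearly identical, differing only through eps
```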

Same thing with skip connections: Jürgen Schmidhuber will rightfully point out we've had highway networks since his heyday, but details matter. It is really not intuitive before you do it that in such a complex system, skip connections will be beneficial, because before we had them and started studying them on complex systems, the ideas of thinking of them as learning small adjustments to a signal, or as an ensemble of shallow learners, or the other perspectives that they have been studied under had not been developed.

And how would you? Without having them working really well, you'd have to start thinking about them from first principles in the giant design space of nonlinear, nonconvex functions, without being able to prove anything because we don't have the mathematical formalism yet.

Deep learning theory and nonconvex optimisation right now is a new physics born out of the marriage of information theory, computer science and computer engineering (and not surprisingly in a menage a trois, a lot of groundwork was laid by the french and other weird europeans /joke). We have a bunch of theory nerds trying to explain what we see in elegant and concise mathematical frameworks and trying to come up with testable predictions, and a bunch of experimental people actually coming up with ways of testing it, gluing together the bits of understanding we have with soft knowledge to make the learning engine go brr and give feedback to the theorists on what held up, what didn't work predictably and what didn't go according to predictions. And people mouth off about the empirical nature of things.

Well, I ask: How else would you figure this stuff out? I think there is a cult of genius at play here, where if you don't start with category theory and platonic ideal conceptions of reality and derive your model without any experiment, you are somehow lesser.

Well, despite what people like to sell, disruption is a lie, everything is incremental, and without having the hackers make things work in clunky ways, the theorists would circle jerk themselves in creative dead ends because of a lack of stimulus.

And as always, there are a lot of people who make themselves sound smarter by affecting superiority and disdain for this scientific process, while in the background nerds deepen our understanding of the universe.

Depends on which part of the field you're referring to. Deep learning is arguably the least theory-intensive and it still requires at least calculus and linear algebra to understand what's going on.
Deep learning can be compared to brute-forcing. You can't base your career on brute-forcing because anybody can do it. And being smarter than the competition doesn't help because brute-forcing will solve all your problems anyway.
Completely agree, sure you can make superficial progress, but then you will need to further improve, and without deep knowledge of the underlying mechanics you will be lost.
Also rather important is to know what paths not to pursue in the search for a solution. Basically a solid understanding of bias/variance will carry you far and help you avoid doing a week of work for an optimization that doesn't actually pan out.
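For reference, the standard bias/variance decomposition of expected squared error, under the usual assumption that $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$ and expectations taken over training sets (the symbols are the textbook ones, not anything specific to this thread):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

If the error is already dominated by the $\sigma^2$ term, no amount of model tweaking will recover it, which is the "path not to pursue" case.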
Are you talking specifically about research? Because in my experience the time and $ delta needed to bring a classification model from, say, 93% to 96% accuracy is not worth it to most businesses. So all your special "deep knowledge" is irrelevant in most use cases.
If you're in research I would assume a deeper background or at least an environment where you are encouraged to develop your background. In industry, yes, time to value is important, but it seems clear that in some part of the market competition will require squeezing out those remaining % points of performance -- a company scaling up with poorer accuracy would have more problems...right?

Do you have experience in industry? I would like to hear more from your perspective?

>In industry, yes, time to value is important, but it seems clear that in some part of the market competition will require squeezing out those remaining % points of performance -- a company scaling up with poorer accuracy would have more problems...right?

But I wasn't talking about a poor model vs a good model. Like I said, going from 93-96% accuracy is generally not going to have a lot of value add.

>Do you have experience in industry? I would like to hear more from your perspective?

Yes and in my experience if your model has lift over random and has a positive roi vs doing nothing it's usually going to be worth building and implementing it. In the fields I've worked in the models are not in direct competition with models from other companies so if Co. A's model for some task is getting 95% accuracy and Co. B's is getting 91% there's not much cause for concern. If Co. B's model is still generating lift over a random guesser it's worth having in production.
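A small sketch of the two numbers I mean, in plain Python. The function names, the toy labels and scores, and the revenue and cost figures are all hypothetical; lift here is the positive rate among the top-k items the model picks divided by the overall positive rate (what a random picker would get on average).

```python
from typing import Sequence


def lift_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Positive rate among the k highest-scored items over the overall positive rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


def roi(extra_revenue: float, cost: float) -> float:
    """Return on investment versus doing nothing: net gain divided by cost."""
    return (extra_revenue - cost) / cost


if __name__ == "__main__":
    # Hypothetical data: 1 = customer converted, scores come from some model.
    labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6, 0.05, 0.15]
    print(lift_at_k(labels, scores, k=3))         # above 1.0 means better than random
    print(roi(extra_revenue=1500.0, cost=1000.0))  # above 0.0 means it paid for itself
```

A lift above 1 and an ROI above 0 is the bar; whether a competitor's model scores a few points higher rarely enters into it.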

For most consumers I doubt they could even tell the difference between services like that. If Walmart's product recommendation is 2% better than Amazon's, I find it extremely unlikely most consumers could even tell, and Amazon's primary concern is whether the model is driving increased sales and not whether it's stealing customers from Walmart (the model, not generally).