by itsoktocry 1912 days ago
>People who just use library functions without having any understanding of how they work are not going to do as well as the people who actually understand the math.

...who won't do as well as people who understand the business domain and that "good enough" isn't that hard to achieve with some pretty elementary stuff (regression, xgboost).

PhDs have been trying to act as gatekeepers of "Data Science" for the past decade. It's only getting easier for people to apply this stuff. Unless you're doing actual research in these algorithms, there is little need to "understand the math" beyond an undergraduate level.

8 comments

+1 to this from someone who learned the math behind ML in a PhD and was looking forward to being a gatekeeper :)

My favorite academic paper ever [0] was a comparison of a bunch of dimensionality reduction algorithms, and 100-year-old PCA was tough to beat!

Glad I was able to pivot my career out of AI and ML. My PhD wasn't at Stanford, MIT, et al so I couldn't find any jobs doing the "actual research" - if they existed at all outside academia.

EDIT to add another funny "frustration" paper more directly related to ML [1]. I consider DR (dimensionality reduction) more of a data analysis thing.

[0]: van der Maaten, et al. Dimensionality Reduction: A Comparative Review https://members.loria.fr/moberger/Enseignement/AVR/Exposes/T...

[1]: Dacrema, et al. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches https://arxiv.org/pdf/1907.06902.pdf

What did you pivot to?
Regular old engineering - modeling, controls, signal processing. My background in AI and ML helped me develop some great transferable skills (technical programming, mainly Python) and added some very attractive buzzwords to my resume! 4 years out of graduate school and I don't regret studying AI/ML. It was fun and made me more ambitious about my research and career than something more traditional would have.
> modeling, controls, signal processing

I wish AI/ML was added to Pure Data [0] or Max [1]. This would require all your skills if you could help out :)

[0] https://puredata.info/ [1] https://cycling74.com/products/max

Great links, thanks.
We don't expect the people who build our bridges or design our cars to be physicists. Physicists (and other scientists) expand science. Engineers use established science-based methods to design stuff. Technicians (also called technologists in Canada) carry out more established and routine aspects of design such as creating CAD models. Each group has their role.

For some reason, in the computer world, scientist, engineer, and technician have been merged into a single role. Full-stack, if you will. But as the success of the separation of roles in other domains shows, it does not have to be like this.

>For some reason, in the computer world, scientist, engineer, and technician have been merged into a single role. Full-stack

I mean, in software the theory is so close to the practice that there's little point in creating a separate software technician role alongside the software engineer.

You could say developer or programmer means technician, but it's not really the case in most companies, because developers do engineering work, too.

A lot of projects don't even have a proper spec, because the program can serve as a de facto spec. In a company where you need a working program, why would you cause translation errors to slow you down if you don't need to?

Separating spec and implementation is possible, but it's an inefficiency that needs a good rationale in a profit-driven business. That is why you see proper specs with later implementation only where these costs are deemed appropriate, such as safety-critical systems in aerospace and automotive.

It does not have to be like this, but it's an efficiency optimization that's just not possible in other engineering domains.

In classical engineering the closest we've come to this is rapid prototyping with CAD tools that allow simulation. That's because it's much more efficient when the engineer can do the implementation and find design errors before anything is mass produced.

> Technicians (also called technologists in Canada)

Nitpick: technicians and technologists are distinct in Canada. They operate at different levels in the stack as shown in [1].

[2] has some requirements for becoming certified as a technologist or a technician.

I agree with the rest of what you’re saying. However, this system has limitations (moving up in levels) that should be improved on before this model is adopted or imitated.

[1] https://asttbc.org/wp-content/uploads/2019/02/Level-of-Work-...

[2] https://asttbc.org/how_to_apply/

Misleading. The vast majority of professional scientists don't do any science, as science is defined in contrast to engineering and analytics (and technical work, loosely defined). For example, if I use a well-established mathematical or statistical model to predict the range of Chamois in the Alps under the hypothesis of +2 degrees Celsius during summer, am I doing science? Am I doing engineering? Am I doing analytical work? You are certainly not drawing a straight line between my work and Max Planck's.
Yes, more people who want to can do the job today without needing a PhD.

However, most of the people who want to do the job want to do it because they heard Google pays six-figure salaries for it. But Google pays six-figure salaries for the job because it takes a PhD to do it right. When the job becomes a job that everyone can do, Google won't pay six-figure salaries for it any more - to PhDs, or anyone else.

At that point, once more, you'll need a PhD to do a job that Google pays a six-figure salary for.

The goose that lays the golden eggs gets killed over and over again. Some people get in early and get a golden egg. The rest are left to suck ordinary eggs and accuse the others of cheating.

But, who was it who killed the goose in the first place?

Nobody killed the goose. People bred them, and once you have 10 geese laying golden eggs, the benefit of being smart enough to "invent" the proto-goose has diminished.
Going into academia sounds like a lousy strategy to get a six-figure salary.
I think you meant "seven-figure" because "six-figure" is a salary that Google pays to fresh college grads.
Probably. I'm a PhD student but I don't study the trendy stuff so I have no idea what money Google pays.

Some of us are actually in it to scratch an intellectual itch, not for the money. I gave up a lucrative career as a software dev to study the subject I'm really interested in and took a huge pay cut for the privilege of working in the only environment that offers anything near the freedom to research whatever satisfies the little gray cells. The OP's turn of phrase about gatekeeping sounds like a bitter joke to me. My field gets maybe one or two new entrants a year, folks like me who don't know what's good for them and can't tame their intellectual curiosity enough to get a real job. Most PhD students, most researchers of any level, don't want to gatekeep anything, we want to tell the world about our work. But most people don't give a shit, unless Google is interested. Unless you can grab my research and be an instant millionaire nobody cares.

That will be more true as ML gets more reliable, but we aren’t quite there yet. As ML gets more useful, people are going to use it more. As people use it more, people served by it will be more frustrated by unreliable/uncontrollable aspects of it.

A bug species detector that works almost all the time will get commoditized. A bug species detector that fails gracefully won’t get commoditized until later.

An insurance risk predictor will get commoditized. An insurance risk predictor that you’re confident is fair and unbiased won’t get commoditized until later.

A chat bot that gives your customers useful information about your topic will be commoditized. A chat bot that also will never say anything to tarnish your brand (e.g. sexist/racist/wrong) won’t be commoditized until later.

A last example illustrates it perfectly. The majority of automated driving was solved pretty quickly. That last mile of reliability has taken forever.

I’m convinced that knowing what’s going on under the hood will still be valuable for a while yet, because we are just now starting to really face ML failure modes at large scale.
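
To make the "fails gracefully" idea from the bug detector example a bit more concrete, here is a rough Python sketch of a wrapper that abstains instead of guessing when the model's top probability is low. The function name `predict_or_abstain`, the 0.8 threshold, and the species labels are all made up for illustration. A real system would likely need calibrated probabilities and a smarter rejection rule than a fixed cutoff.

```python
from typing import Dict, Optional


def predict_or_abstain(probs: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the most likely label, or None when confidence is below the threshold.

    `probs` maps each label to a predicted probability (hypothetical model output).
    """
    if not probs:
        return None  # nothing to predict from, so abstain
    # Pick the label with the highest predicted probability.
    label, confidence = max(probs.items(), key=lambda item: item[1])
    # Abstain (return None) rather than emit a low-confidence guess.
    return label if confidence >= threshold else None


# Hypothetical outputs from a bug species classifier.
confident = predict_or_abstain({"ladybug": 0.93, "stink bug": 0.05, "aphid": 0.02})
unsure = predict_or_abstain({"ladybug": 0.40, "stink bug": 0.35, "aphid": 0.25})
print(confident)  # ladybug
print(unsure)     # None
```

The abstain branch is where the hard, non-commoditized work sits: deciding what the product does with a `None` (ask a human, ask for another photo, say "not sure") is a design question the library call does not answer.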

...why do you think it's easier for people to apply this stuff? Who do you think builds and maintains tools like umap-learn, sklearn etc.?
> Who do you think builds and maintains tools like umap-learn, sklearn etc.?

A smaller set of people than those using them.

In a previous job my team had to do a lot of ML work with a team full of gatekeepers. I was surprised how easy it was to understand the work even though we were ML newbies - we even implemented some simpler ML algorithms ourselves (long story) which wasn't too hard.

When talking to the PhD boss of the gatekeeper group, he told me that in practice a lot of ML work is done with a handful of algorithms and most of what he learned in his PhD is not useful in the industry. And a lot of complexity now sits behind tools (what TFA says).

He was suggesting that, in most applications, an ML expert could be useful as a consultant for a brief period early in the project, but 1) the amateurs wouldn't do too badly by picking algorithms themselves and using the tools for tuning after some study of the fundamentals and 2) in case an expert is consulted, the rest of the team can just run with their suggestions for 90%+ of the duration of the project.

I think there's a category error here. The PhD and the tinkerer are not doing the same thing when engaged in the same problem.
AI is more art than science, but still requires experience. The same is true of software engineering in general: we all know how spectacular failures can be without it, regardless of how much domain knowledge you have.

Unsurprisingly, extraordinary results in AI require extraordinary commitment. It may or may not make sense from a business perspective.