Hacker News
by tartaglia 1162 days ago
As of a couple of years ago, the main sentiment was: deep learning is neat, but your problem could be solved by statistical learning techniques, so check those out first. Does this still hold up?
8 comments

90% of the problems can be solved by not fucking up your data, modeling your tables correctly, and a SQL query.

Of the 10% that can't be solved that way, 90% are solved with data cleaning + a linear model.

Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

For the remaining 0.1%, sure, deep learning I guess.
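
Taking the rough figures above at face value (they are rhetorical, not measured), the cascade works out as follows. A minimal sketch; the tier labels are just shorthand for the steps listed above:

```python
# Each tier handles 90% of whatever the previous tier left unsolved.
# The 90% figure is the comment's rhetorical number, not a measurement.
tiers = [
    "clean data + SQL",
    "data cleaning + linear model",
    "other statistical techniques",
]

remaining = 1.0  # fraction of problems still unsolved
shares = {}
for name in tiers:
    shares[name] = remaining * 0.9
    remaining *= 0.1
shares["deep learning (the rest)"] = remaining

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

So the four tiers cover roughly 90%, 9%, 0.9% and 0.1% of problems respectively.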

You keep making these very negative posts, where you sound very confident, but resort to either word salad or making things up.

> not fucking up your data, modeling your tables correctly, and a SQL query

And getting a pony.

> Of the 10% that can't be solved that way, 90% are solved with data cleaning + a linear model.

87% of statistics are made up.

> Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

But why?

This reminds me of how some people in the early '80s sneered at people who did their calculations using computers, recommending instead that they memorise a billion mathematical shortcuts that would take longer to learn than programming a computer.

Why do you think this post is "negative"? "Negative" towards what?

> 87% of statistics are made up.

Yes, of course, I didn't mean "lower integer part of nine tenths of the total number of problems". Did that really need to be specified?

>> Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

> But why?

Is it really controversial that you should go for the simplest model that works?
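
One way to read "simplest model that works" is as a loop over candidates ordered by complexity, stopping at the first one whose held-out error is acceptable. A minimal sketch, assuming toy data and a made-up error threshold; the function names, the data and the `max_mse` value are all illustrative and not from any particular library:

```python
from statistics import mean

def fit_constant(xs, ys):
    """Simplest candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Next candidate: ordinary least squares on one feature."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def simplest_that_works(candidates, train, holdout, max_mse):
    """Return the first (simplest) candidate whose holdout error is acceptable."""
    for name, fit in candidates:
        model = fit(*train)
        if mse(model, *holdout) <= max_mse:
            return name, model
    return None, None  # nothing simple worked; only now consider heavier models

# Toy data (assumption): roughly y = 2x + 1 with a little noise.
train = ([0, 1, 2, 3, 4, 5], [1.1, 2.9, 5.2, 6.8, 9.1, 10.9])
holdout = ([6, 7], [13.0, 15.1])
candidates = [("constant", fit_constant), ("linear model", fit_line)]

name, model = simplest_that_works(candidates, train, holdout, max_mse=0.1)
print(name)
```

The candidate list is ordered cheapest first, so a more complex model is only ever tried after the simpler ones have failed the threshold.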

For structured data this is valid, but the power of deep learning is for unstructured data where the embeddings and features need to be learned from raw data.

90% of what problems?

I guarantee that 90% of the things I want to do have nothing to do with a table lookup.

And what of the 0.01% that deep learning is unable to solve?

Really? Can SQL queries do image recognition? How about self driving or, more recently, natural language processing?

Select * from images where item='car'

Select * from images where color='red' and item='light' and tag='traffic'.

Select * from voice where token='brrrrr'

Depends on the task. NLP? Audio? Video? DL is probably best. Classification, regression etc.? Don't bother (in my experience). You can still utilize deep learning-esque tools like embeddings (which aren't deep learning at all) and put an SVM on top.
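
A rough sketch of the "embeddings plus an SVM on top" idea, in plain Python. Everything here is an assumption for illustration: the tiny 2-D word vectors are invented (in practice they might come from a co-occurrence or matrix-factorization method), documents are embedded by averaging word vectors, and the linear SVM is trained with basic subgradient descent on the hinge loss rather than with a real solver:

```python
# Hypothetical 2-D word embeddings, invented for this example.
EMBEDDINGS = {
    "good": (1.0, 0.2), "great": (0.9, 0.1),
    "bad": (-1.0, 0.1), "awful": (-0.9, 0.2),
    "movie": (0.0, 1.0), "film": (0.1, 0.9),
}

def embed(text):
    """Represent a document as the mean of its known word vectors."""
    vectors = [EMBEDDINGS[w] for w in text.split() if w in EMBEDDINGS]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, text):
    """Classify a document as +1 or -1 from the sign of the linear score."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, embed(text))) + b
    return 1 if score >= 0 else -1

docs = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, -1, -1]
model = train_linear_svm([embed(d) for d in docs], labels)
print(predict(model, "great movie"), predict(model, "awful movie"))
```

In a real project you would more likely reach for an existing SVM implementation; the point is only that the classifier on top of the embeddings can stay a simple linear model.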

That will almost always be the case. If basic stats could solve a problem before, they still can.

All this AI development may open up new business opportunities or close existing ones, but existing businesses, assuming they survive the disruption caused by AI, would probably still be best served by focusing on getting some basic stats used in their processes. If they can't do that, then a neural network isn't going to make the situation any better for them.

Lots of organizations that are a bit clueless and trying to catch up quickly, driven by hype, think they need deep learning when they actually need Bayesian learning.

Instead of looking at stuff from Murphy volume I, they should look at volume II or Gelman's books.

There are neat combinations of ideas from both fields. Aside from volume II, Pyro's documentation provides some interesting use cases.

If you don't have a lot of data, then statistical learning. If your data is structured and has well-defined interpretations, then statistical learning.

So if you have lots of data and no reasonable way to extract information from it or process it, you go to DL stuff.
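
Written out as a rule of thumb, that might look like the sketch below. The function name and the `large_data_threshold` cutoff are made up for illustration; what counts as "a lot of data" depends entirely on the problem:

```python
def suggest_approach(n_samples, is_structured, large_data_threshold=100_000):
    """Rule of thumb from the comment above; the threshold is a hypothetical cutoff."""
    if n_samples < large_data_threshold:
        return "statistical learning"  # not a lot of data
    if is_structured:
        return "statistical learning"  # structured, well-defined features
    return "deep learning"             # lots of raw, unstructured data

print(suggest_approach(5_000, is_structured=False))
print(suggest_approach(2_000_000, is_structured=True))
print(suggest_approach(2_000_000, is_structured=False))
```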

You can get a lot of mileage out of gradient boosted trees and other forms of ensembles.
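
For anyone who hasn't seen how boosting works, here is a bare-bones sketch of gradient boosting for regression with squared loss, using depth-1 trees (stumps) on a single feature. It is a toy in plain Python, not how XGBoost or any other library implements it; the function names, data and hyperparameters are illustrative assumptions:

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the 1-D threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy data (assumption): a nonlinear target that a single stump fits poorly.
xs = list(range(10))
ys = [x * x for x in xs]

model = gradient_boost(xs, ys)
baseline_mse = mean((y - mean(ys)) ** 2 for y in ys)
boosted_mse = mean((y - model(x)) ** 2 for x, y in zip(xs, ys))
print(f"baseline MSE: {baseline_mse:.2f}, boosted MSE: {boosted_mse:.2f}")
```

Each round only nudges the prediction a little (that's the learning rate), which is why an ensemble of very weak trees can still fit a nonlinear target.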

Small data is better handled by greedy methods. XGBoost is still the go-to in tabular Kaggle challenges.

lmao no. imagine trying to do text to image with anything other than deep learning. nothing else comes close.

We have artists for that. I heard they can compete with sota methods.

OP was about classical statistical techniques. I'm pretty sure human artists are not logistic regression?

Compete on quality, certainly not on price or speed.

Of course it does. King's paper “probabilistically statistics” and his method of folding squares is the motivation behind this original advice.

Link?