HN Mirror



	by tartaglia 1162 days ago
	As of a couple of years ago the main sentiment was: deep learning is neat but your problem could be solved by statistical learning techniques so check those out first. Does this still hold up?

8 comments

qsort 1162 days ago

90% of the problems can be solved by not fucking up your data, modeling your tables correctly, and a SQL query.

Of the 10% that can't be solved that way, 90% are solved with data cleaning + a linear model.

Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

For the remaining 0.1%, sure, deep learning I guess.
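A minimal sketch of the first two rungs of that ladder, using only the Python standard library. The `orders` table, its columns, and the numbers are made up for illustration and are not from the comment; `fit_line` is a hypothetical helper doing ordinary least squares by hand.

```python
import sqlite3

# Hypothetical data: table name, columns, and values are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 1, 10.0), ("alice", 2, 12.0), ("alice", 3, 14.0),
        ("bob", 1, 5.0), ("bob", 2, 5.0), ("bob", 3, 8.0),
    ],
)

# Rung 1: a plain aggregate query answers "who spends the most?"
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()

# Rung 2: a linear model on clean data. First pull revenue per month...
monthly = conn.execute(
    "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"
).fetchall()


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# ...then fit a trend line and extrapolate one month ahead.
slope, intercept = fit_line(monthly)
forecast_month_4 = slope * 4 + intercept
```

Neither step needs anything beyond a database and a few lines of arithmetic, which is roughly the point being argued; whether the 90% figures hold is a separate question.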


ogogmad 1162 days ago

You keep making these very negative posts, where you sound very confident, but resort to either word salad or making things up.

> not fucking up your data, modeling your tables correctly, and a SQL query

And getting a pony.

> Of the 10% that can't be solved that way, 90% are solved with data cleaning + a linear model.

87% of statistics are made up.

> Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

But why?

This reminds me of how some people in the early '80s sneered at people who did their calculations using computers, recommending instead that they memorise a billion mathematical shortcuts that would take longer to learn than programming a computer.


qsort 1162 days ago

Why do you think this post is "negative"? "Negative" towards what?

> 87% of statistics are made up.

Yes, of course, I didn't mean "lower integer part of nine tenths of the total number of problems". Did that really need to be specified?

>> Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

> But why?

Is it really controversial that you should go for the simplest model that works?


wanderingmind 1161 days ago

For structured data this is valid, but the power of deep learning lies in unstructured data, where the embeddings and features need to be learned from raw data.


sebzim4500 1161 days ago

90% of what problems?

I guarantee that 90% of the things I want to do have nothing to do with a table lookup.


MichaelZuo 1161 days ago

And what of the 0.01% that deep learning is unable to solve?


epups 1161 days ago

Really? Can SQL queries do image recognition? How about self-driving or, more recently, natural language processing?


anaganisk 1160 days ago

Select * from images where item='car'

Select * from images where color='red' and item='light' and tag='traffic'.

Select * from voice where token='brrrrr'


RamblingCTO 1162 days ago

Depends on the task. NLP? Audio? Video? DL is probably best. Classification, regression etc.? Don't bother (in my experience). You can still utilize deep learning-esque tools like embeddings (which aren't deep learning at all) and put an SVM on top.
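A rough sketch of the "embeddings with an SVM on top" idea, assuming the embeddings have already been computed elsewhere. The 2-D vectors, labels, and the `train_linear_svm` / `predict` helpers are hypothetical; this is a hand-rolled linear SVM trained by subgradient descent on the hinge loss, not any particular library's implementation.

```python
# Hypothetical precomputed embeddings (2-D for readability), labels in {-1, +1}.
# In practice the vectors would come from some pretrained embedding model.
embeddings = [
    ([2.0, 1.5], 1), ([1.8, 2.2], 1), ([2.5, 2.0], 1),
    ([-1.5, -2.0], -1), ([-2.2, -1.0], -1), ([-1.0, -1.8], -1),
]


def train_linear_svm(data, epochs=200, lr=0.1, reg=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: hinge loss is active
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


w, b = train_linear_svm(embeddings)
predictions = [predict(w, b, x) for x, _ in embeddings]
```

The classifier itself stays small and cheap; all the heavy lifting is in whatever produced the vectors.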


roenxi 1162 days ago

That will almost always be the case. If basic stats could solve a problem then they still can.

All this AI development may open up new business opportunities or close existing ones - but for existing businesses, assuming they survive the disruption caused by AI, they probably still would be best served by focusing on getting some basic stats used in their processes. If they can't do that, then a neural network isn't going to make the situation any better for them.


nextos 1162 days ago

Lots of organizations that are a bit clueless and trying to catch up quickly, driven by hype, think they need deep learning when they actually need Bayesian learning.

Instead of looking at stuff from Murphy volume I, they should look at volume II or Gelman's books.

There are neat combinations of ideas from both fields. Aside from volume II, Pyro's documentation provides some interesting use cases.


PartiallyTyped 1162 days ago

If you don't have a lot of data, then statistical learning. If your data is structured and has well-defined interpretations, then statistical learning.

So if you have lots of data and no reasonable way to extract information from it or process it, you go to DL stuff.

You can get a lot of mileage out of gradient boosted trees and other forms of ensembles.
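To make "gradient boosted trees" concrete, here is a toy sketch of boosting with depth-1 trees (stumps) for squared-error regression on one feature. The data and the `fit_stump`, `boost`, and `predict` helpers are invented for illustration; real libraries add deeper trees, regularisation, and much more.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def boost(xs, ys, rounds=50, lr=0.3):
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, lr, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(lm if x <= t else rm for t, lm, rm in stumps)


# Made-up step-shaped data that a single straight line fits poorly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Comparing `mse_boosted` with `mse_baseline` shows how far a stack of very weak learners gets on small tabular data.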


BenoitP 1162 days ago

Small data is better handled by greedy methods. XGBoost is still the go-to in tabular Kaggle challenges.


gallabytes 1162 days ago

lmao no. imagine trying to do text to image with anything other than deep learning. nothing else comes close.


KeplerBoy 1162 days ago

We have artists for that. I heard they can compete with SOTA methods.


gallabytes 1161 days ago

OP was about classical statistical techniques. I'm pretty sure human artists are not logistic regression?


sebzim4500 1161 days ago

Compete on quality, certainly not on price or speed.


hardha87 1162 days ago

Of course it does. King's paper “probabilistically statistics” and his method of folding squares are the motivation behind this original advice.


sdeep27 1161 days ago

Link?
