Hacker News
by qsort 1168 days ago
90% of the problems can be solved by not fucking up your data, modeling your tables correctly, and a SQL query.

Of the 10% that can't be solved that way, 90% are solved with data cleaning + a linear model.

Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

For the remaining .1%, sure, deep learning I guess.
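
The tier arithmetic above can be checked with a quick sketch. This assumes each tier resolves exactly 90% of whatever the previous one left over; the figures are rhetorical rather than measured, and the names `TIERS` and `remaining_after_tiers` are made up for illustration.

```python
from fractions import Fraction

# Assumption: every tier resolves exactly 90% of the problems left over by
# the previous tier. This only checks the arithmetic, not the claim itself.
TIERS = [
    "clean data + SQL query",
    "data cleaning + linear model",
    "other statistical techniques",
]


def remaining_after_tiers(solve_rate=Fraction(9, 10), tiers=TIERS):
    """Return (tier name, fraction of all problems still unsolved after it)."""
    remaining = Fraction(1)
    result = []
    for name in tiers:
        # Each tier leaves (1 - solve_rate) of what it was handed.
        remaining *= 1 - solve_rate
        result.append((name, remaining))
    return result


for name, left in remaining_after_tiers():
    print(f"after {name}: {float(left):.1%} of all problems left")
```

Under that assumption the leftovers are 1/10, 1/100 and 1/1000 of the total, so the last tier hands 0.1% of problems to deep learning.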

5 comments

You keep making these very negative posts, where you sound very confident, but resort to either word salad or making things up.

> not fucking up your data, modeling your tables correctly, and a SQL query

And getting a pony.

> Of the 10% that can't be solved that way, 90% are solved with data cleaning + a linear model.

87% of statistics are made up.

> Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

But why?

This reminds me of how some people in the early '80s sneered at people who did their calculations using computers - recommending instead to memorise a billion mathematical shortcuts that would take longer to learn than programming a computer.

Why do you think this post is "negative"? "Negative" towards what?

> 87% of statistics are made up.

Yes, of course, I didn't mean "lower integer part of nine tenths of the total number of problems". Did that really need to be specified?

>> Of the 1% that can't be solved either way, 90% are solved with other statistical techniques (timeseries modeling, decision trees and so on).

> But why?

Is it really controversial that you should go for the simplest model that works?

For structured data this is valid, but the power of deep learning is for unstructured data where the embeddings and features need to be learned from raw data.

90% of what problems?

I guarantee that 90% of the things I want to do have nothing to do with a table lookup.

And what of the 0.01% that deep learning is unable to solve?

Really? Can SQL queries do image recognition? How about self-driving or, more recently, natural language processing?

Select * from images where item='car'

Select * from images where color='red' and item='light' and tag='traffic'.

Select * from voice where token='brrrrr'