Hacker News
by matchagaucho 2994 days ago
Are there any ML APIs or web services that accept a vector and run various regression scenarios to identify the optimal fit?

I suppose vectors for both training and testing would be required.

Would gladly pay $1-$5 per batch for a service to do this.

7 comments

I have a magic regression aggregator that works like this:

1) Take a dataset and split into training and test

2) Using the training set: fit a bunch of different regressors on one part of it (the training-training subset) and get their predictions for the rest (the test-training subset)

3) Run a higher-level regression against the test-training subset predictions. I use either plain linear regression (so my meta-regressor is a linear combination of the regressors) or K-nearest neighbors (so the best regressor for each region of feature space is chosen).

4) If there are hyperparameters, optimize against the test set (not the test-training subset).

It's not available as an API. I'm available for consulting though.
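
For anyone who wants to try the idea, steps 1-3 above can be sketched roughly like this. This is not the commenter's actual code, just a minimal stdlib-only illustration under some assumptions: one input feature, two base regressors (a straight-line fit and k-nearest neighbors), and a linear meta-regressor without an intercept. All function names (`fit_linear`, `fit_knn`, `fit_blend`, `stack`) and the toy data are made up for the example, and step 4 (hyperparameter tuning) is left out.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbors regressor: average the targets of the k closest points."""
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def fit_blend(p1, p2, ys):
    """Least-squares weights for y ~ w1*p1 + w2*p2 (2x2 normal equations)."""
    a11 = sum(u * u for u in p1)
    a12 = sum(u * v for u, v in zip(p1, p2))
    a22 = sum(v * v for v in p2)
    b1 = sum(u * y for u, y in zip(p1, ys))
    b2 = sum(v * y for v, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def stack(xs, ys, seed=0):
    # Step 1: shuffle, then hold out a quarter of the data as the final test set.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = len(idx) // 4
    test, train = idx[:n_test], idx[n_test:]
    # Step 2: split the training set again; fit base regressors on the first half.
    half = len(train) // 2
    inner_train, inner_test = train[:half], train[half:]
    fx, fy = [xs[i] for i in inner_train], [ys[i] for i in inner_train]
    linear, knn = fit_linear(fx, fy), fit_knn(fx, fy)
    # Step 3: fit the meta-regressor on base predictions for the second half.
    hx, hy = [xs[i] for i in inner_test], [ys[i] for i in inner_test]
    p1, p2 = [linear(x) for x in hx], [knn(x) for x in hx]
    w1, w2 = fit_blend(p1, p2, hy)
    stacked = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    holdout = {"linear": mse(p1, hy), "knn": mse(p2, hy), "stacked": mse(stacked, hy)}
    # Final check on the untouched test set.
    tx, ty = [xs[i] for i in test], [ys[i] for i in test]
    test_preds = [w1 * linear(x) + w2 * knn(x) for x in tx]
    return {"weights": (w1, w2), "holdout_mse": holdout, "test_mse": mse(test_preds, ty)}


# Toy data: a gentle trend plus a sine wave, chosen only for illustration.
xs = [i / 10 for i in range(80)]
ys = [math.sin(x) + 0.1 * x for x in xs]
result = stack(xs, ys)
print(result)
```

On the test-training subset the blend can never score worse than either base regressor, because it is fit on exactly those points. That is why the number worth looking at is `test_mse`, which comes from data the blend never saw.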

There is a Python library called TPOT that does this.

https://github.com/EpistasisLab/tpot

I think that's DataRobot's business model, although I think they run more sophisticated models as well. It was 5+ years ago that I spoke to them but IIRC they were able to compete pretty well in Kaggle competitions with a fairly hands-off algorithm.

Could you perhaps point me to a Kaggle competition where they perform well with a hands-off approach?

I'm afraid I can't. Take it with a grain of salt, I only mention it because it was the anecdote that stood out in my memory for 5+ years :)

I'm working on an MLaaS service now and I'd love to add that feature. That said, I'd like to learn a little more about exactly how you envision the use case working. To that end, if you can spare some time to chat sometime, would you drop me a line (prhodes@fogbeam.com)?

Why not just use regression trees, eg xgboost? The parameters aren't going to mean anything anyway.

Why wouldn't you just run weka or something locally?

I just want to provide the data and let a service decide the best algorithm.

Weka and various ML tools require you to select the algorithm and do the A/B testing on your own.

There's an opportunity for an Optimizely of ML.
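
To make the request concrete, here is a rough sketch of what "provide the data and let something pick the algorithm" could look like at its simplest: score a few candidate regressors with k-fold cross-validation and keep the one with the lowest error. This is only an illustration, not how any existing service works. The three candidates, the helper names (`cv_mse`, `pick_best`) and the toy data are all assumptions made up for the example.

```python
import math


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: slope * (x - mean_x) + mean_y


def fit_knn(xs, ys, k=3):
    """k-nearest-neighbors regressor: average the targets of the k closest points."""
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def cv_mse(fit, xs, ys, folds=5):
    """Squared error of `fit`, averaged over every point while it is held out."""
    total = 0.0
    for f in range(folds):
        train = [i for i in range(len(xs)) if i % folds != f]
        held = [i for i in range(len(xs)) if i % folds == f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((model(xs[i]) - ys[i]) ** 2 for i in held)
    return total / len(xs)


def pick_best(candidates, xs, ys):
    """Return the name of the candidate with the lowest cross-validated error."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


candidates = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
# Toy data: a straight line with a small wiggle, so the linear fit should win.
xs = [float(i) for i in range(30)]
ys = [3 * x + 2 + 0.5 * math.sin(x) for x in xs]
best, scores = pick_best(candidates, xs, ys)
print(best, scores)
```

The winning score is itself an optimistic estimate once you have compared several candidates on the same folds, so a separate untouched test set is still needed before trusting it.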

Model fishing is bad.

You will find a model that looks good on your data. It will not be the model you should use.
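
A toy simulation shows the effect. The labels below are coin flips and every "model" is a table of random guesses, so nothing here can truly beat 50%. Picking the best of many on a small dataset still produces a winner that looks good. The setup (`fish`, 200 models, 20 selection points, 1000 fresh points) is invented purely for illustration.

```python
import random


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fish(n_models=200, n_points=20, n_fresh=1000, seed=0):
    rng = random.Random(seed)
    total = n_points + n_fresh
    # Labels are coin flips, so no model can genuinely do better than 50%.
    labels = [rng.choice([0, 1]) for _ in range(total)]
    # Each "model" is a fixed table of random guesses: pure noise, no signal.
    models = [[rng.choice([0, 1]) for _ in range(total)] for _ in range(n_models)]
    # Score every model on the small dataset we happen to have.
    seen = [accuracy(m[:n_points], labels[:n_points]) for m in models]
    best = max(range(n_models), key=lambda i: seen[i])
    # Re-score the winner on data that played no part in the selection.
    fresh = accuracy(models[best][n_points:], labels[n_points:])
    return {
        "best_seen": seen[best],
        "mean_seen": sum(seen) / n_models,
        "best_fresh": fresh,
    }


report = fish()
print(report)
```

Typically `best_seen` comes out well above 0.5 while `best_fresh` falls back to roughly 0.5. The gap between the two is the selection effect and has nothing to do with model quality.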

"If all you have is a hammer, everything looks like a nail"

The model I've consistently chosen (Decision Trees) may not be the best model. Need to get pushed outside my comfort zone.

I could put in the months/years like a proper Data Scientist and optimize the model. Or let a magic API tell me the best model. I'm lazy, so I prefer the latter ;-)

That's what 90% of data scientists do