

by claytonjy 2574 days ago

This can be tricky, because it varies so much by domain. I imagine you have a good handle on the domain, so you can hopefully do a good job of defining reasonable noise for each measure.
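To make the noise idea concrete, here is a minimal sketch using only the standard library. It assumes you can assign each measure a plausible noise standard deviation from domain knowledge; `jitter_rows`, `noise_sd`, and the example values are hypothetical names and numbers for illustration, not anything from a particular library.

```python
import random

def jitter_rows(rows, noise_sd, copies=3, seed=0):
    """Return the original rows plus `copies` noisy variants of each row.

    noise_sd[j] is an assumed, domain-chosen standard deviation of the
    measurement noise for feature j (hypothetical values below).
    """
    rng = random.Random(seed)
    augmented = [list(r) for r in rows]  # keep the originals first
    for row in rows:
        for _ in range(copies):
            # Add independent Gaussian noise to every feature of this row.
            augmented.append(
                [x + rng.gauss(0.0, sd) for x, sd in zip(row, noise_sd)]
            )
    return augmented

# Two samples with two measures each; the second measure is assumed noisier.
rows = [[1.0, 200.0], [1.5, 180.0]]
noise_sd = [0.05, 5.0]
augmented = jitter_rows(rows, noise_sd, copies=3)
```

Each noisy copy should keep the label of the row it came from, and the copies belong only in the training split, otherwise near-duplicates leak into validation.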

You can also try more generic oversampling techniques, like SMOTE, which should be easy to use from Python or R. It's never actually helped me, but I assume it's useful somewhere.
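For a feel of what SMOTE does under the hood, here is a simplified standard-library sketch of its core step: pick a minority-class point, pick one of its nearest minority-class neighbours, and interpolate between them. This is not the implementation from any package, and `smote_sketch` and its parameters are hypothetical names; in practice you would likely reach for an existing library instead.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from minority-class rows.

    Simplified illustration of the SMOTE idea: each new point lies on the
    line segment between a minority point and one of its k nearest
    minority neighbours. Requires at least two minority rows.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority row, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Four minority-class rows at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
```

Because the new points are interpolations, they always stay inside the region spanned by the existing minority samples, which may be part of why it doesn't help much when the minority class is poorly covered to begin with.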

I suspect at some point you're going to need to take an axe to some of your inputs, preferably based on human priors rather than a sketchy feature-selection process.

SVMs are great, but once you get past linear boundaries there's enough tuning complexity that I'd rather spend that effort tuning a GBM. That's largely because of tooling, though; I know there are modern SVM libs, but I haven't used them. Definitely try a random forest if you haven't!


SubiculumCode 2573 days ago

Thanks for your input.
