Hacker News
by gone35 4127 days ago
Abraham,

This is the kind of work that is essential for our future, so thank you. I'm quite (positively) surprised Sam decided to fund this; as a civilization, we definitely need to put more effort and resources into work like this.

I'll hopefully have a lot more to say in the future, and will definitely be reaching out in a more substantive way. In the meantime, a quick word of advice. This will sound strange, since Hinton's work is quite powerful as it is, but my guess is that good old boosting / ℓ1-regularized ensemble learning methods would work much better for this particular problem domain, so please run some experiments and look into it if you haven't already. It's hard to find good, up-to-date literature on this (nowadays) less fashionable work (a good rule of thumb: if it mentions 'random forests', it is not well-informed enough), but Freund and Schapire's recent book [1] is self-contained and a jewel to read cover to cover. Best of luck.

[1] http://mitpress.mit.edu/books/boosting
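
For readers who want something concrete to experiment with, here is a minimal sketch of boosting, assuming AdaBoost with one-feature threshold "stumps" as the weak learner. That choice is an assumption made for illustration; the comment above does not say which boosting variant or base learner to use, and the function names (`fit_stump`, `adaboost`, `predict`) are hypothetical. Labels are assumed to be +1 / -1.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) for the best stump.

    A stump predicts `polarity` when x[feature] <= threshold, else -polarity.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = 0.0
                for xi, yi, wi in zip(X, y, w):
                    pred = polarity if xi[j] <= thr else -polarity
                    if pred != yi:
                        err += wi
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] <= thr else -polarity

def adaboost(X, y, rounds=10):
    """Train an AdaBoost ensemble; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error away from 0 and 1 so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

On a toy one-dimensional problem such as `X = [[0], [1], [2], [3], [4], [5]]` with `y = [1, 1, -1, -1, 1, 1]`, no single stump can separate the classes, but `adaboost(X, y, rounds=3)` combines three stumps that together fit all six points. Real data would of course need held-out evaluation rather than training accuracy.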


Thank you! Personally, I find it very exciting to be working on these problems.

With respect to boosting, we have more investigation to do, of course; the tricky issue with the biological domain is that we know the underlying data is incredibly noisy. How to walk the line of extracting maximum predictive performance without overfitting is the challenge, since we know that a lot of the raw data points are unreliable. Any algorithm we use has to be able to handle this scenario deftly.

Absolutely. There has been some work specifically on boosting in the presence of noise, using branching programs/BDDs as base learners (see for instance [1], and Sec. 12.3.3 of Schapire's book). It's definitely worth taking a look.

[1] http://research.microsoft.com/en-us/um/people/adum/publicati...
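
To make the noise concern in this thread a bit more concrete, here is a small sketch (not taken from [1] or from the book) comparing two standard losses as a function of the margin y * f(x). It assumes the usual view of AdaBoost as minimizing exponential loss, and uses logistic loss purely as a point of comparison; the helper names are hypothetical. A mislabeled point tends to end up with a large negative margin, and the exponential loss grows much faster there, which is one common explanation for why plain AdaBoost can over-focus on unreliable data points.

```python
import math

def exp_loss(margin):
    """Exponential loss exp(-m), the loss AdaBoost is usually said to minimize."""
    return math.exp(-margin)

def logistic_loss(margin):
    """Logistic loss log(1 + exp(-m)); grows roughly linearly for very negative m."""
    return math.log1p(math.exp(-margin))

def loss_table(margins):
    """Return (margin, exp_loss, logistic_loss) rows for side-by-side comparison."""
    return [(m, exp_loss(m), logistic_loss(m)) for m in margins]
```

Calling `loss_table([-5, -2, 0, 2])` shows the gap widening as the margin becomes more negative, which is the regime where noisy labels live.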