by gone35
4127 days ago
Abraham,

This is the kind of work that is essential for our future, so thank you. I'm quite (positively) surprised Sam decided to fund this; as a civilization, we definitely need to put more effort and resources into work like this. I'll hopefully have a lot more to say in the future, and will definitely be reaching out in a more substantive way.

But in the meantime, a quick word of advice. This will sound strange, since Hinton's work is quite powerful as it is, but my guess is that good old boosting / L1-regularized ensemble learning methods would work much better for this particular problem domain, so please run some experiments and look into it, if you haven't already.

It's hard to find good, up-to-date literature on this (nowadays) less fashionable work (a good rule of thumb: if it mentions 'random forests', it is not well-informed enough), but Freund and Schapire's recent book [1] is self-contained and a jewel to read cover to cover.

Best of luck.

[1] http://mitpress.mit.edu/books/boosting
With respect to boosting, we have more investigation to do, of course. The tricky issue with the biological domain is that we know the underlying data is incredibly noisy, and a lot of the raw data points are unreliable. The challenge is to extract maximum predictive performance without overfitting to that noise, so any algorithm we use has to be able to handle this scenario deftly.
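
To make the overfitting concern concrete, here is a minimal sketch of one common way to limit how closely a boosted ensemble fits noisy labels: train for many rounds, then keep only as many rounds as a held-out set supports. Everything in it is an assumption for illustration, not anything from the project discussed above: the data is synthetic with 20% of labels flipped, the learner is plain AdaBoost over decision stumps, and the helper names (`make_data`, `best_stump`, `adaboost`, `pick_rounds`) are hypothetical.

```python
import math
import random

def make_data(n, noise, rng):
    """Synthetic points in [0,1]^2; label is +1 if x0 + x1 > 1, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = 1 if x[0] + x[1] > 1.0 else -1
        if rng.random() < noise:
            y = -y  # simulate an unreliable measurement
        data.append((x, y))
    return data

def stump_predict(stump, x):
    """A stump thresholds a single feature and predicts +sign or -sign."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def best_stump(data, weights):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for feature in (0, 1):
        for threshold in [i / 20 for i in range(1, 20)]:
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_predict(stump, x) != y)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(train, rounds):
    """Standard AdaBoost: reweight examples so later stumps focus on earlier mistakes."""
    n = len(train)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(train, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for (x, y), w in zip(train, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def error_rate(ensemble, data, rounds):
    """Misclassification rate using only the first `rounds` stumps."""
    wrong = 0
    for x, y in data:
        score = sum(a * stump_predict(s, x) for a, s in ensemble[:rounds])
        wrong += (1 if score >= 0 else -1) != y
    return wrong / len(data)

def pick_rounds(train, valid, max_rounds):
    """Train once, then choose the round count with the lowest held-out error."""
    ensemble = adaboost(train, max_rounds)
    errors = [error_rate(ensemble, valid, t) for t in range(1, max_rounds + 1)]
    best_t = min(range(1, max_rounds + 1), key=lambda t: errors[t - 1])
    return best_t, errors, ensemble

rng = random.Random(0)
train = make_data(200, 0.2, rng)
valid = make_data(200, 0.2, rng)
best_t, errors, ensemble = pick_rounds(train, valid, 30)
print("rounds kept:", best_t, "held-out error:", errors[best_t - 1])
```

Stopping early is only one option. Because AdaBoost keeps increasing the weight of examples it gets wrong, mislabeled points can come to dominate later rounds, so on real data it may also be worth comparing against a loss that is less sensitive to outliers.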