by dn5 4284 days ago
Thanks for sharing your experience! A couple of questions:

Why implement the training in NodeJS and not use an existing library in R or Python (scikit-learn) and just implement the scoring (feedforward network) in Node?

Did you just use a single test/train split? What is the variation in Res if you run cross validation?

Your article suggests that you used MI to select the 10k best features. Did you perform this MI feature selection before your test/train split? If so, you would already be "using" your class labels, and the results will be biased. It is likely your true generalisation error will be higher than reported.

1 comment

> Why implement the training in NodeJS and not use an existing library in R or Python (scikit-learn) and just implement the scoring (feedforward network) in Node?

We wanted to contribute to the NodeJS ecosystem and build whatever tool was missing to use neural networks directly from NodeJS, or at least as an add-on. We also wanted to come up with a simple and straightforward implementation to serve as an educational example, rather than just binding to an existing library (even though the results might have been better, of course).

> Did you just use a single test/train split? What is the variation in Res if you run cross validation?

We didn't use cross-validation, just a simple train/test split (though our test set was quite large, ~100k / 570k). As explained in the intro, we wanted to stay very practical and were OK with dirty shortcuts as long as the results looked OK.
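For anyone wanting to measure the variation asked about above, here is a minimal k-fold sketch in plain Python. It is not what the article did. `train_fn` and `score_fn` are hypothetical callbacks standing in for whatever training and scoring code you already have.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, score_fn, k=5, seed=0):
    """Return one held-out score per fold.

    train_fn(train_samples, train_labels) -> model        (hypothetical)
    score_fn(model, test_samples, test_labels) -> float   (hypothetical)
    """
    folds = k_fold_indices(len(samples), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        scores.append(score_fn(model,
                               [samples[j] for j in test_idx],
                               [labels[j] for j in test_idx]))
    return scores


def summarize(scores):
    """Mean and sample standard deviation of the per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)
```

The standard deviation across folds gives a rough idea of how much a single train/test split could move the reported number.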

> Did you perform this MI feature selection before your test/train split? If so, you would already be "using" your class labels, and the results will be biased. It is likely your true generalisation error will be higher than reported.

Yes, MI selection was done on the whole data set before training. You are totally right that this biases the results on the test set. Nice catch.
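A rough sketch of the leak-free ordering (split first, then score features by MI on the training rows only) could look like the following. It assumes discrete feature values, uses only the Python standard library, and runs on a made-up toy data set. The function names are illustrative and are not taken from the article's code.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(ys)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def select_top_features(train_rows, train_labels, k):
    """Indices of the k columns with the highest MI, computed on training data only."""
    n_features = len(train_rows[0])
    scores = [mutual_information([row[f] for row in train_rows], train_labels)
              for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:k])


def project(rows, feature_idx):
    """Keep only the selected columns."""
    return [[row[f] for f in feature_idx] for row in rows]


# Toy data (made up for illustration): column 0 tracks the label,
# column 1 is constant, column 2 is unrelated to the label.
rows = [[0, 1, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 1, 0], [1, 1, 1]]
labels = [0, 1, 0, 1, 0, 1]

# 1. Split first.
train_rows, test_rows = rows[:4], rows[4:]
train_labels, test_labels = labels[:4], labels[4:]

# 2. Select features using the training rows and labels only.
kept = select_top_features(train_rows, train_labels, k=1)

# 3. Apply the same column choice to both splits.
train_selected = project(train_rows, kept)
test_selected = project(test_rows, kept)
```

Because the test labels never enter `select_top_features`, the score measured on `test_selected` is not inflated by the selection step.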