by zeratul 5269 days ago
DanielRapp: in twss.js/lib/classifier/knn.js, the number of nearest neighbors (NN) should be odd to prevent ties in the two-class vote. [EDIT: NN should also be large enough to prevent over-fitting; a small NN makes the decision boundary between twss and not-twss highly non-linear. You need to implement cross-validation to find the best NN.]

Note to self: machine learning using node.js. How fast are the calculations, how does node.js manage memory, and is there a pure JS implementation of SVM?
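To illustrate the tie problem, here is a minimal sketch of a k-NN majority vote. It is not the code from twss.js; the data shape ({ features, label }), the function names and the toy numbers are all made up for the example.

```javascript
// Hypothetical sketch, not the twss.js implementation.
// Each training example is assumed to look like { features: number[], label: string }.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Count the labels of the k training examples closest to the query.
function knnVotes(training, query, k) {
  const nearest = training
    .map((ex) => ({ label: ex.label, dist: euclidean(ex.features, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
  const votes = {};
  for (const n of nearest) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }
  return votes;
}

// Return the winning label, or null when the top two labels have equal votes.
function classify(training, query, k) {
  const votes = knnVotes(training, query, k);
  const ranked = Object.entries(votes).sort((a, b) => b[1] - a[1]);
  const tie = ranked.length > 1 && ranked[0][1] === ranked[1][1];
  return { label: tie ? null : ranked[0][0], tie, votes };
}

// Toy one-dimensional data, invented for this example.
const training = [
  { features: [0], label: "twss" },
  { features: [2], label: "twss" },
  { features: [10], label: "not-twss" },
  { features: [12], label: "not-twss" },
  { features: [14], label: "not-twss" },
];

console.log(classify(training, [6], 4)); // even k: two votes each, so a tie
console.log(classify(training, [6], 5)); // odd k: one class must win
```

With two classes, an odd k always yields a strict majority. With an even k the classifier needs some extra tie-breaking rule.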


Thanks. I did a simple analysis[1] and changed it[2] to 5 neighbors. Though when I look at the graph now, I see that 4 is actually the optimal value.

Swedish graph (täckning = recall): http://cl.ly/BJRa/pr.png

[1] https://github.com/DanielRapp/twss.js/blob/master/lib/analyz...

[2] https://github.com/DanielRapp/twss.js/commit/3cfcda785583084...

Why don't you try 10-fold CV (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...)? The graph might change drastically. Here is an example of how to do it: https://onlinecourses.science.psu.edu/stat857/book/export/ht...
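A rough sketch of what 10-fold CV for picking NN could look like in plain JS. Everything here is an assumption made for illustration: the one-dimensional synthetic data, the function names, and the simple interleaved fold split (on real data you would shuffle first, and ideally keep the class balance within each fold).

```javascript
// Hypothetical sketch: choose k for k-NN by k-fold cross-validation.
// Examples are assumed to look like { x: number, label: 0 | 1 }.
function predict(train, x, k) {
  const nearest = train
    .map((ex) => ({ label: ex.label, dist: Math.abs(ex.x - x) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);
  const positives = nearest.filter((n) => n.label === 1).length;
  return positives * 2 > nearest.length ? 1 : 0; // a tied vote falls to class 0
}

// Mean held-out accuracy over `folds` folds.
// Fold f holds out every example whose index i satisfies i % folds === f.
function crossValidate(data, k, folds) {
  let total = 0;
  for (let f = 0; f < folds; f++) {
    const test = data.filter((_, i) => i % folds === f);
    const train = data.filter((_, i) => i % folds !== f);
    const correct = test.filter((ex) => predict(train, ex.x, k) === ex.label).length;
    total += correct / test.length;
  }
  return total / folds;
}

// Try each candidate k and keep the one with the highest mean accuracy.
function selectK(data, candidates, folds) {
  let best = { k: null, accuracy: -1 };
  for (const k of candidates) {
    const accuracy = crossValidate(data, k, folds);
    if (accuracy > best.accuracy) best = { k, accuracy };
  }
  return best;
}

// Synthetic data invented for the example: label 1 below x = 50, label 0 from 50 up.
const data = [];
for (let i = 0; i < 100; i++) {
  data.push({ x: i, label: i < 50 ? 1 : 0 });
}

console.log(selectK(data, [1, 3, 5, 7, 9], 10));
```

The same loop works with precision or recall in place of accuracy, which is closer to what the graph plots.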

If precision and recall both go down monotonically as NN increases, it means you don't have enough training data.
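For reference, a small sketch of how precision and recall can be computed for one value of NN, so the two curves can be compared as NN grows. The function name and the sample labels are made up for the example.

```javascript
// Hypothetical helper: precision and recall for one class treated as "positive".
function precisionRecall(actual, predicted, positive) {
  let tp = 0; // predicted positive, actually positive
  let fp = 0; // predicted positive, actually negative
  let fn = 0; // predicted negative, actually positive
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === positive && actual[i] === positive) tp++;
    else if (predicted[i] === positive) fp++;
    else if (actual[i] === positive) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// Invented labels, only to show the call.
const actual = ["twss", "twss", "not-twss", "not-twss", "twss"];
const predicted = ["twss", "not-twss", "twss", "not-twss", "twss"];

console.log(precisionRecall(actual, predicted, "twss"));
```

Running this once per NN value on held-out data gives the points for a precision/recall graph like the one linked above.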