by DanielRapp 5267 days ago
Thanks. I did do a simple analysis[1] and changed it[2] to 5 neighbors. Looking at the graph now, though, I see that 4 is actually the optimal value.

Graph, labeled in Swedish (täckning = recall): http://cl.ly/BJRa/pr.png

[1] https://github.com/DanielRapp/twss.js/blob/master/lib/analyz...

[2] https://github.com/DanielRapp/twss.js/commit/3cfcda785583084...

1 comments

Why don't you try 10-fold cross-validation (http://en.wikipedia.org/wiki/Cross-validation_%28statistics%...)? The graph might change drastically. Here is an example of how to do it: https://onlinecourses.science.psu.edu/stat857/book/export/ht...

If precision and recall both go down monotonically as you increase the number of nearest neighbors, that suggests you don't have enough training data.
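To make the suggestion concrete, here is a rough sketch of 10-fold cross-validation over the neighbor count. It is not the twss.js analysis: the two-dimensional toy data, the `knn_predict` and `cross_validate` helpers, and the choice to pool counts across folds are all assumptions made for illustration, written in plain Python with only the standard library.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # Ties are broken by Counter's own ordering; fine for a sketch.
    return votes.most_common(1)[0][0]


def cross_validate(data, k, folds=10, seed=0):
    """Return (precision, recall) for label 1, pooled over all folds."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    tp = fp = fn = 0
    for i in range(folds):
        # Every folds-th row (offset i) is held out; the rest is training data.
        test = shuffled[i::folds]
        train = [row for j, row in enumerate(shuffled) if j % folds != i]
        for features, label in test:
            guess = knn_predict(train, features, k)
            if guess == 1 and label == 1:
                tp += 1
            elif guess == 1 and label == 0:
                fp += 1
            elif guess == 0 and label == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two Gaussian blobs standing in for real features.
rng = random.Random(42)
data = (
    [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(100)]
    + [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(100)]
)

# Evaluate k = 1..10 and print one precision/recall pair per k.
results = {k: cross_validate(data, k) for k in range(1, 11)}
for k, (p, r) in results.items():
    print(f"k={k:2d} precision={p:.3f} recall={r:.3f}")
```

Every row is tested exactly once, by a model that never saw it during training, so the curve over k should be less sensitive to one lucky split than a single train/test division.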