by gcmac 3044 days ago
Very informative and well-written article about KNN classification. However, as a data scientist, it always pains me to see the Iris data set being used. It is almost linearly separable (setosa separates cleanly from the other two classes, and versicolor and virginica overlap only slightly), so it gives no indication of whether you actually want to use the given methodology on your problem: almost every technique achieves similar results on this data. I'd recommend using something from Kaggle or even the UCI repository to make these types of articles even more useful!

Author here. You have a very good point. I tend to default to using Iris for articles like these because of its simplicity, ease of setup, etc., but you're right that it isn't as informative in showing readers the algorithm's capability. I'll have to try out some different datasets for upcoming articles :)
+1

I'd recommend the Wine data set or the Pima Indians Diabetes dataset.
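
For anyone who wants to check the point made in this thread, here is a minimal sketch, assuming scikit-learn is installed. It is not taken from the article; the choice of models, `n_neighbors=5`, 5-fold cross-validation, and the helper name `mean_cv_accuracy` are all my own assumptions for illustration. It runs the same few classifiers on Iris and on the Wine data set bundled with scikit-learn, so you can see how much the scores spread on each.

```python
from sklearn.datasets import load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def mean_cv_accuracy(model, X, y):
    """Hypothetical helper: mean accuracy over 5 stratified folds."""
    return cross_val_score(model, X, y, cv=5).mean()


# Illustrative model choices, not the article's setup.
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "knn_scaled": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "logreg_scaled": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for dataset_name, loader in [("iris", load_iris), ("wine", load_wine)]:
    X, y = loader(return_X_y=True)
    for model_name, model in models.items():
        # cross_val_score clones the estimator, so reusing it is safe.
        results[(dataset_name, model_name)] = mean_cv_accuracy(model, X, y)

for (dataset_name, model_name), acc in results.items():
    print(f"{dataset_name:5s} {model_name:14s} {acc:.3f}")
```

On Iris I would expect all four rows to land close together, which is the complaint above. On Wine the features sit on very different scales, so plain KNN should trail the scaled version noticeably, which tells a reader something about when distance-based methods need preprocessing.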