HN Mirror

Hacker News


	by gcmac 3044 days ago
	Very informative and well-written article about KNN classification. However, as a data scientist, it always pains me to see the Iris dataset being used. It is nearly linearly separable (setosa is perfectly separable from the other two species) and gives no indication of whether you actually want to use the given methodology on your problem, since almost every technique can achieve these results on this data. I'd recommend using something from Kaggle or even the UCI repository to make these types of articles even more useful!
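To illustrate the point about easy datasets, here is a minimal sketch on hypothetical, deliberately well-separated toy data (not Iris itself). A from-scratch k-NN and a much simpler nearest-centroid classifier are both scored with leave-one-out cross-validation. The data points, the function names, and the choice of k=3 are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy data: two well-separated 2-D clusters (NOT the Iris data).
POINTS = [
    ((1.0, 1.0), "a"), ((1.5, 0.8), "a"), ((0.7, 1.3), "a"), ((1.2, 1.6), "a"),
    ((5.0, 5.2), "b"), ((5.5, 4.8), "b"), ((4.7, 5.6), "b"), ((5.3, 5.9), "b"),
]

def knn_predict(train, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, x):
    """Assign x to the class whose mean point is closest (a much simpler model)."""
    sums = {}
    for (px, py), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + px, sy + py, n + 1)
    centroids = {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))

def loo_accuracy(predict, data):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, x) == label
    return hits / len(data)

knn_acc = loo_accuracy(knn_predict, POINTS)
centroid_acc = loo_accuracy(centroid_predict, POINTS)
print(f"k-NN: {knn_acc:.2f}, nearest centroid: {centroid_acc:.2f}")
# prints "k-NN: 1.00, nearest centroid: 1.00"
```

Both methods score perfectly here, so the score alone says nothing about which one suits a harder problem. On data with overlapping classes the two would be expected to diverge, which is the kind of comparison the comment is asking for.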


ScottWRobinson 3044 days ago

Author here. You have a very good point. I tend to default to using Iris for articles like these because of its simplicity, ease of setup, etc., but you're right that it isn't as informative in showing readers the algorithm's capability. I'll have to try out some different datasets for upcoming articles :)


workhn 3044 days ago

Recommend the Wine dataset or the Pima Indians Diabetes dataset.
