Hacker News
by jherrick 4854 days ago
Good list. I would also suggest eliminating outlying items based on price. It seems that when there are dozens of items named XXX, several of them will be "hard drive for XXX" or something similar.

I have to believe there's an easy way to eliminate some "outliers" using mathematics, but I can't recall the function(s) to do so.

1 comment

> I have to believe there's an easy way to eliminate some "outliers" using mathematics, but I can't recall the function(s) to do so.

The median is one good way, as you already have. You can also use the interquartile mean: http://en.wikipedia.org/wiki/Interquartile_mean
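
For anyone who wants to try the interquartile mean, here is a minimal sketch in Python. The function name and the sample prices are made up for illustration, and this is the simple truncating version: it drops a quarter of the items (rounded down) from each end, so it only matches the textbook definition exactly when the number of items is a multiple of 4.

```python
from statistics import mean

def interquartile_mean(prices):
    """Average the middle half of the sorted prices (simple truncating version)."""
    ordered = sorted(prices)
    cut = len(ordered) // 4                    # items dropped from each end
    middle = ordered[cut:len(ordered) - cut]   # keep the middle ~50%
    return mean(middle)

# Hypothetical listing prices: two accessories, one absurd asking price.
sample = [5, 8, 380, 395, 400, 410, 420, 6000]
print(interquartile_mean(sample))
```

Because the cheapest and the most expensive listings are dropped before averaging, neither the accessories nor the absurd asking price pull the result around.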

At the moment I'm filtering out items more than 2 standard deviations away from the median. It catches the ridiculous cases, e.g. when some fool tries to get away with selling an iPhone for $6000 (yes, I've seen this before).

Perhaps I need to tighten the filter to 1 or 1.5 standard deviations. I will experiment with this.
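
A rough sketch of that kind of filter in Python, with the threshold as a parameter so it is easy to experiment with. The function name, the sample prices, and the choice of the population standard deviation (`statistics.pstdev`, which is measured around the mean) are assumptions for illustration, not a description of the actual code behind the site.

```python
from statistics import median, pstdev

def filter_by_median(prices, k=2.0):
    """Keep prices within k standard deviations of the median."""
    center = median(prices)
    spread = pstdev(prices)   # assumption: population std dev of all prices
    return [p for p in prices if abs(p - center) <= k * spread]

# Hypothetical listing prices with one absurd asking price.
sample = [380, 395, 400, 410, 420, 6000]
print(filter_by_median(sample, k=2.0))
print(filter_by_median(sample, k=1.5))
```

One thing to keep in mind: the outlier itself inflates the standard deviation, so on a short list a wide threshold can let a bad price through. That is one reason a smaller `k` may work better.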

However, sometimes you can easily see there are two clusters of results. Not sure how to mathematically determine this. Any ideas?