Hacker News
by paulfr 3924 days ago
I'll jump on the bandwagon.

I just downloaded the dataset, and color is such a powerful feature that training a random forest on images downsampled to a single pixel gives 95% and 98% accuracy (for the 4-category and 6-category versions, respectively)!
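To make "downsampled to a single pixel" concrete, here is a minimal sketch assuming each chip is a list of rows of per-pixel channel tuples and that downsampling means averaging each channel over the whole chip. The classifier here is a nearest-centroid rule, swapped in as a dependency-free stand-in for the random forest the comment actually used. All colors and labels are hypothetical toy values, not taken from the dataset.

```python
# Reduce each chip to one "pixel" (per-channel mean) and classify on it.
# Assumption: nearest-centroid is a stand-in for the random forest in the comment.
from math import dist


def to_single_pixel(chip):
    """Average every pixel in the chip, channel by channel."""
    pixels = [px for row in chip for px in row]
    n_channels = len(pixels[0])
    return tuple(sum(px[c] for px in pixels) / len(pixels) for c in range(n_channels))


def fit_centroids(features, labels):
    """Compute the mean feature vector for each class label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(centroids, feature):
    """Return the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], feature))


def make_chip(color, size=28):
    """Hypothetical uniform 28x28 chip filled with a single color."""
    return [[color] * size for _ in range(size)]


if __name__ == "__main__":
    # Hypothetical 4-channel (R, G, B, NIR) training colors.
    train = [(make_chip((200, 180, 150, 120)), "barren"),
             (make_chip((40, 90, 40, 200)), "trees"),
             (make_chip((20, 40, 90, 10)), "water")]
    feats = [to_single_pixel(c) for c, _ in train]
    centroids = fit_centroids(feats, [y for _, y in train])
    print(predict(centroids, to_single_pixel(make_chip((190, 175, 140, 110)))))  # → barren
```

Once a chip is collapsed to its mean color, any classifier only sees four numbers, which is why high accuracy at this resolution says more about color separability than about spatial structure.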

And you can easily exceed 99.5% by adding more features to the forest, which is far above their DBN accuracy.
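The comment doesn't say which extra features were added, so the set below is purely an assumption for illustration: per-channel mean, population standard deviation, min and max. It shows how a chip can yield a richer feature vector than the single mean pixel.

```python
# Hypothetical richer per-chip feature vector. The feature choice is an
# assumption; the original comment does not specify which features were used.
from statistics import fmean, pstdev


def chip_features(chip):
    """Concatenate mean, std, min and max for each channel of a chip
    (chip = list of rows, each row a list of per-pixel channel tuples)."""
    pixels = [px for row in chip for px in row]
    feats = []
    for c in range(len(pixels[0])):
        values = [px[c] for px in pixels]
        feats.extend([fmean(values), pstdev(values), min(values), max(values)])
    return feats
```

The standard deviation acts as a crude texture cue, which the one-pixel downsample throws away entirely.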

I have no idea how they were able to get an accuracy as low as 69% when they evaluated random forests.


I read the paper, and I also have some reservations. The procedure they used to extract and randomize their data seems biased towards large homogeneous areas.

In short, their procedure seems to allow roping off a large contiguous area of the Mojave Desert, ground-truthing it as "barren" with their GUI system, and then having that area carved up into 28x28-pixel chips that are spread evenly across the training and test sets.

In such a case, the training and test sets are not really independent. And their 6 classes, as you point out, are amenable to color features.
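A small sketch of the leakage concern, taking the extreme case above: one roped-off region cut into chips, shuffled, and split at random. The region name and sizes are hypothetical. Because the split ignores where each chip came from, every test chip has neighbours from the same region in the training set.

```python
# Random per-chip split vs. source region: measure how many test chips
# come from a region that is also represented in training.
import random


def random_chip_split(chip_regions, test_fraction, seed=0):
    """Shuffle chip indices and cut them into (train, test), ignoring region."""
    idx = list(range(len(chip_regions)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    return idx[n_test:], idx[:n_test]


def leaked_fraction(chip_regions, train_idx, test_idx):
    """Fraction of test chips whose source region also appears in training."""
    train_regions = {chip_regions[i] for i in train_idx}
    return sum(chip_regions[i] in train_regions for i in test_idx) / len(test_idx)


# Hypothetical 2800x2800-pixel desert area -> 100 x 100 = 10,000 chips of 28x28.
regions = ["mojave_block"] * 10_000
train_idx, test_idx = random_chip_split(regions, test_fraction=0.2)
print(leaked_fraction(regions, train_idx, test_idx))  # → 1.0
```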

I've done classification of remote sensing data, and the above is not a good test of accuracy on any useful task. You have to test accuracy on representative data.

That means training within a few areas and testing on geographically distant but ecologically similar areas (i.e., same class, but statistically independent). It also means varying things like time of day, observing geometry, and seasonality. Color features will be quite fragile in such tests.
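One way to set up the geographic part of this is leave-one-region-out evaluation: each fold tests on a region that contributes no chips to training. The sketch below assumes every chip carries a region label (the region names are hypothetical). It doesn't check that held-out regions contain the same classes, and the time-of-day, geometry and season variations would need extra per-chip metadata.

```python
# Leave-one-region-out folds: no region ever appears on both sides of a split.
from collections import defaultdict


def leave_one_region_out(chip_regions):
    """Yield (held_out_region, train_idx, test_idx) for each region."""
    by_region = defaultdict(list)
    for i, region in enumerate(chip_regions):
        by_region[region].append(i)
    for held_out in sorted(by_region):
        test_idx = by_region[held_out]
        # Training uses every chip from every other region.
        train_idx = [i for r, idxs in by_region.items() if r != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Hypothetical desert regions that share the "barren" class.
regions = ["mojave", "mojave", "atacama", "gobi", "gobi"]
for held_out, train_idx, test_idx in leave_one_region_out(regions):
    print(held_out, train_idx, test_idx)
```

Accuracy averaged over these folds estimates performance on unseen places, which is the question a random chip split cannot answer.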

And testing on a more diverse sample, to see whether "none of the above" can be detected, because their set of classes is nowhere near exhaustive.
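One simple approach, offered only as an illustration, to allowing a "none of the above" answer is distance-based rejection: predict the nearest class centroid, but return nothing when even that centroid is too far away. The centroids and the threshold below are hypothetical; in practice the threshold would have to be tuned on held-out data that includes out-of-class samples.

```python
# Nearest-centroid prediction with a reject option for "none of the above".
from math import dist


def predict_with_reject(centroids, feature, max_distance):
    """Return the nearest class label, or None if no centroid is within max_distance."""
    label = min(centroids, key=lambda y: dist(centroids[y], feature))
    return label if dist(centroids[label], feature) <= max_distance else None


# Hypothetical RGB centroids and threshold.
centroids = {"barren": (200.0, 180.0, 150.0), "trees": (40.0, 90.0, 40.0)}
print(predict_with_reject(centroids, (195.0, 178.0, 148.0), max_distance=30.0))  # → barren
print(predict_with_reject(centroids, (120.0, 20.0, 220.0), max_distance=30.0))  # → None
```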

Hah, that's shocking! You should contact them. Seems like an inexperienced team, then.