by benanne 3926 days ago
I have some doubts about this. Deep learning moves fast and DBNs are pretty much outdated models, even for unsupervised pre-training. It doesn't make much sense to me that unsupervised pre-training would help for this problem to begin with, seeing as their dataset totals around 65TB.

The paper is worth checking out: http://arxiv.org/abs/1509.03602 I haven't read it in full, but based on a quick skim, the convnet architectures they evaluated seem laughably tiny and shallow (at most three convolutional layers) by today's standards -- although I appreciate that there may be other constraints at play here (limits on training time etc.).

But to claim that DBNs are better suited for this problem than convnets based on these results is quite far-fetched. I'm confident that a convnet could crush these results, given enough effort and time spent on hyperparameter tuning.

I find this part particularly misleading (section 6, page 13): "shape/edge based features which are predominantly learned by various Deep architectures are not very useful in learning data representations for satellite imagery. This explains the fact why traditional Deep architectures are not able to converge to the global optima even for reasonably large as well as Deep architectures."

The whole point of learning features is so that they are better suited for the task at hand. If "shape/edge based features" are not suitable to perform a particular task, then a properly trained convnet should not learn them. I think the conclusions drawn from this work would have been very different if the chosen network architectures were more sensible.


+1. There are several fishy statements throughout this paper. Another one is in the conclusion:

"For satellite datasets, with inherently high variability, traditional deep learning approaches are unable to converge to a global optima even with significantly big and deep architectures."

This quote points to some basic misunderstandings of how/when these models work. "Inherent high variability" is suddenly some kind of problem? Unable to converge to a global optimum? The modern view of deep net optimization landscapes, based on several recent studies, argues against these outdated interpretations.

I'll pile on the bandwagon.

I just downloaded the dataset, and color is such a powerful feature that training a random forest on images downsampled to a single pixel results in 95% and 98% accuracies! (for the 4-category and 6-category versions, respectively)

And you can easily exceed 99.5% by adding more features to the forest, which is far above their DBN accuracy.

I have no idea how they were able to get an accuracy as low as 69% when they evaluated random forests.
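
For anyone who wants to try the single-pixel idea, here is a minimal sketch. It is not the code used for the numbers above: the chips are synthetic, the class colors in `BASE` are made up, and a nearest-centroid classifier stands in for the random forest so the example needs nothing beyond the standard library. The only part taken from the comment is the feature itself: each chip is downsampled to one pixel, i.e. its mean color.

```python
import random

def mean_pixel(chip):
    """Downsample a chip (rows of pixels, each pixel a tuple of channel values) to one pixel."""
    sums, n = None, 0
    for row in chip:
        for px in row:
            if sums is None:
                sums = [0.0] * len(px)
            for c, v in enumerate(px):
                sums[c] += v
            n += 1
    return [s / n for s in sums]

def fit_centroids(features, labels):
    """Average the feature vectors of each class (stand-in for training a random forest)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for c, v in enumerate(f):
            acc[c] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, f):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], f)))

def make_chip(base, rng, size=28, noise=20):
    """Synthetic chip: a base color plus per-pixel uniform noise (purely illustrative)."""
    return [[tuple(b + rng.randint(-noise, noise) for b in base) for _ in range(size)]
            for _ in range(size)]

rng = random.Random(0)
# Hypothetical class colors, chosen only so the toy classes are separable by color.
BASE = {"barren": (180, 150, 110), "water": (30, 60, 120)}
train = [(make_chip(BASE[y], rng), y) for y in BASE for _ in range(20)]
test = [(make_chip(BASE[y], rng), y) for y in BASE for _ in range(10)]

centroids = fit_centroids([mean_pixel(c) for c, _ in train], [y for _, y in train])
accuracy = sum(predict(centroids, mean_pixel(c)) == y for c, y in test) / len(test)
print(f"accuracy on synthetic chips: {accuracy:.2f}")
```

On real chips you would swap the synthetic data for the actual images and the centroid classifier for a random forest; the feature extraction stays the same.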

I read the paper, and I also have some reservations. The procedure they used to extract and randomize their data seems biased towards large homogeneous areas.

In short, in their procedure, it seems possible to rope off a large contiguous area of Mojave desert, ground-truth it using their GUI system as "barren", and have that area be carved up into 28x28 pixel chips and spread equally into the training and test sets.

In such a case, the training and test sets are not really independent. And their 6 classes, as you point out, are amenable to color features.

Having done classification of remote sensing data...the above is not a good test of accuracy at any useful task. You have to test accuracy on representative data.

That means training within a few areas, and testing on geographically distant but ecologically similar areas (i.e., same class, but statistically independent). It also means varying things like time of day, observing geometry, and seasonality. Color features will be quite fragile in such tests.

It also means testing on a more diverse sample, to see if "none of the above" can be detected, because their class decomposition is nowhere near exhaustive.
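
To make the leakage point concrete, here is a rough sketch of a split that holds out whole spatial blocks instead of shuffling individual chips. Everything in it is an assumption for illustration: the chip records with `x`/`y` pixel origins, the 140-pixel `block_size`, and the helper names `block_id` and `blocked_split` are hypothetical, and the paper describes nothing like this.

```python
import random

def block_id(x, y, block_size):
    """Map a chip's pixel origin to the coarse spatial block that contains it."""
    return (x // block_size, y // block_size)

def blocked_split(chips, block_size, test_fraction=0.25, seed=0):
    """Hold out entire blocks so no test chip shares a block with a training chip."""
    blocks = sorted({block_id(c["x"], c["y"], block_size) for c in chips})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(len(blocks) * test_fraction))
    test_blocks = set(blocks[:n_test])
    train = [c for c in chips if block_id(c["x"], c["y"], block_size) not in test_blocks]
    test = [c for c in chips if block_id(c["x"], c["y"], block_size) in test_blocks]
    return train, test

# Hypothetical scene: 28x28 chips tiled over a 280x280 pixel area, all labeled "barren".
chips = [{"x": x, "y": y, "label": "barren"}
         for x in range(0, 280, 28) for y in range(0, 280, 28)]
train, test = blocked_split(chips, block_size=140)
print(len(train), len(test))
```

Adjacent blocks still touch along their borders, so this is weaker than the test described above. Holding out separate scenes, or leaving a buffer between training and test blocks, would get closer to geographically distant data, and it does nothing about time of day, geometry, or season.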

Hah that's shocking! You should contact them. Seems like an inexperienced team then.
Since you seem fond of Deep Learning projects, do you think something like automatic classification of streets (and the transportation network in general) from imagery is viable yet? Seems like it would be useful for OpenStreetMap. The corpus of valid classifications is tremendous (pretty much all (>95%?) of NA is classified), and the data is readily available.

The subjects themselves don't seem too complex either: thin lines are small roads, thick lines are major ones, and then there are intersections, which semantically interlink them.

I was wondering about this also, especially for the case of Humanitarian OpenStreetMap, where they map e.g. West Africa and allow you to map without visiting the area (normally not allowed on OSM). The maps can be so sparse before we map a region that any 'AI' would not have to be perfect; it would be a vast improvement on what already exists anyway.

Maybe a good option would be a mapping tool for humans, that traced e.g. a building and then said to the user 'I think this is a building, press Yes to accept'. That would speed up my mapping times by maybe a factor of 5, especially once I got comfortable with the AI being reliable, and could click Yes after just a cursory check.

Right, human assist would probably be needed for final verification and unfortunately it's impossible to correctly name the streets (unless everywhere were like Manhattan); number might be doable.

It just seems like a perfect fit for Deep Convolutional neural nets.