by murbard2 4016 days ago
The detection layer will pick up very faint random signals. For example, if you have a unit that's supposed to detect dogs, it might be very faintly activated if, by random chance, some part of the image has a doggish quality. What they do is take that faint, random signal and amplify it.

They say: oh, you think that cloud is a tiny bit dog-like? OK, well then find me a small modification to the image that would make it a little more dog-like, then a little more, and so on.

Think of it as semantic contrast enhancement.


So in concrete terms, does this mean that we show the network an image, choose one layer's output vector, and then back-propagate gradients to the image such that the direction of that vector stays the same, but the magnitude increases?
That is my understanding of the blog post, yes.

That plus a prior on the input pixels to keep it image-like.
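
To make the idea above concrete, here is a toy sketch in plain Python. It is not the blog post's code: the 3-unit ReLU layer, its weights `W`, the step size, and the L2 penalty standing in for the "image-like" prior are all made-up assumptions. It does gradient ascent on the input so that the squared norm of one layer's activations grows, while the penalty pulls the input back toward the original image.

```python
import math

# Hypothetical 3-unit layer over a 4-pixel "image"; weights are invented for illustration.
W = [
    [0.5, -0.2, 0.1, 0.0],
    [-0.3, 0.4, 0.0, 0.2],
    [0.1, 0.1, -0.5, 0.3],
]

def layer(x):
    """ReLU(W x): activations of the chosen layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))

def amplify(x0, steps=10, lr=0.5, lam=0.1):
    """Gradient ascent on 0.5*||layer(x)||^2 - 0.5*lam*||x - x0||^2.

    The second term is an assumed stand-in for a prior on the pixels.
    """
    x = list(x0)
    for _ in range(steps):
        h = layer(x)
        # Gradient of 0.5*||ReLU(Wx)||^2 w.r.t. x is W^T h
        # (h is already zero for inactive units).
        grad = [sum(W[j][i] * h[j] for j in range(len(W))) for i in range(len(x))]
        # Step uphill; the penalty term pulls x back toward the original image.
        x = [xi + lr * (g - lam * (xi - x0i)) for xi, g, x0i in zip(x, grad, x0)]
    return x

image = [0.2, 0.1, 0.4, 0.3]
dreamed = amplify(image)
print(norm(layer(image)), norm(layer(dreamed)))
```

The units that were already faintly active get pushed harder on every step, which is the "a little more, then a little more" loop. A real version would use a trained convnet and automatic differentiation, and presumably a more image-specific prior than a plain L2 penalty.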

They wrote, though, that they chose a particular layer in the network, which reads to me as "independent of the output layer". Features in such a layer correlate with certain classes, but I don't think they dealt with classes at all. If that's the case, the question is how they amplified the detected features.
Yes, they play with various layers. Layers closer to the input act more like edge enhancers, while higher layers emphasize whole objects ("animal" enhancers). The features become less syntactic and more semantic as you go deeper into the network.
So "lower" layers are closer to the input, while "higher" layers are closer to the output?