Hacker News
by kcarnold 4254 days ago
It's cool that a team of deep learning researchers can pull this off quickly. Anyone know of an "image2vec" (word2vec for images) that would empower others to try out similar things? (Unfortunately, it would need a better name, because "vectorize" means something different for images.)
3 comments

Caffe (and other frameworks) provide exactly this. It's basically:

1) To set up, load a pre-trained AlexNet/Overfeat/other architecture model (e.g. trained on ILSVRC2012).

2) To get a vector from an image, run a forward pass on the image and extract the activations at a given layer (e.g. fc7) as the output vector.

http://caffe.berkeleyvision.org/gathered/examples/feature_ex... is a step-by-step walkthrough.
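To make the two steps above concrete, here is a minimal pure-Python sketch. It is not Caffe code: make_toy_model, forward, image_to_vector and cosine_similarity are hypothetical names, the "pre-trained" weights are random placeholders, and the layers are tiny dense ReLU layers named fc1..fc3 rather than a real AlexNet with an fc7. With a real framework you would load actual pre-trained weights and read the activations of the chosen layer instead.

```python
import math
import random


def make_toy_model(layer_sizes, seed=0):
    """Build random dense layers. Stand-in for loading pre-trained weights."""
    rng = random.Random(seed)
    model = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        model.append(("fc%d" % (i + 1), weights, biases))
    return model


def forward(model, pixels):
    """Run a forward pass and return every layer's activations, keyed by name."""
    activations = {}
    x = list(pixels)
    for name, weights, biases in model:
        # Dense layer followed by ReLU.
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
        activations[name] = x
    return activations


def image_to_vector(model, pixels, layer):
    """Step 2: the activations at the chosen layer are the image's vector."""
    return forward(model, pixels)[layer]


def cosine_similarity(a, b):
    """Compare two feature vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: "load" the model (random placeholder weights in this sketch).
model = make_toy_model([16, 8, 4, 3], seed=0)

# Step 2: turn two flattened 4x4 grayscale "images" into vectors.
image_a = [i / 15.0 for i in range(16)]
image_b = [1.0 - p for p in image_a]
vec_a = image_to_vector(model, image_a, "fc2")
vec_b = image_to_vector(model, image_b, "fc2")
similarity = cosine_similarity(vec_a, vec_b)
```

Once images are vectors like this, you can compare them with something like cosine similarity, which is roughly how you'd "try out similar things" in the word2vec sense.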

There's a lot of mystique around deep learning and these kinds of problems, but it's not _that_ difficult to use these tools.

I think you'd need something slightly more complex than a "word2vec", since images already have a well-defined "word vector", i.e. a pixel. What you want is a "parser" that can take in an image and spit out the significant parts of it? Stanford might have the code from this paper ( http://machinelearning.wustl.edu/mlpapers/paper_files/ICML20...) up on their site.
Do you just want the neural net, or the tags too? This might not be what you want, but it looks like it could be fun to play with: http://clarifai.com/