Hacker News new | ask | show | jobs
by wongarsu 522 days ago
Interesting approach to a a very interesting challenge, given how close the images supposedly are.

With the limited training data they have I'm surprised they don't mention any attempts at synthetic training data. Make (or buy) a couple museum scenes in blender, hang one of the images there, take images from a lot of angles, repeat for more scenes, lighting conditions and all 350 images. Should be easy to script. Then train YOLO on those images, or if that still fails use their embedding approach with those training images.

1 comments

They did.

> “ To address this limitation, we turned to data augmentation, artificially creating new versions of each image by modifying colors, adding noise, applying distortion, or rotating images. By the end, we had generated 600 augmented images per car.”

Those are pretty standard. A standard YOLO training run applies more transformations than that, and there are ready-made modules that do the same in keras and pytorch (for their mobilenet and VGG16). I'm not sure if anyone is training any serious vision algorithm without that kind of data augmentation.

What I am talking about is that they want to recognize scenes containing the images, but only have the images as training data. They have a good idea what those scenes will look like. Going there to take actual training pictures was evidently not viable, but generating approximations of them might have been.