Hacker News
by nemik 3336 days ago
For more training data, I wonder if you could model Lego parts in SketchUp or some other 3D program, then render them in a 'scene' similar to your camera setup using a renderer like Maxwell or V-Ray or whatever. That might let you generate an effectively unlimited number of sample images to train on.

I'm doing a similar experiment now: training a model to read the 7-segment LCD display on a blood pressure monitor from photos. To do it, I cut out each segment of the display as a mask in GIMP/Photoshop, and then I can create my own images by overlaying the masks on top of an image of a blank LCD display. That gets me basically unlimited training photos. If you could render the 3D parts from various angles, in various colours, etc., something similar might be possible.
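A minimal sketch of the mask-overlay idea, in plain Python with images as 2D lists of grayscale values. The digit-to-segment table is the standard 7-segment encoding; `toy_masks` is a hypothetical stand-in for masks cut out in an image editor, and the image sizes are made up. Each synthetic image gets its label for free, because the reading is chosen before the image is drawn.

```python
import random

# Standard 7-segment encoding: which segments (a-g) light up for each digit.
SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}

def toy_masks(n_digits, cell_w=4):
    """Hypothetical stand-in for hand-cut masks: (row, col) pixels per (digit position, segment)."""
    layout = {
        "a": [(0, 1), (0, 2)], "b": [(1, 3), (2, 3)], "c": [(4, 3), (5, 3)],
        "d": [(6, 1), (6, 2)], "e": [(4, 0), (5, 0)], "f": [(1, 0), (2, 0)],
        "g": [(3, 1), (3, 2)],
    }
    masks = {}
    for pos in range(n_digits):
        x0 = pos * (cell_w + 1)  # one blank column between digit cells
        for seg, pixels in layout.items():
            masks[(pos, seg)] = [(y, x0 + x) for y, x in pixels]
    return masks

def render_reading(blank, masks, value, n_digits, ink=0):
    """Overlay the lit segments for `value` (right-aligned) onto a copy of the blank display."""
    text = str(value).rjust(n_digits)
    if len(text) > n_digits:
        raise ValueError("value has too many digits for this display")
    image = [row[:] for row in blank]  # never modify the blank background itself
    for pos, ch in enumerate(text):
        if ch == " ":
            continue  # leading positions stay unlit
        for seg in SEGMENTS[int(ch)]:
            for y, x in masks[(pos, seg)]:
                image[y][x] = ink
    return image

def make_dataset(blank, masks, n_digits, count, seed=0):
    """Generate (image, label) pairs; the label is known because we chose it."""
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        value = rng.randrange(10 ** n_digits)
        data.append((render_reading(blank, masks, value, n_digits), value))
    return data

# Usage: a 3-digit display, 7 rows high, 5 columns per digit cell.
blank = [[255] * (3 * 5) for _ in range(7)]
masks = toy_masks(3)
samples = make_dataset(blank, masks, n_digits=3, count=1000)
```

With real data, the blank background and the masks would come from the edited photos rather than from `toy_masks`.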

Also, you said you're using a modified VGG with 20k output classes. That works, but another thing you might try is using binary_crossentropy as the loss function and a sigmoid (instead of softmax) activation on the final layer, so you can do multi-label classification. Then your labels could be a vector of shape possibilities, colour possibilities, or whatever you could divide your 20k classes into.
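A rough, framework-free sketch of the difference being suggested: softmax makes the outputs compete so they sum to 1 (one class wins), while independent sigmoids let several outputs be "on" at once, and binary cross-entropy scores each output separately against a 0/1 target. The output names and logit values are hypothetical. In Keras this would roughly correspond to a final `Dense` layer with `activation='sigmoid'` compiled with `loss='binary_crossentropy'`, but the code below does not use Keras.

```python
import math

# Hypothetical output layout: some outputs describe shape, others colour.
OUTPUTS = ["brick_2x4", "plate_1x2", "red", "blue"]

def softmax(logits):
    """Mutually exclusive classes: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent per-output probability, computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_crossentropy(targets, probs, eps=1e-7):
    """Binary cross-entropy averaged over outputs; each output is its own yes/no question."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)

logits = [3.0, -2.0, 2.5, -3.0]  # raw scores from the final layer
label = [1, 0, 1, 0]             # "red 2x4 brick": two outputs on at once
probs = [sigmoid(z) for z in logits]

print([round(p, 3) for p in probs])    # shape and colour outputs are both high
print(round(sum(softmax(logits)), 6))  # softmax always sums to 1
print(binary_crossentropy(label, probs))
```

The design point is that a target like `[1, 0, 1, 0]` is perfectly valid under sigmoid + binary cross-entropy, whereas softmax + categorical cross-entropy assumes exactly one correct class per image.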

1 comments

I've tried the rendering trick, but it didn't work well enough; the real pictures seem to give much better results on unseen data.

> Also, you said you're doing modified VGG and into 20k classes. That works,

Right now there are 1002 classes: the 1000 most common Lego parts, plus 'mess' and 'other'.

> but another thing to maybe try is use binary_crossentropy as the loss function and a sigmoid (instead of softmax) on the final activation layer, to be able to do multiclass classification. Then your labels could be a vector of shape possibilities, colour possibilities, or whatever you could divide your 20k classes into.

Ok, I can try that. Thank you!

Tagging/multi-label classification is useful because it'll help tame the explosion of classes if you want to expand. For example, it can handle stuck-together parts by tagging them with both parts rather than putting them into a generic 'other' class, or you could include separate tags for colors, fakeness or damage, avoiding the need for 100,000 categories like 'fake damaged red square brick'. It might also improve learning, since it's a more natural way of describing the data.
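A small sketch of the tagging idea: each image gets a multi-hot vector over a tag vocabulary, and predictions are decoded by thresholding each sigmoid output independently. The vocabulary, the 0.5 threshold and the 25-colour count are hypothetical. `count_outputs` just illustrates the explosion: 1000 parts × 25 colours × 2 × 2 yes/no flags is 100,000 combined classes, but only 1000 + 25 + 2 = 1027 tags.

```python
# Hypothetical tag vocabulary: part tags, colour tags and yes/no flags.
VOCAB = ["brick_2x4", "brick_1x2", "red", "blue", "fake", "damaged"]

def multi_hot(tags, vocab=VOCAB):
    """Encode a set of tags as a 0/1 target vector for a sigmoid output layer."""
    index = {tag: i for i, tag in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tag in tags:
        vec[index[tag]] = 1  # raises KeyError for tags outside the vocabulary
    return vec

def decode(probs, vocab=VOCAB, threshold=0.5):
    """Turn per-tag sigmoid outputs back into a set of predicted tags."""
    return {tag for tag, p in zip(vocab, probs) if p >= threshold}

def count_outputs(n_parts, n_colours, n_flags):
    """Outputs needed with one class per combination vs. one sigmoid per tag."""
    combined = n_parts * n_colours * 2 ** n_flags
    tagged = n_parts + n_colours + n_flags
    return combined, tagged

# Two red parts stuck together: both part tags are on, no 'other' class needed.
print(multi_hot({"brick_2x4", "brick_1x2", "red"}))
print(decode([0.9, 0.7, 0.8, 0.1, 0.2, 0.6]))
print(count_outputs(1000, 25, 2))
```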