by bpires 3725 days ago

I don't think YOLO [0], the object detector he talked about, requires a massive amount of data as he claimed. Yes, if you want to learn to classify 1000 different categories, as on ImageNet, you need a lot of data. But if you start from a pretrained network like YOLO (it was pretrained on ImageNet and trained on Pascal), you don't need a lot of images. I've retrained it on the KITTI dataset [1] and had no issues at all. They're only 7k images. By the way, KITTI actually has a vehicles dataset that might be helpful for your case. Also, you don't even need to retrain YOLO on your vehicle dataset. It was trained on Pascal VOC [2], a dataset of 20 categories, one of which is car. So YOLO already knows how to detect cars. It might not be ideal for your dataset, but that doesn't matter since you just want any solution to compare against as a baseline. This would probably have been even less work than training the cascade classifier you used, and it would have achieved better results.

[0]: http://pjreddie.com/darknet/yolo/

[1]: http://www.cvlibs.net/datasets/kitti/

[2]: http://host.robots.ox.ac.uk/pascal/VOC/

romaniv 3725 days ago

> They're only 7k images.

I feel like all those deep learning papers distorted people's perception of scale. If you need to take those 7k images by hand because your application domain is obscure and they aren't available in an existing dataset, that's way beyond feasible.

WiseWeasel 3725 days ago

You could generate 7k images at a resolution of 4096x2160 pixels by walking around the vehicle for just under four minutes while shooting 4k video at 30 FPS, something of which modern phones are capable.
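As a quick check of that arithmetic, here is a minimal sketch. The function name is made up for illustration; the only inputs are the 7k frame target and the 30 FPS rate mentioned above.

```python
def seconds_needed(n_frames: int, fps: float) -> float:
    """Recording time in seconds needed to capture n_frames at a given frame rate."""
    return n_frames / fps

# 7,000 frames at 30 frames per second
duration = seconds_needed(7000, 30)
minutes, seconds = divmod(duration, 60)
print(f"{int(minutes)} min {seconds:.1f} s")
```

That comes out a little short of four minutes of continuous footage.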

romaniv 3725 days ago

Yes, but how different would those 7k frames really be? Same lighting, same background, same surrounding objects, the exact same condition of the vehicle's interior and exterior, same quirks of the camera's color profile, etc. It would be an interesting experiment to actually try this, but I have a feeling the results wouldn't be all that good. Point being, you probably wouldn't get most of the benefits of deep learning, and you might as well use the same approach the author used.

WiseWeasel 3725 days ago

As you walk around the vehicle and change the angle from low to high, the lighting and background should vary quite a bit.

fredophile 3725 days ago

No they won't. All of the pictures will have lighting from the time of day and weather conditions from the time and place the pictures were taken. The same problems will happen for the background. If I want my neural network to identify the make and model of cars, but every picture I have of a Mazda3 is taken at noon on a sunny day in suburbia, then it is reasonably likely to train on the wrong features and either identify trucks on sunny days in suburbia as Mazda3s or fail to recognize a Mazda3 photographed on a rainy night.

WiseWeasel 3725 days ago

A human might have difficulty recognizing a Mazda 3 on a rainy night as well. You can adjust color temperature and white balance in post-processing, or film a couple of minutes at night too. Point is, generating 7k images is not insurmountable, especially in this case, where the only criterion is that it has to recognize a particular car.
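To make the post-processing idea concrete, here is a rough sketch of a color temperature shift done as per-channel gains on a single RGB pixel. The function names and the 10% strength are assumptions for illustration, and real white balance tools are more sophisticated than a plain red/blue scale.

```python
def apply_channel_gains(pixel, gains):
    """Scale an (R, G, B) pixel by per-channel gains, clamping each value to 0..255."""
    return tuple(min(255, max(0, round(c * g))) for c, g in zip(pixel, gains))

def warm_cool_variants(pixel, strength=0.1):
    """Return a warmer and a cooler version of a pixel (crude approximation)."""
    warm = (1 + strength, 1.0, 1 - strength)  # boost red, cut blue
    cool = (1 - strength, 1.0, 1 + strength)  # cut red, boost blue
    return apply_channel_gains(pixel, warm), apply_channel_gains(pixel, cool)

warmer, cooler = warm_cool_variants((200, 150, 100), strength=0.1)
print(warmer, cooler)
```

Applied to every pixel of every frame, this would give extra variants of each image, though it only changes the color cast and not the background or weather.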
