by goberoi 3245 days ago
I tried out the top 5 computer vision API vendors last year. See my findings, along with examples of output from all of them here: https://goberoi.com/comparing-the-top-five-computer-vision-a...

At the time, Clarifai was the "best" one (I put that in quotes because this was for a small corpus, with subjective results, not a real train-test cycle). I re-ran the tests about a month ago (linked to from the post), and found that Google and others have continued to invest and improve.

2 comments

Great overview. Clarifai is certainly extremely impressive.

Do you play the tablas? My wife and I studied sitar but our instrument was destroyed by the movers in our latest relocation to Shenzhen :( The tabla teacher where we studied was able to play a very complex taal while chewing betel nut and rolling his eyes back in their sockets, immediately switch to a pitch, bend and time-perfect rendition of 'pink panther' melody, then switch back to a very complex taal without skipping a beat. Brilliant to see.

Yeah, Clarifai did well. I am keen to learn how well their custom model feature works. Per their FAQ[0], you only need to supply 20-50 images per concept. That seems remarkable to me, given that a concept like 'cow' has ~1500 images on ImageNet[1]. Perhaps they are using some sort of transfer learning to facilitate this? I.e. starting from a pretrained model, and then either retraining only the last few fully connected layers, or retraining parts of the entire network?

I am not a deep learning practitioner, but would be curious to know from experts how their custom model feature might work; and from any of their users on how well it actually does.

Tablas: haha, great description of your teacher. I do play, with enthusiasm, but poorly. For those in Seattle, there is an amazing teacher who teaches up on Cap Hill [2].

[0] http://help.clarifai.com/custom-training/custom-training-faq

[1] http://image-net.org/synset?wnid=n01887787

[2] http://www.acitseattle.org

It is not necessary to train things from scratch; you take the largest ImageNet model available and fine-tune it for the task. This way it reuses much of the lower layers, which have already seen lots of data.
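To make the fine-tuning idea above concrete, here is a toy sketch in plain Python with no deep learning library. Everything in it is an assumption for illustration: `frozen_features` is a hypothetical stand-in for the pretrained lower layers (a real setup would use an ImageNet-trained network), and the "bright vs. dark" concept with 20 examples per class is made up. It says nothing about how Clarifai's custom model feature actually works.

```python
import math
import random

def frozen_features(pixels):
    """Hypothetical stand-in for pretrained lower layers; never updated below."""
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, labels, steps=2000, lr=1.0):
    """Train only a new last layer (logistic regression) on the frozen features."""
    feats = [frozen_features(x) for x in examples]  # computed once: extractor is frozen
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [gw + err * fi for gw, fi in zip(grad_w, f)]
            grad_b += err
        # Full-batch gradient step; only the head's weights and bias change.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, pixels):
    """Probability that the input belongs to the new concept."""
    f = frozen_features(pixels)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

# Made-up data: 20 "bright" and 20 "dark" 8-pixel images.
rng = random.Random(0)
bright = [[rng.uniform(0.6, 1.0) for _ in range(8)] for _ in range(20)]
dark = [[rng.uniform(0.0, 0.4) for _ in range(8)] for _ in range(20)]
head_w, head_b = train_head(bright + dark, [1] * 20 + [0] * 20)

bright_test = [0.7, 0.8, 0.9, 1.0, 0.7, 0.8, 0.9, 1.0]
dark_test = [0.0, 0.1, 0.2, 0.3, 0.0, 0.1, 0.2, 0.3]
print(predict(head_w, head_b, bright_test), predict(head_w, head_b, dark_test))
```

Only `head_w` and `head_b` are learned here, which is the intuition for why a few dozen examples per concept might be enough when the frozen features are already good.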

Can you share any thoughts on which would be best for a computer vision newbie and programming novice to get started playing with? Or are none of them really great for that?

These aren't really great for learning about computer vision or deep learning, but are great for building projects that require image classification.

E.g., recall the awesome project for Lego sorting by jacquesm[0]? He built his own model using Keras and TensorFlow, but you may be able to achieve similar results by using Clarifai's feature to train your own models with no understanding of deep learning. This is great if your goal is to build a thing, like a Lego sorter, but not so much if you want to learn how to build a state-of-the-art image classifier.

If you're interested in learning about computer vision or deep learning, I recommend searching this site to find threads that cover that extensively. Good luck!

[0] https://jacquesmattheij.com/sorting-lego-the-software-side