Hacker News
by nsthorat 3181 days ago
deeplearn.js author here...

We do not send any webcam / audio data back to a server; all of the computation is client side. The storage API requests are just downloading the weights of a pretrained model.

We're thinking about releasing a blog post explaining the technical details of this project. Would people be interested?

5 comments

Yes please! :)

And some quick questions:

What network topology do you use, and on what model is it based (e.g. "inception")?

What kind of data have you used to pretrain the model?

We're using SqueezeNet (https://github.com/DeepScale/SqueezeNet), which is similar to Inception (trained on the same ImageNet dataset) but is much smaller (5MB instead of Inception's 100MB), and inference is much quicker.

The application runs each webcam frame through SqueezeNet, producing a 1000-dimensional logits vector per frame. These can be thought of as unnormalized log-probabilities (scores), one for each of ImageNet's 1000 classes.
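For readers unfamiliar with logits, here is a minimal sketch of how such a score vector relates to class probabilities. It is plain Python rather than deeplearn.js, the function name `softmax` and the three-element input are purely illustrative, and the demo described here works on the logits directly rather than on probabilities.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing on large scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative 3-class logits; the real vector would have 1000 entries.
probs = softmax([2.0, 1.0, 0.1])
```

The ordering of the classes is the same before and after softmax, which is one reason comparing raw logits vectors is still meaningful.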

During the collection phase, we collect these vectors for each class in browser memory, and during inference we pass the frame through SqueezeNet and do k-nearest neighbors to find the class with the most similar logits vector. KNN is quick because we vectorize it as one large matrix multiplication.
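The collect-then-compare idea can be sketched roughly as follows. This is plain Python, not the deeplearn.js code, and several details are assumptions: the similarity measure (cosine similarity here, since the comment only says "most similar"), the value of k, the majority vote, and the hypothetical names `KNNClassifier`, `add_example`, `predict`, and `matvec`. The single matrix-vector product in `predict` stands in for the one large matrix multiplication mentioned above.

```python
import math
from collections import Counter

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def matvec(matrix, vec):
    """Multiply an N x D matrix by a D-vector, giving N similarity scores."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k            # assumed neighbor count, not from the thread
        self.examples = []    # N x D matrix of stored, normalized logits
        self.labels = []      # class label for each stored row

    def add_example(self, logits, label):
        # "Training" is just storing the vector, so it is effectively instant.
        self.examples.append(normalize(logits))
        self.labels.append(label)

    def predict(self, logits):
        # One matrix-vector product scores the query against every example.
        sims = matvec(self.examples, normalize(logits))
        order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
        votes = Counter(self.labels[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]

# Tiny 3-dimensional stand-ins for the 1000-dimensional logits vectors.
clf = KNNClassifier(k=3)
clf.add_example([1.0, 0.0, 0.0], "left")
clf.add_example([0.9, 0.1, 0.0], "left")
clf.add_example([0.0, 1.0, 0.0], "right")
clf.add_example([0.1, 0.9, 0.0], "right")
prediction = clf.predict([0.8, 0.2, 0.0])
```

On a GPU-backed library the `matvec` call would be a single batched matmul over all stored examples, which is where the speed comes from.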

I'll go deeper in a blog post soon :)

So you're doing nearest neighbour search on the image features from the CNN. This is alluded to in Figure 4 of the DeCAF paper: https://twitter.com/eggie5/status/907120374575505408
AlexNet paper, not DeCAF paper!
Interesting!

I'm curious why you've used a different classification algorithm on top of a neural network. I would expect that a neural network on top of a pretrained network could give similar results, with the benefit of simpler code. Is performance the reason?

Anyway, I'm looking forward to your blog post.

Training a neural network on top would require a "proper" training phase, and finding the right hyperparameters that work everywhere turned out to be tricky. Actually, this is what we did originally. In the blog post we'll try to show demos of each of the approaches and explain why they don't work.

KNN also makes training "instant", and the code much much simpler.

That makes sense.

By the way, I think your software could become very popular on the Raspberry Pi, because it would be very cheap and fun to use it for all sorts of applications (e.g. home automation).

There's something fantastically entertaining about this. It's stupidly simple (from the outside) but interacting with the computer in such a different way is weirdly fun.

It's like when you turn on a camera and people can see themselves on a TV. A lot of people can't help but make faces at it.

Why does it not work in Edge? Please keep the web open, and do not make stuff that does not work in a modern browser. Also, always give an option to try it anyway.
A blog post on the technical details would be great, please. Thanks in advance, since I know it'll take a bit of your time to write.
To answer the question at last, yes, I am interested.