| HN Mirror

by nsthorat 3181 days ago

deeplearn.js author here...

We do not send any webcam / audio data back to a server; all of the computation happens client side. The storage API requests just download the weights of a pretrained model.

We're thinking about releasing a blog post explaining the technical details of this project. Would people be interested?

5 comments

amelius 3181 days ago

Yes please! :)

And some quick questions:

What network topology do you use, and on what model is it based (e.g. "inception")?

What kind of data have you used to pretrain the model?

nsthorat 3181 days ago

We're using SqueezeNet (https://github.com/DeepScale/SqueezeNet), which is similar to Inception (trained on the same ImageNet dataset) but is much smaller - 5MB instead of Inception's 100MB - and inference is much, much quicker.

The application takes webcam frames and runs them through SqueezeNet, producing a 1000-dimensional logits vector for each frame. These can be thought of as unnormalized scores (log-probabilities) for each of ImageNet's 1000 classes.
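
To make "unnormalized" concrete: logits only become probabilities after a softmax. Below is a minimal sketch in plain Python, not the deeplearn.js code; the function name `softmax` and the three-element toy vector are assumptions for illustration (the real vector has 1000 entries).

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing on large logits
    # and does not change the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-class logits standing in for the 1000-class vector.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
```

The ordering of the classes is the same before and after the softmax, which is one reason the raw logits can be compared directly without normalizing them first.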

During the collection phase, we collect these vectors for each class in browser memory, and during inference we pass the frame through SqueezeNet and do k-nearest neighbors to find the class with the most similar logits vector. KNN is quick because we vectorize it as one large matrix multiplication.
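
A rough sketch of how KNN can be phrased as a matrix product, written in plain Python rather than deeplearn.js. The similarity measure is an assumption here (cosine similarity on unit-normalized vectors), as are the names `normalize`, `matvec` and `knn_predict` and the toy 3-dimensional vectors. With a tensor library, `matvec` would be a single matrix multiplication call.

```python
import math
from collections import Counter


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def matvec(matrix, vector):
    """Multiply an (N x D) matrix by a D-vector: one similarity per stored example."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def knn_predict(examples, labels, query, k=3):
    """Return the majority label among the k stored examples most similar to query."""
    sims = matvec(examples, normalize(query))
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0]


# Toy 3-D "logits" collected for two classes (hypothetical data).
stored = [normalize(v) for v in [[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 4]]]
labels = ["A", "A", "B", "B"]
prediction = knn_predict(stored, labels, [4.5, 1.0, 0.2], k=3)
```

All similarities come out of one product between the stored matrix and the query, so the per-frame cost is dominated by a single well-optimized operation.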

I'll go deeper in a blog post soon :)

eggie5 3180 days ago

So you're doing nearest neighbour search on the image features from the CNN. This is alluded to in Figure 4 of the DeCAF paper: https://twitter.com/eggie5/status/907120374575505408

eggie5 3180 days ago

AlexNet paper, not DeCAF paper!

amelius 3181 days ago

Interesting!

I'm curious why you've used a different classification algorithm on top of a neural network. I would expect that a neural network on top of a pretrained network could give similar results, with the benefit of simpler code. Is performance the reason?

Anyway, I'm looking forward to your blog post.

nsthorat 3181 days ago

Training a neural network on top would require a "proper" training phase, and finding the right hyperparameters that work everywhere turned out to be tricky. Actually, this is what we did originally. In the blog post we'll try to show demos of each of the approaches and explain why they don't work.

KNN also makes training "instant", and the code much, much simpler.
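
To illustrate why training is "instant": with a nearest-neighbor classifier, adding a training example is just storing it, with no optimizer or hyperparameters involved. This is a hypothetical sketch, not the project's code; the class name `InstantKNN`, the use of 1-nearest-neighbor with squared Euclidean distance, and the 2-D toy vectors are all assumptions.

```python
class InstantKNN:
    """1-nearest-neighbor classifier whose 'training' is appending to a list."""

    def __init__(self):
        self.examples = []  # list of (vector, label) pairs

    def add_example(self, vector, label):
        # No gradient steps and no learning rate: just remember the example.
        self.examples.append((list(vector), label))

    def predict(self, query):
        def squared_distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        _, label = min(self.examples, key=lambda ex: squared_distance(ex[0], query))
        return label


clf = InstantKNN()
clf.add_example([1.0, 0.0], "left")
clf.add_example([0.0, 1.0], "right")
first = clf.predict([0.9, 0.1])

# A new class can be added at any time and is usable immediately.
clf.add_example([1.0, 1.0], "both")
second = clf.predict([0.9, 0.8])
```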

amelius 3181 days ago

That makes sense.

By the way, I think your software could become very popular on the Raspberry Pi, because it would be very cheap and fun to use it for all sorts of applications (e.g. home automation).

nsthorat 3181 days ago

https://github.com/PAIR-code/deeplearnjs/issues/158

make3 3181 days ago

Basically, read this paper: https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf

Splines 3181 days ago

There's something fantastically entertaining about this. It's stupidly simple (from the outside) but interacting with the computer in such a different way is weirdly fun.

It's like when you turn on a camera and people can see themselves on a TV. A lot of people can't help but make faces at it.

sydd 3180 days ago

Why does it not work in Edge? Please keep the web open, do not make stuff that does not work in a modern browser. Also always give an option to try it anyway.

haser_au 3180 days ago

A blog post on the technical details would be great, please. Thanks in advance, since I know it'll take a bit of your time to write.

godelmachine 3180 days ago

To answer the question at last, yes, I am interested.
