| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by timanglade 3323 days ago

Hey so I actually tried Vgg, Inception and SqueezeNet, out of the box, chopped and trained from scratch (SqueezeNet only for the latter due to resource constraints).

We ended up with a custom architecture trained from scratch due to runtime constraints more so than accuracy reasons (the inference runs on phones, so we have to be efficient with CPU + memory), but that model also ended up being the most accurate model we could build in the time we had. (With more time/resources I have no doubt I could have achieved better accuracy with a heavier model!)

Training the final model took about 80 hours on a single Nvidia GTX 980 Ti (the best thing I could hook to my MacBook Pro at the time). That's for 240 epochs (150k images in an epoch) ran in 3 rate annealing phases, each phase being a handful of CLR (cyclical learning rate) phases.

I'll answer in more detail in the full blogpost, it's a bit complicated to explain in a comment. I'll have charts & figures for y'all :)

3 comments

timbutlerau 3323 days ago

As someone currently writing an app which uses a retrained Inception model, I watched the show pointing and laughing (and then crying) at the same issues and frustrations. The accuracy of the show and especially this episode has been just brilliant.

Thanks for sharing all the tech details too, it's been great to read. I'm even more amazed to see it as a real app, that I didn't expect!

link

luv2lrn 3323 days ago

I'm confused about the 80 hours to train. Wasn't this episode shown on HBO less than 48 hrs ago?

Edit: Just read your bio. Now it makes sense!

link

DonHopkins 3323 days ago

He's a guest lecturer at Stanford, and he had the students help him.

link

alexcnwy 3323 days ago

great, thanks for your reply and looking forward to that blogpost!

link