Hacker News
by epberry 3020 days ago
Very cool. I shudder to think of the GPU costs of running these models, though. Perhaps they're using TPUs to be as efficient as possible. If you imagine a room occupied by people for several hours every evening, then at any decent framerate you're running your pose estimation network on every frame for those hours. And these models are big, as far as I have seen. So that pretty much means you have one cloud GPU allocated per camera every evening. I suppose another option is that they are running pieces of the network on the device, but I think that's unlikely.
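To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (15 fps, a 3-hour evening, 50 ms of GPU time per frame) is an assumption I picked for illustration, not a measurement of any real model or of this product, and the function names are made up for the example.

```python
def frames_per_session(fps: float, hours: float) -> int:
    """Total frames a single camera produces over one session."""
    return round(fps * hours * 3600)


def gpu_utilization(fps: float, ms_per_frame: float) -> float:
    """Fraction of one GPU's time that a single camera stream consumes."""
    return fps * ms_per_frame / 1000.0


# Assumed values, chosen only to illustrate the arithmetic.
FPS = 15           # hypothetical camera framerate
HOURS = 3          # hypothetical length of an evening session
MS_PER_FRAME = 50  # hypothetical inference time per frame on one GPU

frames = frames_per_session(FPS, HOURS)
utilization = gpu_utilization(FPS, MS_PER_FRAME)
cameras_per_gpu = int(1 // utilization) if utilization > 0 else 0

print(f"frames per evening: {frames}")
print(f"GPU utilization per camera: {utilization:.0%}")
print(f"cameras one GPU can serve: {cameras_per_gpu}")
```

Under these assumptions a single stream eats most of a GPU, which is where the "one GPU per camera" guess comes from. A faster model or a lower framerate changes the picture quickly, so plug in your own numbers.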

Of course, these models are getting smaller over time, and I'm incredibly impressed that these guys have put together the hardware, computer vision, and cloud setup. I also think they've nailed the MVP: not too easy but not too complicated either, assuming they have decent models.

I'm signing up!

1 comments

I think you're overestimating the processing requirements. The original 2010 Kinect did fundamentally similar processing (multi-person tracking and skeletal mapping) on the Xbox 360, which had a PowerPC CPU from 2005.
Running neural networks is usually much easier than training them (computationally).

Was the Kinect even a neural network? I don't think it was.