|
It generally means you teach it to recognize 3d shapes (a 2d image that moves = a 3d image, more or less. Yes, there's a good reason why you might want to call it 2.5d, but the easy way to model a 2.5d object is in 3d). Think of it as the difference between recognizing 2 points and recognizing a Feynman diagram.

This is one of the things people often don't realize you can do with algorithms. You don't need to look at the world the way it really exists, and there may be very good reasons not to. Training algorithms to actually recognize moving images is incredibly hard, because it requires things like memory, fade-outs, recurrent networks, all that very advanced stuff. Obviously time exists as a continuum in the "real" world. But that's bloody inconvenient. So just look at big "quanta" of time: collect all data points during the quantum, analyse them, then shift the quantum/window ahead 0.1s and do the exercise again.

This is so much easier you wouldn't believe it. Teaching an algorithm to recognize, say, a car collision, given 100 frames, doesn't require any change to the algorithm (just a change in training data). And obviously your backend system needs to be aware that, over time, the "isColliding" output will look like ......1.....11.....1111...1111.1.1.111.11..11...11.11...11...1.....1...1...... when a collision occurs, and this of course doesn't mean you've had 20 collisions.

The windowed approach does mean a bigger network, slower training, and more resources needed. But not as much as you'd think. Keep in mind that a "temporal" network will need more hidden layers. Also please consider building "redundant" networks for temporal data. When people ask why, I have no better answer than that it's the same technique our brain uses, so frankly if it's good enough for God, it's good enough for me. Doing the temporal thing means you're back to using trivially simple algorithms, running on more data.

Cracking CAPTCHAs is not very impressive.
I've done it as a weekend project, and exceeding "average" human CAPTCHA-solving ability is easy. I actually got it to the point where my algorithm was slightly better at CAPTCHAs than me, even when I was allowed to take 2 minutes for difficult CAPTCHAs. If I wasn't allowed to take more than 10 seconds, my algorithm easily beat me by over 10% (my CAPTCHA performance, when measured, is shockingly only ~83%). I didn't cheat: I used an external site's CAPTCHAs (from dns.be). The algorithm used was dead simple backpropagation. |
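For anyone who wants to see the windowing idea from the comment above in concrete form, here is a rough sketch in Python. It is not the commenter's code. The names `sliding_windows`, `flag_windows` and `count_events`, the `classify` callback, and the `max_gap`/`min_hits` thresholds are all made up for illustration, and the classifier itself (the trained network) is assumed to exist elsewhere. The sketch only shows two things: cutting a frame sequence into overlapping "quanta", and collapsing a noisy run of "isColliding" flags into a count of events.

```python
from typing import Callable, Iterator, Sequence


def sliding_windows(frames: Sequence, size: int, step: int) -> Iterator[Sequence]:
    """Yield fixed-size, overlapping windows ("quanta") over a frame sequence."""
    for start in range(0, len(frames) - size + 1, step):
        yield frames[start:start + size]


def flag_windows(frames: Sequence, classify: Callable[[Sequence], bool],
                 size: int, step: int) -> str:
    """Run a per-window classifier (assumed, e.g. a trained network) and
    render its output as a string of '1' (positive) and '.' (negative)."""
    return "".join("1" if classify(window) else "."
                   for window in sliding_windows(frames, size, step))


def count_events(flags: str, max_gap: int = 5, min_hits: int = 3) -> int:
    """Count events in a noisy flag string.

    Positives separated by at most `max_gap` negatives are merged into one
    burst; a burst only counts as an event if it has at least `min_hits`
    positives. Both thresholds are illustrative assumptions.
    """
    events = 0
    hits = 0  # positives seen in the current burst
    gap = 0   # consecutive negatives since the last positive
    for ch in flags:
        if ch == "1":
            hits += 1
            gap = 0
        elif hits:
            gap += 1
            if gap > max_gap:
                # The burst has ended; keep it only if it was strong enough.
                if hits >= min_hits:
                    events += 1
                hits = 0
                gap = 0
    if hits >= min_hits:
        events += 1
    return events


if __name__ == "__main__":
    frames = list(range(10))  # stand-in for real video frames
    print(flag_windows(frames, lambda w: 5 in w, size=4, step=2))

    noisy = ("......1.....11.....1111...1111.1.1.111.11..11"
             "...11.11...11...1.....1...1......")
    print(count_events(noisy))
```

With these made-up thresholds the noisy string from the comment collapses into a single event rather than 20. In a real system the window size, the step (0.1s in the comment) and the merge thresholds would have to be tuned against the frame rate and the classifier's error rate.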
If it's so easy, could you share it on GitHub?