by kctess5 3615 days ago
Hi there. I would like to turn your attention to this excellent paper on hand gesture recognition [1]. I ran across it in my foray into hand tracking and gesture recognition with the Kinect. I implemented the Fourier descriptors described in that paper (there's even source code in the paper!) and agree that they are quite effective.

Since you already have a decent looking hand segmentation method, you could simply trace the outline of the segmentation (OpenCV will give you contours if you ask politely) and generate descriptor vectors with that. Fourier descriptors are rotationally invariant, and you can get scale invariance easily by scaling your hand images to a constant size after segmentation. You can use the descriptors with a variety of ML algorithms, but K nearest neighbors is probably the easiest (and it is what I implemented). SVM is probably also a good method. Using ML has the absolutely gigantic upside that you don't have to write code to recognize individual gestures, and it can learn many gestures (I did 5+ no problem).
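
A minimal sketch of that pipeline, for illustration only. It is a generic magnitude-based Fourier descriptor, not necessarily the formulation in [1] or the code discussed in this comment. All function names are hypothetical, the outlines are synthetic stand-ins for traced hand contours, and here scale is normalized in the frequency domain rather than by resizing the image. It sticks to the standard library; in practice the contour would come from something like cv2.findContours and the hand-written DFT would be replaced by np.fft.fft.

```python
import cmath
import math
from collections import Counter


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of complex samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_descriptor(contour, n_coeffs=8):
    """Map a closed contour [(x, y), ...] to 2 * n_coeffs normalized magnitudes.

    Assumes the contour has more than 2 * n_coeffs points.
    """
    spectrum = dft([complex(x, y) for x, y in contour])
    # Skip index 0 (the centroid term) for translation invariance and keep the
    # lowest positive and negative frequencies, which carry the coarse shape.
    kept = spectrum[1:n_coeffs + 1] + spectrum[-n_coeffs:]
    mags = [abs(c) for c in kept]  # magnitudes ignore rotation and start point
    scale = max(mags) or 1.0       # dividing by the largest term removes size
    return [m / scale for m in mags]


def knn_predict(train, query, k=1):
    """train is a list of (descriptor, label); majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def lobed_outline(lobes, n=64, amp=0.3):
    """Synthetic closed outline standing in for a traced hand contour."""
    points = []
    for i in range(n):
        t = 2 * math.pi * i / n
        r = 1 + amp * math.cos(lobes * t)
        points.append((r * math.cos(t), r * math.sin(t)))
    return points


def moved(contour, angle, scale, dx, dy):
    """Rotate, scale and translate a contour."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
        for x, y in contour
    ]


# Tiny "gesture library": one example per label.
train = [
    (fourier_descriptor(lobed_outline(3)), "three"),
    (fourier_descriptor(lobed_outline(5)), "five"),
]
# The same five-lobed shape, but rotated, enlarged and shifted.
query = fourier_descriptor(moved(lobed_outline(5), 0.7, 3.0, 10.0, -4.0))
print(knn_predict(train, query))
```

Note that these magnitudes are not invariant to mirroring or to reversing the direction in which the contour is traced, so a real system would want to trace contours in a consistent direction.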

I am actually working on open sourcing my code (didn't release it immediately because messy code + busy life during the semester), and porting it to Python so that I can use Numpy and Scipy. The original version I wrote in C++, which includes hand tracking (via OpenNI) + static gesture recognition on the Kinect 360 sensor, as well as a CLI for interfacing with the code and building a gesture library. To start tracking, you have to wave vigorously at the camera. In the Python version for the Kinect One sensor, I'm cleaning things up and also implementing my own hand detection and tracking algorithm (based on an unscented Kalman filter) so that I can kill the dependency on OpenNI, which will help calm the vigor of the wave gesture required to initialize OpenNI tracking. I expect good results when the Python version is finished, due to the high quality depth based hand segmentations I am seeing with the new Kinect. I might be able to hook you up with some source code if you don't mind seeing the rough draft and having a tricky install process.

While the older Kinect works with the Fourier method, it is not as good as the Kinect One sensor because it uses a depth reconstruction algorithm that results in jagged edges and merged fingers. The new sensor uses a time of flight sensor which gives very nice accuracy and normal looking edges. Granted, you could get around this problem via skin segmentation on the RGB or IR images that both Kinects also provide.

A camera only system like yours can work well with enough effort, but with a depth image you will almost certainly get better segmentation and detection results with simpler code. Then you can focus on higher level problems like static and dynamic gesture recognition, and design a better UX around those things. This does come at the price of needing a more expensive and larger sensor, though there are a few depth camera options available these days.
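
To give a sense of how simple depth based segmentation can be, here is a rough sketch. It rests on two assumptions that are not from this comment: the hand is the object closest to the sensor, and a depth of 0 means "no reading". The function name, the band width and the tiny frame are made up for illustration; a real frame would be a full-size array from the sensor.

```python
def segment_nearest(depth_mm, band_mm=120):
    """Return a 0/1 mask of pixels within band_mm of the closest valid pixel.

    depth_mm is a 2D list of depths in millimetres; 0 is treated as invalid.
    """
    valid = [d for row in depth_mm for d in row if d > 0]
    if not valid:
        return [[0] * len(row) for row in depth_mm]
    nearest = min(valid)
    # Keep everything in a thin slab starting at the nearest surface.
    return [
        [1 if 0 < d <= nearest + band_mm else 0 for d in row]
        for row in depth_mm
    ]


# Toy frame: a "hand" at about 0.65 m in front of a wall at 2 m.
frame = [
    [0, 2000, 2000, 2000],
    [2000, 650, 700, 2000],
    [2000, 640, 2000, 2000],
]
mask = segment_nearest(frame)
for row in mask:
    print(row)
```

The outline of the resulting mask is what would then be traced and fed to a shape descriptor.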

Hope this helps! Happy to discuss more if you'd like.

[1] http://www.bu.edu/vip/files/pubs/reports/CCSSM13-04buece.pdf

Edit: re your comment on Python, I have gone the C++ route (as well as just about every other mainstream language) and can honestly say that Python is a much easier development environment than what you will find in C++. Numpy and Scipy are where it is at. Even after completing two major software projects in C++, I have almost an order of magnitude greater productivity in Python, largely because of these libraries and the incredibly readable syntax. Also they are blazing fast - you will be hard pressed to write faster C++ than the C code that powers Numpy. In my experience the algorithms you use and your system architecture will be a much larger factor than your language of choice. I choose what gives me the highest efficiency in development because I can focus more on nailing the architecture, algorithms and building necessary tooling. If you have never worked with C++, I assure you, the darkness that is C++ compilation and linking hell is not for the faint hearted.

1 comments

Amazing resource, thank you! I should have searched deeper before starting my project. I haven't finished reading yet, but there are clearly better methods than the ones I currently have.

My grudge against Python was probably linked to libraries being incompatible with version 3. In retrospect I should have used 2.7, since I missed out on some OpenCV features, but I wanted to learn Python at the same time and it seemed a waste of time to begin with an older version. I have a lot more experience in C++ and I don't mind a few dependency problems, but that's probably because I'm used to them by now.

I would love to see a demo of what you have done, if you have one (we never think to record our projects, but hey, you never know). If not, a link to the code would be interesting whenever you aren't busy :)