Show HN: 3D sensing SDK for iOS – Produces point cloud and 6DOF device motion (realitycap.com)
42 points by benhirashima 4661 days ago
6 comments

Really impressive! Nice name, video, website!

However I would strongly recommend picking only one of your "features", the indoor navigation. If I were you, I'd definitely try to build a business by concentrating only on indoor navigation!

Indoor navigation is a huge new area where all the big players are looking for possible partners/acquisitions right now! Overlay-based AR and the measuring tape demo are a joke compared to what you've shown in indoor navigation!

You really have a chance of making a successful company based only on the indoor navigation feature. Forget the pricing for now, just offer it as a free beta on both iOS and Android and try to get the word out as much as you can.

Good luck!

I know of several large companies that would be seriously interested in the measurement feature. Manually measuring the rooms in a house is seriously time consuming.
Looks very cool, but it feels like there's a big gap between Evaluation (online only, low API call limit) and Enterprise ("contact us") pricing models.

Not sure what market you're ultimately going for, but right now it seems to defeat the point of providing a nice simple API if it's only usable for either throwaway projects or by very large customers.

You make a great point, and we certainly don't want to exclude the in-between cases. We're still figuring out what those pricing tiers might look like, so if you've got an application that you're excited about, just let us know and we'll figure out a way to make it work!
Really amazing product! Is the form on your site the most expedient way to get in touch/get access to the SDK? Also, is the error accumulation small enough so that you could use the product continuously for extended periods of time?
Thanks! Yes, please use the form on the site. We're pretty busy today with our launch, but we promise to get it to you ASAP. And yes, usage over longer periods of time is possible.
Good to hear! Hope I can find some time to play with the evaluation version!
Very cool! Can you give a brief overview of the underlying technology you use to extract the distance data back out from a 2D image?
These are usually implemented using Structure from Motion techniques, but more specifically in this case SLAM (Simultaneous Localization and Mapping). There are two sources of information: from vision and from the Inertial Measurement Units (IMU) on the phone.

For the vision part, you start by extracting interest points in all images (Harris keypoints, or SIFT, or similar), then you match them up using local patch descriptors (a reasonable implementation in OpenCV, for example, is the Lucas-Kanade optical flow tracker). Once you have the correspondences you can estimate a relative 3D camera transformation that explains the motion. In this case the problem is hard because the depth of every point is unknown in addition to the camera transform.
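To make the matching step concrete, here is a minimal sketch of nearest-neighbour descriptor matching with a ratio test. The three-element descriptors and the 0.75 threshold are made up for illustration; real descriptors (SIFT and friends) are much longer, and this is not necessarily what the SDK does.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (ratio test), which rejects ambiguous matches.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distance from this descriptor to every candidate in the other frame.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Hypothetical toy descriptors for keypoints in two frames.
frame1 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5)]
frame2 = [(0.0, 0.9, 0.1), (0.9, 0.1, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5)]
matches = match_descriptors(frame1, frame2)
```

The third keypoint in frame1 is equally close to two candidates in frame2, so the ratio test drops it rather than guessing.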

For the IMU stream you can use the accelerometer and gyro in the phone, which give you estimates of linear acceleration and angular velocity respectively. These can be integrated over time to get a reasonable guess for the camera transformations from one time point to another as well.
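A rough sketch of that integration, restricted to motion in a plane. It assumes gravity has already been removed from the accelerometer signal and that the sensor is noise-free, neither of which holds on a real phone; the function name and sample layout are made up for illustration.

```python
import math

def dead_reckon(samples, dt):
    """Integrate planar IMU samples into a pose.

    samples: iterable of (forward_accel_m_s2, yaw_rate_rad_s) in the body frame.
    Returns (x, y, heading) after simple Euler integration.
    """
    x = y = vx = vy = heading = 0.0
    for accel, yaw_rate in samples:
        heading += yaw_rate * dt          # gyro: integrate once for orientation
        ax = accel * math.cos(heading)    # rotate acceleration into the world frame
        ay = accel * math.sin(heading)
        vx += ax * dt                     # accelerometer: integrate once for velocity
        vy += ay * dt
        x += vx * dt                      # and a second time for position
        y += vy * dt
    return x, y, heading

# 1 s of constant 1 m/s^2 forward acceleration, no rotation, sampled at 100 Hz.
x, y, heading = dead_reckon([(1.0, 0.0)] * 100, dt=0.01)
```

With these inputs the result is close to the textbook 0.5 m; the small excess comes from the Euler integration step.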

You combine the two guesses (from vision and from the phone's inertial measurement unit) into a best guess, and then combine that with the best guess from 30 milliseconds ago to arrive at an evolving probability distribution of this best guess over time. The standard way would be something like a Kalman filter.
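As a toy illustration of that fusion, here is a scalar Kalman filter in which the IMU supplies the predicted motion and vision supplies a direct measurement. The one-dimensional state, the noise variances, and the sample values are all assumptions for the sake of the example; a real system tracks a full 6DOF pose and many more state variables.

```python
def kalman_step(x, p, u, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous estimate and its variance
    u:    predicted motion since the last step (e.g. from the IMU)
    z:    direct measurement of the state (e.g. from vision)
    q, r: process and measurement noise variances
    """
    # Predict: apply the motion and grow the uncertainty.
    x_pred = x + u
    p_pred = p + q
    # Update: blend prediction and measurement by their relative confidence.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0  # vague initial guess
for u, z in [(1.0, 1.2), (1.0, 1.9), (1.0, 3.1)]:
    x, p = kalman_step(x, p, u, z, q=0.01, r=0.1)
```

After three steps the variance `p` has dropped well below the measurement variance `r`, which is the point of carrying the previous best guess forward.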

Another issue is dealing with drift over time: errors in estimation build up, so if you keep scanning the same area your model will start to drift. This requires something called "Loop Closure", which optimizes the camera matrices across the entire duration of the scan and not only frame to frame. This is very computationally intensive and hard to do online, and without it scans longer than a few seconds will get progressively uglier and misaligned.
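A deliberately crude sketch of the idea: if you know the device returned to its starting point, you can spread the accumulated error back along the trajectory. Real loop closure solves a nonlinear least-squares problem over all camera poses (and rotations, which are ignored here), so treat this linear correction as a stand-in only.

```python
def close_loop(poses):
    """Spread the loop-closure error evenly along a trajectory.

    poses: list of (x, y) estimates where the device is known to have
    returned to its starting point, so the last pose should equal the first.
    """
    n = len(poses) - 1
    if n < 1:
        return list(poses)
    err_x = poses[-1][0] - poses[0][0]
    err_y = poses[-1][1] - poses[0][1]
    # Pose i absorbs a fraction i/n of the accumulated error.
    return [(x - err_x * i / n, y - err_y * i / n)
            for i, (x, y) in enumerate(poses)]

# A square walk whose estimate had drifted by (0.4, -0.2) on returning to the start.
drifted = [(0.0, 0.0), (1.1, 0.0), (1.2, 1.0), (0.3, 0.9), (0.4, -0.2)]
corrected = close_loop(drifted)
```

Even this toy version needs the whole trajectory before it can correct anything, which hints at why the real thing is hard to do online.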

This stuff is super tricky to get right. Also, be skeptical of these demos because they are easy to can. It's fairly easy to get that one shot where it looks like it works, but in practice these are exceptionally fragile and very very difficult to get working. Though I'm impressed it seemed to work okay inside the mall -- with all the specular reflections from the floor. Though I'd guess that if anyone placed a foot into the field of vision (and made the environment geometry nonstatic) it would all break :) Good luck to the team though!

Very true: "Be skeptical." Also very true: "Another issue is dealing with drift over time, as errors in estimation build up and if you're scanning the same area your model will start to drift."

Apple Developer Videos on Sensor Fusion specifically mention NOT to do this even though their tech uses the Gyroscope which is orders of magnitude more precise than the accelerometer.

I believe it's "Understanding Core Motion." (Developer account required)

https://developer.apple.com/videos/wwdc/2012/

Are there other products that you know of that implement the kind of video+IMU in -> measurements + map out API they seem to be striving for?
I'm fairly sure that the tech is based around "structure from motion". The API simultaneously estimates the position of the camera at each point in time, and the location of some reference points (blue circles in the vid).

Because the device has an accelerometer, it is even able to extract distances, not just relative distances. I'm actually surprised by this as I always assumed the accelerometer was too noisy to be of use for this.

I tried to do a similar thing myself, but the problem is technically very difficult. While a lot of research has been done on structure from motion, actually packaging it into a usable API is a big task.

Accelerometers are worthless for measuring distances even if they were super precise. This is because you have to do double integration to get distance and the errors only accumulate. My guess is that they have used algorithms like SIFT to track points in space and estimate only small relative distances from sensors (gyro + accelerometers). This, however, is very cool. They should make this into an app that can construct a 3D model (i.e. turning the iPhone into a 3D scanner) with the ability to send the model for printing.
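To put a number on the double-integration problem, here is a small sketch that integrates nothing but a constant accelerometer bias. The 0.05 m/s^2 bias is an assumed figure for illustration, not a measured value for any particular phone.

```python
def position_error_from_bias(bias, duration, dt=0.01):
    """Position error caused by a constant accelerometer bias alone."""
    v = x = 0.0
    for _ in range(round(duration / dt)):
        v += bias * dt   # first integration: velocity error grows linearly
        x += v * dt      # second integration: position error grows quadratically
    return x

# Error in metres after 1, 10 and 60 seconds with an assumed 0.05 m/s^2 bias.
errors = {t: position_error_from_bias(0.05, t) for t in (1, 10, 60)}
```

The error follows roughly 0.5 * bias * t^2: a few centimetres after one second, but about 90 metres after a minute, which is why inertial data alone is only trusted over short intervals.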
>Accelerometers are worthless for measuring distances even if they were super precise. This is because you have to do double integration to get distance and the errors only accumulate.

I never said that a distance scale was obtained by applying double integration to the accelerometer output. I only said that in order to measure absolute distances, as opposed to relative distances, it is necessary to have an accelerometer, since no other data provides an absolute scale. See the other reply by one of the founders for the details.

You've explained it well! And you're exactly right, thanks to the accelerometer we get distances in real-world units. The trick is very closely integrating computer vision and inertial sensing. Images provide an external reference that can clean up the noise from the accelerometer, and the accelerometer provides absolute scale which you can't get from images alone. (I'm one of the founders, BTW.)
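A minimal sketch of how an accelerometer can pin down absolute scale, assuming a one-dimensional camera track from vision in arbitrary units, gravity-free and noise-free accelerometer readings, and perfectly aligned timestamps. This is an illustration of the general idea, not a description of the actual SDK.

```python
def estimate_scale(vision_positions, imu_accels, dt):
    """Least-squares scale factor mapping unitless vision positions to metres.

    vision_positions: 1D camera positions from vision, in arbitrary units
    imu_accels: accelerometer readings (m/s^2) aligned with the interior samples
    """
    # Second finite difference gives acceleration in vision units per s^2.
    vis_accels = [
        (vision_positions[i - 1] - 2 * vision_positions[i] + vision_positions[i + 1]) / dt**2
        for i in range(1, len(vision_positions) - 1)
    ]
    # Solve imu ~= scale * vis in the least-squares sense.
    num = sum(v * a for v, a in zip(vis_accels, imu_accels))
    den = sum(v * v for v in vis_accels)
    return num / den

dt = 0.1
true_positions = [0.5 * 2.0 * (k * dt) ** 2 for k in range(6)]  # 2 m/s^2 motion
vision_positions = [p / 4.0 for p in true_positions]            # scale lost: 1 unit = 4 m
imu_accels = [2.0] * 4                                          # ideal accelerometer
scale = estimate_scale(vision_positions, imu_accels, dt)
```

Here the recovered scale is 4 metres per vision unit. Only short-term acceleration is compared, so no long double integration of the accelerometer is needed.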
Yep, that's pretty accurate. I'll let our PhD guy come explain a bit more himself. Sorry, we're a bit busy with our exhibit at Disrupt, but we'll be able to answer more questions later.
My guess is they store a point cloud, with each point mapped to a pair of coordinates on the image.
This technology is awesome! If it's half as impressive in real life as the demo suggests you have done a fantastic job creating some really innovative technology. I wish you the best of luck in turning it into a real business!
I would love to see this combined with object extraction to make 3d models from the measurements. Here's an amazing demo: http://www.youtube.com/watch?v=Oie1ZXWceqM
Looks cool. You could probably use this for stabilising video footage.