by fxtentacle 1622 days ago
"The Future of Hardware Is Software" ... says everyone that wants to sell you AI software on commoditized hardware.

Yet here I am, I recently won an AI competition by reviving an old 2005 algorithm and just using the fact that compute power has 7000x-ed since then (from 5 GFLOPS on a P4 to 35 TFLOPS on a 3090). No AI was needed.
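The 7000x figure follows directly from the two throughput numbers quoted above. A quick check, using only those quoted figures (the 1-second workload is a made-up illustration and assumes ideal scaling, which real code rarely achieves):

```python
# Throughput figures as quoted in the comment (approximate peak values).
P4_FLOPS = 5e9          # ~5 GFLOPS on a Pentium 4
RTX3090_FLOPS = 35e12   # ~35 TFLOPS on an RTX 3090

speedup = RTX3090_FLOPS / P4_FLOPS

# Hypothetical workload: something that took 1 second on the P4.
# Under ideal scaling it would take 1/speedup seconds on the 3090.
old_seconds = 1.0
new_seconds = old_seconds / speedup

print(f"speedup: {speedup:.0f}x, 1 s of P4 work -> {new_seconds * 1e3:.3f} ms")
```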

And now I'm building custom electronics for a new type of 3D camera because even after years of AI and deep learning research, structured light and/or stereoscopic 3D depth estimation is still unusable. Try training a NeRF from "only" 3 UHD images and you know what I mean.

3 comments

You're such a tease!

Seriously, please share the deets on this 2005 "junker" algo.

It was super unsatisfying to read your post with so much interesting information glossed over.

FYI fxtentacle, people keep upvoting this comment; there seems to be enormous interest, and it's already at +17.

Will be sad if we never hear from you.

Sorry for the late reply. I got frustrated trying to figure out how to improve our Bomberland AI and decided to spend the rest of the day building lamps and shadow caster shapes in Lego ^_^

By now, I'm down to 5th place on the Sintel Clean rankings: http://sintel.is.tue.mpg.de/quant?metric_id=6&selected_pass=... but my entry H-v3 was 1st place when I submitted it. The algorithm is

Mota C., Stuke I., Aach T., Barth E. Divide-and-Conquer Strategies for Estimating Multiple Transparent Motions. In: Jähne B., Mester R., Barth E., Scharr H. (eds) Complex Motion. IWCM 2004.

https://doi.org/10.1007/978-3-540-69866-1_6

(So I misremembered the year: it was the end of 2004, not 2005.)

I did tweak it in a few details, such as using a 5x5px ICA instead of the constant brightness assumption, but mainly I replaced the Gauss-Seidel iteration (12) with brute forcing (10), so in effect I'm approximating the c* with Monte Carlo sampling on the GPU. Then, as the last step, I use LUTs to fill in gaps in the prediction with their maximum likelihood prior as memorized from a large collection of real-world flow maps.
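To illustrate the general idea of swapping an iterative solver for random sampling of candidates, here is a minimal CPU sketch. It is not the author's code and not the formulation from the paper: it uses a plain sum of squared differences on raw intensities as a stand-in cost (the comment above uses a 5x5 ICA instead), and the names `patch_cost` and `sample_flow` and all parameters are hypothetical.

```python
import random

def patch_cost(img0, img1, x, y, dx, dy, r=2):
    """Sum of squared differences between the (2r+1)x(2r+1) patch around
    (x, y) in img0 and the patch displaced by (dx, dy) in img1.
    The caller must keep all indices inside the image (negative indices
    would silently wrap around in Python)."""
    cost = 0.0
    for j in range(-r, r + 1):
        for i in range(-r, r + 1):
            a = img0[y + j][x + i]
            b = img1[y + j + dy][x + i + dx]
            cost += (a - b) ** 2
    return cost

def sample_flow(img0, img1, x, y, max_disp=3, n_samples=200, rng=None):
    """Monte Carlo search: draw random displacement candidates and keep
    the one with the lowest patch cost. No gradients, no iteration to
    convergence; accuracy depends only on how many samples are drawn."""
    if rng is None:
        rng = random.Random(0)
    best = (0, 0)
    best_cost = patch_cost(img0, img1, x, y, 0, 0)
    for _ in range(n_samples):
        dx = rng.randint(-max_disp, max_disp)
        dy = rng.randint(-max_disp, max_disp)
        cost = patch_cost(img0, img1, x, y, dx, dy)
        if cost < best_cost:
            best, best_cost = (dx, dy), cost
    return best, best_cost
```

Each pixel's search is independent of every other pixel's, which is why this kind of sampling maps well onto a GPU: more compute simply buys more candidates per pixel.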

BTW as luck would have it, we are currently leading Bomberland (team CloudGamepad) with a deep learning AI trained for more than 200 million simulation steps. Yet JFB (the 2nd ranked team) uses handcrafted C++ rules and they beat us every time. It's just that against other opponents our probabilistic AI is random enough to confuse them, which is why we're still barely in 1st place. But unless we can significantly improve things soon, I expect us to lose the tournament later this month because we will not be able to beat JFB in a fair duel. I bet on deep learning here and I'm already regretting it.

I'll reply about the camera to TaylorAlexander

Thanks for following up! It got up to 23 points of interested people, which in my experience is actually a huge number of upvotes for HN.

> I'm building custom electronics for a new type of 3D camera

I would love to know more. I am working on an open source farming robot and vision is an important component. Are you able to share more?

We're using the camera for an autonomous toy car racer, so I need reliable and real-time depth estimates. Existing cameras such as the Stereolabs ZED max out at 1080p @ 30 fps and they use a rolling shutter, which isn't even perfectly hardware-synchronized. Plus those sensors are tiny and, hence, as noisy as a laptop webcam.

The result is that the Stereolabs AI needs to be extremely lenient when doing the stereo matching because objects will almost never look exactly the same in both images, be it due to the noise or the rolling shutter skew. If I see a pattern repeat itself on both images with 5% RGB intensity, then on the Stereolabs ZED I need to ignore that, because it's most likely just sensor noise. If the image was almost noise-free, then I could treat this pattern as a reliable correspondence and triangulate depth from it.
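A rough sketch of that trade-off: for a rectified stereo pair, depth follows from disparity via the standard pinhole relation Z = f * B / d, but a match should only be triangulated if the pattern stands out from the sensor noise. The acceptance rule (contrast above `k` times the noise level), the function names, and all numbers below are assumptions for illustration, not anything taken from the Stereolabs SDK or the camera described here.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def is_reliable(pattern_contrast_pct: float, noise_sigma_pct: float, k: float = 3.0) -> bool:
    """Hypothetical acceptance rule: trust a pattern only if its contrast
    exceeds k times the sensor noise standard deviation."""
    return pattern_contrast_pct > k * noise_sigma_pct

# A pattern with 5% intensity contrast:
noisy = is_reliable(5.0, noise_sigma_pct=3.0)   # noisy sensor: rejected
clean = is_reliable(5.0, noise_sigma_pct=0.5)   # low-noise sensor: accepted

# Made-up camera: 1000 px focal length, 12 cm baseline, 40 px disparity.
z = depth_from_disparity(1000.0, 0.12, 40.0)
print(noisy, clean, round(z, 2))
```

With the same 5% pattern, only the low-noise sensor yields a correspondence that is worth triangulating.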

Also, tracking fast movements at 30 fps is really difficult, due to the large movement offsets. If you scan for them, you need lots of compute power and you risk recognizing repetitive patterns as fast movement.

If you increase the hardware from 1080p to 4K, from 30 FPS to 120 FPS, from "really noisy" to "practically noise-free", and from "rolling shutter" to "hardware-synchronized global shutter", then suddenly you have 4x the data to make a decision on, all your offsets are 4x smaller due to higher FPS, and you can treat much weaker patterns as reliable.

And all that together means that surfaces like a reflective wooden floor are now doable. Whereas before, most of the visible patterns would drown in sensor noise.
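The frame-rate part of this argument can be put into numbers. The sketch below assumes a fixed sensor resolution and an exhaustive square search window; the object speed and both function names are made up for illustration.

```python
import math

def per_frame_offset(speed_px_per_s: float, fps: float) -> float:
    """How far an object moves between two consecutive frames, in pixels."""
    return speed_px_per_s / fps

def search_window_candidates(offset_px: float) -> int:
    """Number of candidate positions in a square window that covers
    displacements of up to offset_px in every direction."""
    radius = math.ceil(offset_px)
    return (2 * radius + 1) ** 2

# Hypothetical object crossing 1920 pixels per second.
speed = 1920.0
for fps in (30, 120):
    offset = per_frame_offset(speed, fps)
    print(fps, "fps:", offset, "px/frame,", search_window_candidates(offset), "candidates")
```

Quadrupling the frame rate cuts the per-frame offset by 4x, and because the window is two-dimensional, the number of positions to scan drops by roughly 16x. A smaller window also contains fewer repetitions of a periodic texture, which is where the false "fast movement" matches come from.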

EDIT: And maybe one more comment: Our camera uses USB3 10 Gbit/s with a high-speed FPGA and it was completely designed in the excellent open-source KiCad. I even forked it to make things look nicer and more like Altium: https://forum.kicad.info/t/kicad-schematics-font-is-a-deal-b...

Very cool! Thanks for the details. I love KiCad. I presume your designs are not open source? But I am curious which cameras/sensors you are using, and any other chip specifics you can share.

Sadly, I can't share much there. NDAed Sony sensor with NDAed Infineon FPGA. Manufacturing was interesting because I didn't get export permissions for sending the FPGAs to China for PCB assembly. Eurocircuits got the job done but was kinda slow to work with.

Sure, I understand. Well, best of luck with the project!

I appreciate this train of thought and enjoy seeing it in action by others. Kudos!