Hacker News
by HALtheWise 1831 days ago
I can't think of a polite way to say this, but as someone who develops drone software professionally, I'd call both of the software failures Ingenuity has experienced embarrassingly amateur at a technical level.

The first failure, which delayed the initial spin test, was described as a "watchdog timeout", which for anyone not familiar with embedded development basically means the code crashed. We all write code that crashes, but I'm having trouble thinking of an excuse for the fact that their code crashed before takeoff, on Mars, and they didn't see it coming. Sitting on the ground on Mars is exactly the kind of scenario that should have been tested repeatedly on Earth, and testing in production is _really_ not the right way to do aerospace development (although Boeing Starliner would beg to differ).

Similarly, there are a huge number of things that can and will cause dropped frames when running Linux on a Qualcomm mobile chip. A software stack that infers frame timing purely from the sequence number is brittle, and it would definitely not have passed code review and testing where I work (I actually checked: we do have a robust solution). If I had to guess, the root cause of the dropped frame wasn't anything exciting like a cosmic ray, but some run-of-the-mill event that a couple of hours of flight testing on Earth would have caught. Either way, it shouldn't have made it to Mars.
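
To illustrate the failure mode described above, here is a minimal Python sketch, assuming a hypothetical 30 Hz camera whose frames carry both a sequence number and a capture timestamp (the `Frame` type, the rate, and both functions are made up for illustration, not taken from Ingenuity's software). `naive_frame_times` infers time from each frame's position in the stream; `robust_frame_times` uses the capture timestamps and reports gaps in the sequence numbers.

```python
from dataclasses import dataclass

FRAME_PERIOD_S = 1.0 / 30.0  # hypothetical 30 Hz camera


@dataclass
class Frame:
    seq: int          # sequence number assigned by the camera (hypothetical)
    capture_t: float  # capture timestamp in seconds from the camera clock (hypothetical)


def naive_frame_times(frames):
    """Brittle: assumes no frame is ever lost, so time = arrival index * period."""
    return [i * FRAME_PERIOD_S for i, _ in enumerate(frames)]


def robust_frame_times(frames):
    """Use the capture timestamp carried with each frame and report sequence gaps."""
    times = [f.capture_t for f in frames]
    dropped = []
    for prev, cur in zip(frames, frames[1:]):
        if cur.seq != prev.seq + 1:
            dropped.extend(range(prev.seq + 1, cur.seq))
    return times, dropped


# Simulate six frames from the camera where frame 2 never arrives.
frames = [Frame(seq=s, capture_t=s * FRAME_PERIOD_S) for s in range(6) if s != 2]

naive = naive_frame_times(frames)
robust, dropped = robust_frame_times(frames)
for f, n, r in zip(frames, naive, robust):
    print(f"seq={f.seq} naive={n:.4f}s actual={r:.4f}s error={(n - r) * 1000:+.1f}ms")
print("dropped:", dropped)
```

With frame 2 missing, every naive timestamp after the gap is one frame period (about 33 ms here) early, and nothing downstream can tell. Carrying the capture time with each frame and checking for sequence gaps turns that into a detectable, reportable event.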

I'm sure there are a lot of great engineers on the Ingenuity project who _don't_ write these sorts of bugs, and I'm glad these amateur fuckups (barely) didn't crash the drone before it could do some incredible technology demonstration work.

5 comments

From my perspective, I see a project that took years of preparation. Multiple papers were written. It was tested in software simulation and in NASA's physical space simulator [1]. And it finally flew _successfully_ on Mars, with some minor bugs.

In my opinion, assuming that they were "testing in production" (production being Mars!), that they wrote code that "would definitely not have passed code review", or that they didn't do a couple of hours of flight testing on Earth is an unnecessarily unkind assessment of this project.

[1] Helicopter Models and Test Facilities: https://rotorcraft.arc.nasa.gov/Publications/files/Balaram_A...

I had a friend who quit their JPL software job (to go work for a large, bureaucratic software firm) because JPL required such an extensive testing and review regimen before every change that the work got boring for being too slow.

I am very skeptical of the claim that this code was not tested before launch.

Thanks for your interesting post. I wouldn't be surprised if you are correct, at least based on my experiences with pseudo-governmental software development. I've noticed that, proportionally, there seem to be fewer folks in those environments who've developed the grizzled paranoia that comes from repeatedly shipping commercial software under unreasonable constraints.

As an amateur RC pilot familiar with some of the excellent RC flight control systems out there, I think it would have been a huge missed opportunity if JPL didn't invite some experienced engineers from the commercial and consumer drone community to provide input (QA folks too!). It's hard to imagine they wouldn't have found ample volunteers willing to spend a few days helping out.

I understand JPL is already designing a larger and more capable iteration. It would be cool if experienced drone flight control devs such as yourself dropped the team a note.

It might be related to the hostility of the environment. The chips aren't radiation-hardened, so some bit flips due to radiation are expected.
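
To make "bit flips" concrete: a single-event upset toggles one bit of stored state, and how much that matters depends on which bit it hits. This Python sketch (standard library only; the value and the bit positions are arbitrary choices for illustration) flips individual bits of a 32-bit IEEE 754 float.

```python
import struct


def flip_bit_f32(value: float, bit: int) -> float:
    """Return `value` stored as an IEEE 754 single with bit `bit` (0 = LSB) toggled."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


altitude_m = 5.0  # arbitrary example value
# Bit 0 = mantissa LSB, 22 = mantissa MSB, 30 = exponent MSB, 31 = sign.
for bit in (0, 22, 30, 31):
    print(f"bit {bit:2d}: {altitude_m} -> {flip_bit_f32(altitude_m, bit)!r}")
```

Flipping the lowest mantissa bit barely moves the value, flipping the top mantissa bit turns 5.0 into 7.0, and flipping the sign or top exponent bit gives -5.0 or a value around 1.5e-38.
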
Why on earth wouldn't they use rad-proofed chips on a planet closer to the sun, with virtually no atmosphere compared to Earth?
Because using off-the-shelf components on Mars is a vital part of the mission...

Of course they use rad-proofed chips in the rover. But using a newer Qualcomm processor instead of an ancient chip was a big part of the idea.

Um, it's not closer to the sun...
It's actually farther from the sun, by roughly 78 million km on average and up to about 100 million km when Mars is at its farthest.
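
For reference, using commonly quoted orbital figures (mean distance about 149.6 million km for Earth; about 227.9 million km for Mars, which ranges from roughly 206.6 to 249.2 million km over its orbit), the gap relative to Earth's mean distance works out as:

```latex
\Delta r_{\text{mean}} \approx 227.9 - 149.6 = 78.3 \times 10^{6}\ \text{km}
\qquad
\Delta r \in [\,206.6 - 149.6,\ 249.2 - 149.6\,] = [\,57.0,\ 99.6\,] \times 10^{6}\ \text{km}
```
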
Watchdogs don’t just mean crashes. They are useful specifically because they can be used to terminate non-crash conditions such as infinite loops where forward progress is not being made but the program is still running.
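
A minimal software-watchdog sketch in Python illustrates the point, assuming a hypothetical main loop that calls `kick()` only after completing a unit of real work (the `Watchdog` class, the 0.2 s timeout, and the simulated loop are all illustrative). The monitor thread fires when no kick arrives within the timeout, so a loop that is still running but stuck trips it just as a crash would. A hardware watchdog on a flight computer follows the same principle but typically resets the processor rather than calling a callback.

```python
import threading
import time


class Watchdog:
    """Calls `on_timeout` if `kick()` is not called at least every `timeout_s` seconds."""

    def __init__(self, timeout_s: float, on_timeout):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.fired = False
        self._last_kick = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._monitor, daemon=True)

    def start(self):
        self._thread.start()

    def kick(self):
        # Call only after real forward progress, never from a spot a stuck loop still reaches.
        with self._lock:
            self._last_kick = time.monotonic()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _monitor(self):
        # Poll a few times per timeout window until stopped or until the deadline is missed.
        while not self._stop.wait(self.timeout_s / 4):
            with self._lock:
                overdue = time.monotonic() - self._last_kick > self.timeout_s
            if overdue:
                self.fired = True
                self.on_timeout()
                return


wd = Watchdog(timeout_s=0.2, on_timeout=lambda: print("watchdog timeout: no forward progress"))
wd.start()

# Make progress for a few iterations, kicking after each unit of work.
for step in range(5):
    time.sleep(0.05)  # stand-in for one unit of real work
    wd.kick()

# Simulated hang: the loop keeps running but never kicks again.
deadline = time.monotonic() + 1.0
while time.monotonic() < deadline and not wd.fired:
    time.sleep(0.01)

wd.stop()
```

Where `kick()` is placed is the key design choice: if it sits in a spot the stuck loop still reaches, the watchdog will never notice the lack of progress.
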
It's a shame that Skydio wasn't more involved in creating Ingenuity.