Hacker News
by sudhirj 1867 days ago
Seems like a really clean and testable system, too. You can make a harness with a set of inputs, run the cycle, and test the outputs consistently. Performance is also easy to check: each cycle needs to run within 100 ms for the 10 Hz systems, I guess, including garbage collection.

Nice to see things kept this simple.
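A minimal sketch of that kind of harness, in Python for brevity: feed a fixed list of inputs through one cycle each, collect the outputs, and flag any cycle that exceeds the 100 ms budget implied by a 10 Hz loop. The `control_step` law and its gain are hypothetical placeholders, and wall-clock timing on a desktop is only a smoke test, not a worst-case execution-time analysis.

```python
import time
from typing import Callable, List, Sequence, Tuple

CYCLE_HZ = 10
CYCLE_BUDGET_S = 1.0 / CYCLE_HZ  # 100 ms per cycle at 10 Hz


def control_step(sensor_reading: float) -> float:
    """Hypothetical stateless control law: proportional correction toward 0."""
    setpoint = 0.0
    gain = 0.5
    return gain * (setpoint - sensor_reading)


def run_harness(
    inputs: Sequence[float],
    step: Callable[[float], float] = control_step,
    budget_s: float = CYCLE_BUDGET_S,
) -> Tuple[List[float], float]:
    """Run one cycle per input; return the outputs and the worst cycle time seen."""
    outputs: List[float] = []
    worst = 0.0
    for reading in inputs:
        start = time.perf_counter()
        outputs.append(step(reading))
        elapsed = time.perf_counter() - start  # includes any GC pause in this cycle
        worst = max(worst, elapsed)
    if worst > budget_s:
        raise RuntimeError(f"cycle overran budget: {worst:.6f} s > {budget_s:.3f} s")
    return outputs, worst
```

Because `control_step` is stateless, running the same inputs twice must give identical outputs, which is what makes this kind of test repeatable.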


> including garbage collection.

Not only does flight control software typically avoid garbage collection, it is also preferable to never allocate memory at runtime: all memory ever needed is allocated at program startup, and memory is not acquired or released until landing or some other major end event. Memory issues are often among the most significant causes of crashes, and you don't want those to translate into, well... a different kind of crash.
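The "allocate everything at startup" pattern can be sketched as a fixed-capacity pool: every slot is created once before the control loop starts, and running out is treated as an error rather than a reason to ask for more memory. This is an illustrative Python sketch with hypothetical names (`FixedPool`, `acquire`, `release`); Python itself still allocates behind the scenes, so in C this would more likely be a statically sized array plus a free list.

```python
from typing import List


class FixedPool:
    """Fixed-capacity pool: all slots are created at startup and never grown."""

    def __init__(self, capacity: int, slot_size: int) -> None:
        # Every byte the loop will ever use is reserved here, before the loop runs.
        self._slots: List[bytearray] = [bytearray(slot_size) for _ in range(capacity)]
        self._in_use: List[bool] = [False] * capacity
        self._free: List[int] = list(range(capacity))  # stack of free slot indices

    def acquire(self) -> int:
        """Return the index of a free slot; exhaustion is an error, not a reason to grow."""
        if not self._free:
            raise MemoryError("pool exhausted: capacity was fixed at startup")
        index = self._free.pop()
        self._in_use[index] = True
        return index

    def release(self, index: int) -> None:
        """Return a slot to the pool, rejecting double releases."""
        if not self._in_use[index]:
            raise ValueError(f"slot {index} is not currently acquired")
        self._in_use[index] = False
        self._free.append(index)

    def slot(self, index: int) -> bytearray:
        return self._slots[index]

    @property
    def available(self) -> int:
        return len(self._free)
```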

The creator of C++, Bjarne, wrote a nice paper on flight control software if you are interested in more.

They've been known to leak memory ...

https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...

>This sparked an interesting memory for me. I was once working with a customer who was producing on-board software for a missile. In my analysis of the code, I pointed out that they had a number of problems with storage leaks. Imagine my surprise when the customer's chief software engineer said "Of course it leaks". He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.
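The sizing rule in the story is simple arithmetic: leak rate × maximum flight time × a safety factor of 2. The story gives no actual figures, so the numbers below are made up purely for illustration.

```python
def leak_headroom_bytes(
    leak_bytes_per_second: float,
    max_flight_seconds: float,
    safety_factor: float = 2.0,
) -> float:
    """Extra memory to fit so a leaking program cannot run out before flight ends.

    Mirrors the rule in the story: total leak over the longest possible flight,
    then doubled (safety_factor=2.0).
    """
    if leak_bytes_per_second < 0 or max_flight_seconds < 0:
        raise ValueError("leak rate and flight time must be non-negative")
    return leak_bytes_per_second * max_flight_seconds * safety_factor


# Hypothetical figures: 1 KiB/s leak, 600 s maximum flight.
headroom = leak_headroom_bytes(1024, 600)
```

With those hypothetical figures the program leaks 614,400 bytes over the flight, so the doubled headroom is 1,228,800 bytes (about 1.2 MB).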

I think we've solved the GC vs. manual debate. Just add more RAM and explode your computer when it's done running.
Very funny. But leaking memory until you have to restart the process seems to be a very common strategy in practice. The programs explode even if the computer doesn't.
Many moons ago I remember talking with someone who works in HFT software and they said they'd just load the system with tons of RAM because allocation/deallocation and even manual memory management was too slow.
Really, this is just region-based memory management or actor-based memory management.
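For reference, region-based (arena) allocation looks roughly like this: allocations bump a pointer through one preallocated buffer, and everything is freed at once by resetting the pointer. A minimal Python sketch with a hypothetical `Arena` class and no alignment handling; in the missile story the region is simply never reset.

```python
class Arena:
    """Region-based allocation: bump a pointer through one buffer, free all at once."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)  # the whole region, reserved up front
        self._top = 0                # offset of the next free byte

    def alloc(self, n: int) -> memoryview:
        """Carve n bytes off the region (no alignment handling in this sketch)."""
        if n < 0:
            raise ValueError("allocation size must be non-negative")
        if self._top + n > len(self._buf):
            raise MemoryError("region full")
        view = memoryview(self._buf)[self._top:self._top + n]
        self._top += n
        return view

    def reset(self) -> None:
        """Free every allocation at once; old views now alias reused bytes."""
        self._top = 0

    @property
    def used(self) -> int:
        return self._top
```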
It's actually rocket-based memory management. Ba dum tss!
Reminds me of an old discussion in comp.arch.embedded (dating myself here!) about what you say when people at a party ask you what you do.

Hands down best answer was the engineer from a defense company: "I build flying robots that arrive at their destination in a really bad mood."

That is a great story, but I cringed at the part about adding additional hardware to support their leaky code. Surely there had to be a better way?
Don’t overengineer! Malloc has O(n) time in reallocs, so leaking memory can be a viable strategy.

Edit: yes I know it’s more complicated than that!

Allocating and deallocating memory has a cost; if you can afford to just add more memory, that's faster than any clever memory-management scheme you could come up with.
Do you have a link to Bjarne's write-up? My Ask Jeeves skills are failing me.
I don't think it was actually written by Bjarne, I think it was K. Carroll and others from Lockheed Martin, but I expect the document is this one:

https://www.stroustrup.com/JSF-AV-rules.pdf

which is linked from the Applications page of Bjarne's website.

Ah, thanks for the link; yes this is the one I was referring to. I had thought it was by Bjarne since it was on his site, but either way, it's an interesting read.
This one might also be interesting https://web.cecs.pdx.edu/~kimchris/cs201/handouts/The%20Powe...

It's basically just a quick list of 10 useful rules to follow for safety-critical code.

(I've seen a different format in the past, that wasn't quite as fancy, but this is the only version I can find at the moment)

Interesting. I wonder how compiler-dependent rule 10 is. Like, what if I'm writing for a compiler that gives really bad, usually unhelpful warnings that make my code worse... but I suppose these are more like very strict guidelines than rules.
That section explicitly addresses that question and says you should rewrite it in a way that avoids the warning, since "usually unhelpful" means "sometimes critical". It's certainly an uncompromising view but that's what you get when failure is disastrous.
Pretty sure they use a language without a garbage collector for the control software, probably C or Ada. Look at the FAA specifications for more guidelines. For airplanes it's DO-178C, I think; not sure about rockets.

EDIT: One of the coolest projects I've seen recently was Copilot, a language embedded in Haskell that compiles down to C control loops which are statically guaranteed to run in constant time and memory. There is also arduino-copilot if you want to play with ultra-hard real-time software but can't afford an entire rocket.

According to the Reddit AMA last year SpaceX use: "C & C++ for flight software, HTML, JavaScript & CSS for displays and python for testing"
Yes, this stuff is nice.

But it is worth noting that these control loops often have some sort of memory, so you normally need to test over multiple cycles: you usually load in a test vector and watch for the response vector.

A trivial example would be a PID controller implemented in your main loop: the loop would be storing the integral term and the "previous" error term.
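A minimal sketch of such a stateful loop and a multi-cycle test, assuming a textbook parallel-form PID with no anti-windup, output limits, or derivative filtering (the gains and `dt` used below are hypothetical). The controller carries `integral` and `prev_error` between cycles, so the same input can produce a different output on every cycle.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PID:
    """Textbook parallel-form PID; no anti-windup, output limits, or filtering."""

    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0    # state carried from cycle to cycle
    prev_error: float = 0.0  # the "previous" error term

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def run_vector(controller: PID, setpoint: float, measurements: Sequence[float]) -> List[float]:
    """Load a test vector (one measurement per cycle) and collect the response vector."""
    return [controller.step(setpoint, m) for m in measurements]
```

For example, with `PID(kp=0.0, ki=1.0, kd=0.0, dt=1.0)`, a setpoint of 1.0, and a test vector of three zero measurements, `run_vector` returns `[1.0, 2.0, 3.0]`: identical inputs, different outputs, because the integral term accumulates.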

Equivalent to handling an additional sensor input and an additional output, no?
Yes, you could in principle convert any memory within a given loop into a separate unit. You just gotta weigh the tradeoffs between having more I/O and more software/hardware units, versus having more memory within specific software/hardware units.

You just gotta find a balance that works for your requirements and constraints. In my application (it was medical devices), we found that stateful loops fit nicely.
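Moving that memory out of the loop can be sketched as a pure step function: the integral and previous-error terms from the PID example upthread become an explicit input and an explicit output. A hypothetical Python sketch (`PIDState` and `pid_step` are illustrative names); it trades a stateful unit for a stateless one with more I/O, which is the tradeoff described above.

```python
from typing import NamedTuple, Tuple


class PIDState(NamedTuple):
    integral: float
    prev_error: float


def pid_step(
    state: PIDState, error: float, kp: float, ki: float, kd: float, dt: float
) -> Tuple[float, PIDState]:
    """Stateless PID step: the loop's memory is an explicit input and output."""
    integral = state.integral + error * dt
    derivative = (error - state.prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    return output, PIDState(integral=integral, prev_error=error)
```

Since `pid_step` has no hidden state, any single cycle can be tested in isolation by constructing the `PIDState` directly.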

True, but you do want to test the emergent control system.
That's the irony of closed loop control systems. They appear to be simple but the emergent behavior, particularly when there is a hierarchy of control loops, is incredibly complex.
I was once thinking about this type of system, and later found that SNMP (1988) as well as the NIST/Army 4D/RCS project (1980s) had this train of thought before I was even born. Now I'm wondering why this type of distributed, synchronized, hard real-time decision-making framework doesn't seem to exist as an Apache Foundation project or something. It just sounds right and massively useful, but I can't find an implementation, at least on the public Internet. Why?
There are a few layers to your question.

If by distributed you mean "over multiple computers", then ya... this is a really specialized field. Your ability to synchronize and maintain hard real-timeness between computers is completely dependent on I/O, hardware features, and topology. This makes it much harder to build a generalized framework.

If we ignore the distributed portion, then you're describing a real-time operating system, in which case something like https://www.freertos.org/ might fit the bill?

I mean real-time inter-node message passing in a tree-topology network, with a message frequency associated with each edge, as shown in [1]. Each node might run on its own CPU clock, but message exchanges occur in real time, progressively faster at each depth, and the system as a whole (what "Autonomous System" is probably supposed to mean) runs as a slow but coherent real-time cluster.

Perhaps the idea that message passing has to happen in real time is obvious or elementary in military/defense/aerospace, from technical requirements or from experience fighting the fog of war, etc., but to me it was dumbfoundingly new, and I guess it might be for the general public too?

1: https://en.wikipedia.org/wiki/File:4D-RCS_reference_model_ar...
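A single-process sketch of the rate structure described here, assuming one branch of the tree (a chain of levels) driven by a shared base clock: each level runs at its own rate, slower at the top, and otherwise holds its last command for the level below. The `Level` and `run_hierarchy` names and the rates are hypothetical; this simulates the timing pattern only and does nothing about real distributed, hard real-time messaging.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Level:
    """One node on a single branch of the hierarchy."""

    name: str
    rate_hz: int
    step: Callable[[float], float]  # command from parent -> command for child
    runs: int = 0
    output: float = 0.0  # last command, held between this level's ticks


def run_hierarchy(levels: List[Level], base_hz: int, ticks: int, top_command: float = 0.0) -> None:
    """Drive levels (ordered slowest/top to fastest/bottom) from one base clock."""
    for level in levels:
        if base_hz % level.rate_hz != 0:
            raise ValueError(f"{level.name}: {level.rate_hz} Hz does not divide {base_hz} Hz")
    for tick in range(ticks):
        command = top_command
        for level in levels:
            if tick % (base_hz // level.rate_hz) == 0:  # this level's turn
                level.output = level.step(command)
                level.runs += 1
            command = level.output  # children see the parent's latest output
```

With a 100 Hz base clock over 100 ticks (one simulated second), levels at 1, 10, and 100 Hz run 1, 10, and 100 times respectively.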

It's not really a framework that you implement, more a process for making an implementation?
See BEAM/OTP, part of Erlang and now used by Elixir, Gleam, etc.

Erlang came out of Ericsson when they were building giant telephone switches.

Nowadays things like RabbitMQ and WhatsApp run on the BEAM VM.

The BEAM handles both local and remote nodes, mostly transparently.

Then there are things like Nomad or k8s, which try to tackle the problem more like *nix (Linux/BSD/etc.) does, but across multiple nodes. These are obviously not meant for hard real-time systems like launching rockets, not even for soft real-time.

That said, you can achieve pretty good latency with Erlang because of the lack of a global heap (most if not all allocation is local to an Erlang process).
That’s mostly how Kubernetes works too. It’s not that uncommon a pattern.
This is not how Kubernetes works. The concepts are completely different (pod, allocation etc.)
All the main controllers work by running a reconciliation loop constantly; what are you talking about?
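The reconciliation-loop pattern being referred to can be sketched as: observe the actual state, diff it against the desired state, emit corrective actions, and repeat until there is nothing left to do. This is a toy Python model with hypothetical names (`reconcile`, `apply_actions`, `control_loop`) and replica counts as the only state; it is not the Kubernetes API.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """One pass: diff desired vs. observed replica counts and emit corrective actions."""
    actions: List[Action] = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, actual[name]))  # not desired at all
    return actions


def apply_actions(actual: Dict[str, int], actions: List[Action]) -> Dict[str, int]:
    """Pretend to act on the world and return the newly observed state."""
    new_state = dict(actual)
    for op, name, count in actions:
        if op == "scale_up":
            new_state[name] = new_state.get(name, 0) + count
        elif op == "scale_down":
            new_state[name] -= count
        elif op == "delete":
            del new_state[name]
    return new_state


def control_loop(desired: Dict[str, int], actual: Dict[str, int], max_iterations: int = 10) -> Dict[str, int]:
    """Repeat observe -> diff -> act until nothing is left to do."""
    for _ in range(max_iterations):
        actions = reconcile(desired, actual)
        if not actions:
            return actual
        actual = apply_actions(actual, actions)
    raise RuntimeError("did not converge")
```

For example, `reconcile({"web": 3}, {"web": 1, "old": 2})` yields `[("scale_up", "web", 2), ("delete", "old", 2)]`, and `control_loop` converges to `{"web": 3}` after one round of actions.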
Really nice for integration testing. Not to be confused with unit testing.

You could literally replay entire missions.
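Replaying a mission amounts to logging every cycle's input and output, then feeding the recorded inputs back through a fresh instance of the loop and flagging any cycle whose output changed. A minimal Python sketch; `make_accumulator` is a hypothetical stateful loop, and JSON stands in for whatever flight-data log format a real system would use.

```python
import json
from typing import Callable, List, Sequence

Step = Callable[[float], float]


def make_accumulator() -> Step:
    """Hypothetical stateful loop: output is the running sum of all inputs so far."""
    total = 0.0

    def step(x: float) -> float:
        nonlocal total
        total += x
        return total

    return step


def record(make_step: Callable[[], Step], inputs: Sequence[float]) -> str:
    """Fly the 'mission' once and log every cycle's input and output as JSON."""
    step = make_step()  # fresh state for this run
    log = [{"cycle": i, "input": x, "output": step(x)} for i, x in enumerate(inputs)]
    return json.dumps(log)


def replay(make_step: Callable[[], Step], log_json: str) -> List[int]:
    """Feed the recorded inputs to a fresh loop; return cycles whose output changed."""
    step = make_step()
    mismatches = []
    for entry in json.loads(log_json):
        if step(entry["input"]) != entry["output"]:
            mismatches.append(entry["cycle"])
    return mismatches
```

A changed controller shows up immediately: replaying a log of inputs `[1.0, 2.5, -0.5]` with a step that doubles its input instead of accumulating reports cycles `[0, 1, 2]` as mismatches.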