by jason_wo 1205 days ago
I just found this presentation with technical details: https://group.mercedes-benz.com/dokumente/investoren/praesen...

I guess it is not a typical OS, but more like a collection of tools that build on existing OSs. This is comparable to ROS (Robot Operating System), which is just a set of programs, middleware, services and conventions for building robot software on Linux.

It seems like this should integrate and abstract different "OSs" (Linux, QNX, AUTOSAR) and run on very different platforms (high-power application processors for infotainment, microcontrollers) (Slide 13). These are widely different systems:

1. Linux needs a Memory Management Unit (MMU), which only comes with high(ish)-powered application processors, e.g. Arm Cortex-A9. These are obviously not hard-realtime because a page fault can occur non-deterministically (except when you can lock everything in RAM). This might be used for infotainment.

2. (Classic) AUTOSAR is used without an operating system on a microcontroller like the Arm Cortex-M or an automotive MCU like the Infineon TriCore, which can run two cores in lockstep to verify each computation. AUTOSAR is kind of the operating system, and you buy an "adaptation"/HAL of AUTOSAR for each MCU from a vendor. This is widely used in many ECUs for hard real-time control, e.g. to control something in the engine, and for other stuff like the electric windows. AUTOSAR is a huge pain in the ass to develop for. You usually configure "it", which takes a lot of time. Then a tool from another vendor, e.g. Vector or Elektrobit, generates a huge amount of code. The developer fills in the function stubs to implement the actual function. Alternatively, you can generate the code from MATLAB/Simulink models with a code generator from yet another vendor (model-based development). The upside is that the HAL and code generators are certified and everything is somehow standardized. The downside is that normal developers want to kill themselves, you learn no transferable skills, and the huge amount of generated boilerplate code is hard to read.

3. There is also a newer Adaptive AUTOSAR, which can run on Linux or QNX.

I guess (page 8) they want to use it for infotainment (point 1), interior control (lights, climate control; probably point 2), automated driving and "central driving" (point 1, point 2). I am not sure if this includes typical fast hard-real-time tasks like engine control or chassis control (=vehicle dynamics control).

I am not sure if they really want to abstract it all or just extend the "OSs" (Linux, QNX, AUTOSAR) with libraries and components, mostly in user space.

If you look at slide 13, you might guess that they will adapt Linux and QNX to run their UI, MBUX (in Qt). They extend it with services that communicate with ECUs in the car and with their services on the internet. Moreover, they allow installing sandboxed apps from Mercedes and Android Auto (e.g. Spotify) on top of it. They also come with an app store: https://faurecia-aptoide.com/

The real-time ECUs in the car running AUTOSAR will just get additional components to easily communicate with other MB.OS parts and support some newer features like OTA updates.

I have not seen any details on how this relates to ADAS functions. These typically run (at least partially) on a compute node made by an automotive supplier with a hardware accelerator from NVIDIA, e.g. ZF ProAI (https://www.zf.com/products/en/cars/products_64166.html) or from Valeo (https://www.valeo.com/en/domain-controller/).


Just a nitpick: Linux can run on CPUs without MMUs, and it has APIs for locking memory and real-time scheduling, but that's not why it is not a hard-realtime OS.
Yes, you are correct. I am currently working on a project to make Linux "as real-time as possible": locking memory with mlockall, isolating cores, the preempt-rt kernel patch, ... It is still not real-time because you have no guarantees, but you typically get a max jitter of 0.1 ms, which is good enough for my use case.
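For anyone curious what that setup looks like from user space, here is a minimal best-effort sketch. It assumes Linux; the function name `configure_realtime` and the returned dict are made up for illustration, and the `MCL_*` values are the ones used on common architectures such as x86 and Arm. Isolating cores happens via kernel boot parameters, so the sketch only pins the process to a CPU you pick. Every step can fail without the right privileges, which is why each one just reports success or failure.

```python
import ctypes
import os

# Values from <sys/mman.h> on common Linux architectures (x86, Arm).
# Assumption: a few other architectures use different constants.
MCL_CURRENT = 1
MCL_FUTURE = 2


def configure_realtime(priority=None, cpu=None, lock_memory=False):
    """Best-effort setup; returns {step_name: succeeded} for each requested step."""
    results = {}

    if lock_memory:
        # Lock all current and future pages of this process into RAM.
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            results["mlockall"] = libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
        except (OSError, AttributeError):
            results["mlockall"] = False

    if cpu is not None:
        # Pin the calling process to a single CPU (ideally an isolated one).
        try:
            os.sched_setaffinity(0, {cpu})
            results["affinity"] = True
        except (OSError, AttributeError, ValueError):
            results["affinity"] = False

    if priority is not None:
        # Switch to the fixed-priority SCHED_FIFO policy.
        try:
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
            results["scheduler"] = True
        except (OSError, AttributeError, ValueError):
            results["scheduler"] = False

    return results
```

Calling something like `configure_realtime(priority=80, cpu=3, lock_memory=True)` at startup (the numbers are arbitrary) tells you which steps the system actually allowed. A real project would do this in C or similar, since the Python interpreter adds jitter of its own.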

You could use Linux without an MMU (uClinux), e.g. on a Cortex-M, but it is a horrible experience and no standard program works.

Preempt RT does give guarantees in the sense that unbounded latency is a bug and theoretical maximum latency bounds are known (see [0]). It is neither certifiable nor formally proven, but it's good enough for almost anything that isn't safety-critical. For the things that do require functional safety, you can use AGL and other hypervisor architectures that partition the critical and non-critical tasks with a few more changes to your code.

[0] https://bristot.me/demystifying-the-real-time-linux-latency/

Preempt RT doesn't really give that. It might give that when you run a subset of Linux, but that is not Linux, just as nommu Linux is not Linux. They might say that's a bug, but there are countless algorithms and data structures in Linux that mean the state of the system and other workloads can slow down other parts of the system. Even setting aside the fact that a lower privileged process can take spin locks (not "preempt spinlocks" but real low level spin locks), disable interrupts, etc., it can influence shared data structures such that allocations, lookups, etc. can take longer for the higher privileged thread. So you still end up with a "look we tried really hard and if you don't use any kernel facilities including blocking and isolate this CPU entirely, lock everything and don't allocate memory or take page faults after that, you might get something approaching hard-realtime".

People try to paint it as soft-hard-RT or something, but it's not; there already exists a good word for it, which is soft-RT. Which is fine, it's highly useful.

There aren't really formally proven hard realtime operating systems of any non-trivial complexity, are there? They are either extremely simple executive layers, or some very limited privileged functionality that sits on top of the rest of the kernel.

I'd highly recommend you read the link and the paper it's based on. It's pretty thorough in addressing the limitations. Those same limitations apply to virtually every commercial RTOS out there as well though.

As for formally verified systems, it depends on your definition of "nontrivial". You can build complex systems from the building blocks provided by well-known examples like seL4 and PikeOS. On a practical level though, complete formal verification is incredibly uncommon for exactly the reasons you'd expect. There's usually a mix of formal methods and other verification methods employed in safety-critical applications. It's "good enough" given current capabilities.

I did read it. I understand and work on Linux, including real-time Linux. Nothing of what I said is wrong. Hard realtime operating systems of course are more limited than general purpose Linux too, but they tend to have a much better handle on limiting and controlling latency and on how non-critical workloads can impact critical tasks.

And seL4 is formally verified, but as far as I know it has not been formally verified for hard realtime. Funny thing about formal verification is that it's easy to do if you control the requirements :) (/s - nothing to take away from the incredible work of seL4). Last I heard, people had sketched or theorized about ways it could be approached, but it had not been done.

Very interesting. Thanks for the link to the paper. Isn't the provided paper "just" about the scheduler? Eventually, I would have to output some data, e.g. on the CAN bus with SocketCAN through the network stack. This is probably a huge amount of code for which worst-case execution times are probably hard to get.
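To make the output path concrete, here is a rough sketch of sending one classic CAN frame through SocketCAN from user space. It assumes Linux; the helper names and the interface name `vcan0` are hypothetical, while the `"=IB3x8s"` layout mirrors the kernel's classic `struct can_frame`. The single `send` call is where the frame enters the kernel network stack, which is the part whose worst-case timing I cannot easily bound.

```python
import socket
import struct

# Layout of the kernel's classic `struct can_frame`: 32-bit CAN ID,
# 8-bit payload length, 3 padding bytes, 8 data bytes (16 bytes total).
CAN_FRAME_FMT = "=IB3x8s"
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)


def build_can_frame(can_id, data):
    """Pack a CAN ID and up to 8 payload bytes into a raw frame."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def parse_can_frame(frame):
    """Unpack a raw frame into (can_id, payload)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, data[:length]


def send_can_frame(interface, can_id, data):
    """Send one frame; `interface` (e.g. "vcan0") is a hypothetical name."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((interface,))
        # From here on the frame travels through the kernel network stack.
        sock.send(build_can_frame(can_id, data))
```

With a virtual CAN interface set up, `send_can_frame("vcan0", 0x123, b"\x01\x02")` would put one frame on it. The packing and parsing helpers work without any CAN hardware.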

Does AGL mean Automotive Grade Linux? What would be other hypervisors?

The time it takes to put things on a physical bus will depend on your hardware and can be bounded, but this isn't the guarantee you're getting from any RTOS.

The main thing the "RT" in RTOS guarantees is that the OS will return control back to you in a defined amount of time as soon as you're ready to run. You're still responsible for ensuring all of the other system requirements for bounded latency are fulfilled, like hardware that doesn't introduce unbounded latencies the OS can't control (surprisingly difficult with modern HW). Assuming you've done all of that, preempt-rt will give you essentially the same guarantees because of the scheduler work linked.
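If you want to see what "returns control in a defined amount of time" looks like in numbers, a periodic loop that records how late each wake-up is gives a rough picture, similar in spirit to what cyclictest does. This is only a sketch: the function name is made up, it uses plain `time.sleep` against absolute deadlines, and the Python interpreter adds latency of its own, so treat the numbers as an upper-level indication and not as a measurement of the kernel.

```python
import time


def measure_wakeup_lateness_ns(period_ns=1_000_000, iterations=200):
    """Sleep until successive absolute deadlines; return lateness per wake-up in ns."""
    lateness = []
    deadline = time.monotonic_ns() + period_ns
    for _ in range(iterations):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # Difference between the actual wake-up time and the intended deadline.
        lateness.append(time.monotonic_ns() - deadline)
        # Advance by a fixed period so one late wake-up does not shift the others.
        deadline += period_ns
    return lateness
```

Taking `max(...) - min(...)` over the returned list gives the observed jitter for that run. An observed maximum says nothing about the worst case, which is exactly the difference between measuring and having a guarantee.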

Yes, AGL = Automotive Grade Linux.

It's notable that Elektrobit seems to work on getting Rust on QNX:

https://github.com/rust-lang/rust/pull/106673