Hacker News
by ahepp 1179 days ago
A few ms? In my experience that seems well within the capabilities of Linux. Admittedly, the last time I measured wasn't on a Raspberry Pi. I'm kinda tempted to take a shot at profiling this and writing up a blog post since it seems like a useful topic, although it will probably be a few months until I can get around to it.
3 comments

I think milliseconds overestimates it by a few orders of magnitude, but non-real-time OSes really suffer with the intermediate I/O you expect in an embedded project (e.g. SPI, I2C).

A long time ago, I was playing with Project Nerves on an Orange Pi running some flavor of Debian. I was doing some I2C transaction (at 400 kHz, each bit is single-digit microseconds), and I ultimately had to add a re-attempt loop because the transaction would fail so often. I found a failure cutoff of 5 attempts was sufficient to keep going. I don't recall the failure rate, but basically, whenever a transaction failed, I'd have to reattempt 2-3 times before it eventually succeeded.
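A re-attempt loop of the kind described could look roughly like this. It's a minimal sketch in Python, not the original Nerves code: `transfer` is a hypothetical callable standing in for whatever performs the I2C transaction, and I'm assuming a failed transaction surfaces as `OSError`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 5  # the cutoff mentioned above


def with_retries(transfer: Callable[[], T], max_attempts: int = MAX_ATTEMPTS) -> T:
    """Call transfer() until it succeeds or max_attempts is reached.

    Assumption: a failed bus transaction raises OSError.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return transfer()
        except OSError as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(
        f"I2C transfer failed after {max_attempts} attempts"
    ) from last_error
```

The loop only hides the symptom. Each retry adds its own unpredictable delay, which is exactly what you don't want when the timing matters.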

Meanwhile, on a bog-standard Arduino with an ATmega328P, I send the I2C traffic once, and unless the circuit is physically damaged, the transaction will succeed.

No, the consistency of the timing is terrible on Linux.

Seriously, stick a scope or logic analyser on e.g. an I2C line and look at the timing consistency. Even on specialised kernels for realtime use, you can have variable timing delays between each transaction on the bus. And this is all in-kernel stuff that's inconsistent--it looks like it's getting pre-empted during a single I2C_RDWR transaction between receipt of one response and sending of the next message. The actual transmission timing under control of the hardware peripheral is really tight, but the inter-transmission delays are all over the place. Compare it with an MCU where the timing is consistent and accurate, and it's night and day.

The parent comment says

> control a mechanism and reliably react on a deadline of a few ms

I actually did measure this with an oscilloscope on embedded Linux (not a Raspberry Pi). A PPS signal was fed into Linux, and in response to the interrupt Linux sent a tune command to a radio. Tuning the radio itself had some unknown latency.

End-to-end, including the unknown latency of tuning the radio, I never observed a latency that would even round to 1 ms. That's unpatched and untuned Linux, no PREEMPT_RT. I didn't dig any further because it met our definition of "reliable" and was well, well within our timing budget.

I'll be the first to admit it wasn't some kind of rigorous test, just a casual characterization. I would not suggest anyone use Linux for a pacemaker, airplane flight controller, etc.

This is making me itch to buy an oscilloscope and run some more thorough tests. I'd like to see how PREEMPT_RT, loading, etc changes things.

My profiling was on an NXP i.MX8 MPU, which is an A-profile quad-core SoC very similar to an RPi. I think it was with a PREEMPT_RT kernel, though I can't guarantee that. Either way, I was fairly shocked at the lack of consistency in I2C timing when doing fairly trivial tasks (e.g. a readout of an EEPROM in a single I2C_RDWR request). You wouldn't see this when doing the equivalent on an M-profile MCU with a bare metal application or an RTOS.

What is acceptable does of course depend upon the requirements of your application, and for many applications Linux is perfectly acceptable. However, for stricter requirements Linux can be a completely inappropriate choice, as can A-profile cores. They are not designed or intended for this type of use.

Profiling this stuff is a really interesting challenge, particularly statistical analysis of all of the collected data to compare different systems or scenarios. I've seen some really interesting behaviours on Linux when it comes to the worst-case timings, and they can occasionally be shockingly bad.
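To make the worst-case point concrete, here is a minimal Python sketch of the kind of summary I mean. The function name `summarize_latencies` is hypothetical and the sample values in the usage note are made up for illustration. They are not measurements from any of the systems discussed here.

```python
import statistics


def summarize_latencies(samples_us):
    """Return min, median, 99th percentile, and max of latency samples (in µs)."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_us)
    # With n=100, quantiles() returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(ordered, n=100, method="inclusive")[98]
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p99": p99,
        "max": ordered[-1],
    }
```

Feeding it something like `[10] * 99 + [5000]` (99 quick gaps and one long stall) gives a median that looks perfectly healthy while the max is hundreds of times larger. That is why the median or mean alone tells you very little about whether a deadline will be met.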

I was referring to the fact that, yes, even if Linux performs well in the ideal case, it's not necessarily reliable, and the possible problems are hard to compensate for.

E.g., your process can randomly stall because something in the background is checking for updates and I/O is much slower than usual, or because the system ran out of RAM and everything got bogged down by swap.

On a microcontroller you just don't have anything else running, so those risks don't exist. E.g., a 3D printer controls a MOSFET to enable/disable the heaters. The system can overheat and actually catch on fire if something makes the software get bogged down badly enough. On a Linux system there's a whole bunch of stuff that can go wrong, most of which is completely outside the software you actually wanted to run.

I guess I feel like things are a bit tangled up here.

Sure, a single purpose MCU controlling a heater MOSFET has a lot fewer failure modes than a Linux device doing the same.

I don't dispute there are a lot fewer ways it's even possible for that system to misbehave.

The original comment was recommending ESP32s over Raspberry Pis for DIY projects like opening your curtains or flashing LEDs. The ESP IDF runs on FreeRTOS, so we're already moving away from the bulletproof single-task MCU. People will almost certainly be adding some custom-rolled HTTP webserver on top. They might be leaking memory all over the place, and there are probably all kinds of interrupts they have no idea about firing off in the background. I wouldn't trust an ESP32 curtain-bot not to strangle me any more than I'd trust a Raspberry Pi based one.

Your example about running out of RAM seems just as relevant to MCUs. You can leak memory and crash an MCU. You can overload an MCU with tasks and degrade performance. You can use cgroups or ulimit to help prevent a bad process from bringing Linux down.
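As a rough illustration of the ulimit idea, here is a minimal Python sketch that caps a child process's address space with `setrlimit`, the same mechanism behind `ulimit -v`. The helper name `run_with_memory_cap` is hypothetical, and this assumes Linux, where `RLIMIT_AS` is enforced.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, max_bytes):
    """Run argv as a child process whose address space is capped at max_bytes."""

    def apply_limit():
        # Runs in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit, capture_output=True)


if __name__ == "__main__":
    # A child that tries to grab 4 GiB under a 1 GiB cap should fail on its own
    # instead of pushing the rest of the system into swap.
    greedy = [sys.executable, "-c", "bytearray(4 << 30)"]
    print(run_with_memory_cap(greedy, 1 << 30).returncode)
```

This only contains the damage from a misbehaving process. It does nothing for timing, and cgroups would be the heavier-duty tool for limiting a whole group of processes.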

I agree that Linux is not going to be as reliable as going bare metal, and I'm not recommending you use it as a motor controller. But even the most reliable MCU can fail. An MCU can get hit by cosmic rays or ESD. People might spill water on the 3D printer or physically damage it. It's not even a binary "works right or dies" thing. I've voltage-glitched MCUs to get them to skip instructions and get into an unanticipated state.

In any case, the best path to safety is to imagine that the computer might be taken over by Skynet and do everything in its power to kill you. Or worse, ruin your print. If safety is the goal, it's probably best achieved by requiring the computer system to take some positive action to keep the heater on. Or even better, by adding a feedback safety mechanism like a thermal fuse.
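The "positive action" idea can be sketched as a dead-man switch: the heater is only allowed on while the control loop keeps checking in. This is a minimal Python illustration of the logic only. The class name `DeadManSwitch` and its methods are hypothetical, and the injectable `clock` parameter is just there so it can be exercised without waiting.

```python
import time


class DeadManSwitch:
    """Output is enabled only while keep_alive() has been called recently."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._deadline = None  # no keep-alive seen yet, so the output starts off

    def keep_alive(self):
        """Called by the control loop each time it confirms things are sane."""
        self._deadline = self._clock() + self._timeout_s

    def output_enabled(self):
        """True only if a keep-alive arrived within the last timeout_s seconds."""
        return self._deadline is not None and self._clock() < self._deadline
```

The default is off, and silence means off. In a real design this check has to live outside the software that might hang, otherwise nothing is left running to evaluate it. That's why something like a thermal fuse is the better last line of defense.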

Being within the capabilities of something and guaranteeing that it will never exceed that are two different things. At least in the past, real-time guarantees for Linux came as part of an optional patch set for the kernel, since guaranteeing that an algorithm would complete within a set time frame, or that things like priority inversion would be handled correctly, came with a performance cost.