| HN Mirror



	by gmueckl 3036 days ago
	If tracking down these kinds of errors on the desktop took you days, your tooling was maybe not good enough. I honestly cannot remember an instance where valgrind completely failed me. This is kind of my gold standard for debugging memory issues. Also, some microcontrollers have amazing debugging support these days. Instruction tracing on Cortex-M devices is a great feature, for example. The CPU will log every instruction that it executes over a serial interface for the hardware debugger to store. This allows you to go back in time after the fact, something that desktop debuggers have a really hard time with.

1 comments

arcticbull 3035 days ago

My point is, with a language like Rust, you can pretty much throw all this away. Why put yourself through this intentionally?

I also feel you're dodging my question. A 1-in-1000 spurious write to 0x0 is something you'll have a terrible time even identifying as the cause of your failure, specifically because it is completely silent. Your embedded system just happens to stop working sometimes. Where do you even think to begin? Assuming you know this is why, sure, throw on a watchpoint and call it a day, but how did you connect "heater stops heating" to a 1-in-1000 write to 0x0?

You don't have to worry about that with a language that won't even let you write that invalid program in the first place.

gmueckl 3035 days ago

Well, this hasn't even been an issue for us in the last couple of years, even though we use controllers without MMUs. We have a quite complex C++ codebase and our coding style catches a lot of these mistakes outright.

Rust is simply not an option for us because of a distinct lack of tooling available for it. We need an IEC 61508 qualified toolchain, including testing frameworks, and there is none in sight for Rust.

Also, out of interest: has anyone ever tried to write code in Rust that is protected against bit flips caused by radiation? Our code is able to detect this because it also stores long-lived values as bit-inverted patterns and compares the two copies regularly. This does not allow us to recover outright, but we can at least fail gracefully and attempt to reboot the device.
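
For what it's worth, the inverted-copy check described above could be sketched in Rust roughly as follows. This is only an illustration under assumptions, not the code from the codebase being discussed: the names `Guarded`, `Corrupted`, and `flip_bit` are made up for the example, the value is fixed to `u32`, and `flip_bit` exists only to simulate an upset for testing.

```rust
use core::ptr::read_volatile;

/// Hypothetical error type: the two stored copies no longer agree.
#[derive(Debug, PartialEq, Eq)]
pub struct Corrupted;

/// A long-lived value stored together with its bitwise complement.
#[derive(Debug, Clone, Copy)]
pub struct Guarded {
    value: u32,
    inverted: u32,
}

impl Guarded {
    /// Store the value and its bit-inverted pattern.
    pub fn new(value: u32) -> Self {
        Guarded { value, inverted: !value }
    }

    /// Overwrite both copies at once.
    pub fn set(&mut self, value: u32) {
        self.value = value;
        self.inverted = !value;
    }

    /// Return the value only if it still matches the inverted copy.
    pub fn get(&self) -> Result<u32, Corrupted> {
        // Volatile reads force both copies to be loaded from memory, so the
        // comparison is less likely to be optimized away (an assumption about
        // what a real target would need, not a guarantee).
        let v = unsafe { read_volatile(&self.value) };
        let i = unsafe { read_volatile(&self.inverted) };
        if v == !i {
            Ok(v)
        } else {
            Err(Corrupted)
        }
    }

    /// Test-only helper (hypothetical): flip one bit of the primary copy
    /// to simulate a radiation-induced upset.
    pub fn flip_bit(&mut self, bit: u32) {
        self.value ^= 1 << (bit % 32);
    }
}
```

A periodic task would call `get()` on each guarded value and, on `Err(Corrupted)`, take the fail-gracefully-and-reboot path. Like the scheme described above, this only detects a mismatch; it cannot tell which copy was hit, so it does not recover the value.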