by cheeselip420 1115 days ago
"Naturally?" lol

while (true) {
  read sensors
  update world model
  decide what to do
  act
}

You should only deviate from this when you have a specific reason (concurrency, libraries, IPC, etc.). You can attach a debugger. You can deterministically play sensor data through and get great reproducibility for end-to-end testing. Starting with a distributed system is a handicap 80% of the time.
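A minimal Python sketch of that loop, fed from a recorded log instead of live sensors. The `World` class, the 1 m threshold, and the sample readings are all made up for illustration; the point is only that the same log always produces the same actions.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical world model: just the latest range reading in metres."""
    distance_m: float = float("inf")


def update(world, reading):
    # Update the world model from one sensor sample.
    world.distance_m = reading
    return world


def decide(world):
    # Stop when an obstacle is closer than 1 m (arbitrary illustrative threshold).
    return "stop" if world.distance_m < 1.0 else "forward"


def run(sensor_log):
    """Play recorded readings through the loop and return the actions taken."""
    world = World()
    actions = []
    for reading in sensor_log:          # read sensors
        world = update(world, reading)  # update world model
        action = decide(world)          # decide what to do
        actions.append(action)          # act (recorded here instead of driving motors)
    return actions


log = [5.0, 2.5, 0.8, 3.0]  # made-up recorded range readings
print(run(log))  # ['forward', 'forward', 'stop', 'forward']
```

Because `run` has no threads, clocks, or queues, replaying the same log twice gives identical output, which is what makes end-to-end regression tests cheap here.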


What do you do when your control software is on one computer and your perception software is on a different computer?
Design a system to efficiently partition the state between the two? I'm not saying that message passing is bad, just that it shouldn't be the default choice...
But that’s the thing, robots are supposed to interact with the world by default. They are supposed to integrate into society. At some level, there is a necessary distributed processing boundary, and in fact there are many: multiple internal heterogeneous processing units, multiple sensors running at different frequencies, external databases and cloud compute, remote operators or telemetry, ground stations, and even other robots. That is, if you want them to be useful at all. How in the world do you integrate that system into a synchronous while loop?
This is a non-falsifiable argument. Of course there need to be abstraction layers between various systems. The question is whether pubsub, and all the baggage and difficulty that brings with it, is the correct abstraction mechanism. Tossing your data to the wind and hoping the next system picks it up correctly and runs with it is not how I envision building reliable, deterministic systems.
My argument is that robotics systems are naturally distributed. Pub sub works okay there, but the actor model is better in my opinion. Either way, I don’t see how it’s possible to argue a while loop is the main abstraction roboticists need.

Maybe we're talking about different kinds of systems. I work with robot teams, human-robot interaction, and long-term autonomy.

It really depends on the degree of granularity. ROS encourages the use of the actor model multiple times inside of the same machine. This is complete overkill, and actually reduces reliability and safety.

For example, how do you write unit tests for an actor-model system? Without unit tests, how do you properly characterize the code's behaviour? When I last did ROS work, I built the whole thing outside of ROS, tested and validated it with tests, and then put some small ROS wrappers on top, and it basically worked first time. But this isn't how ROS-native systems are developed; instead people use Gazebo/RViz to tweak and add things, and you end up with a system that is grown organically, at the single algorithm level, with all the problems that entails.
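Roughly what that split can look like, as a hedged Python sketch. `clamp_speed`, `SpeedLimiterNode`, and the `publish` callback are hypothetical names; no real ROS API is used, and `publish` just stands in for whatever the middleware provides.

```python
def clamp_speed(requested, distance_m, max_speed=2.0, stop_distance_m=1.0):
    """Core logic as a pure function: no middleware, no clock, no globals."""
    if distance_m <= stop_distance_m:
        return 0.0
    return max(0.0, min(requested, max_speed))


class SpeedLimiterNode:
    """Hypothetical thin wrapper; all it does is route messages to the core."""

    def __init__(self, publish):
        self._publish = publish            # callback supplied by the middleware
        self._distance_m = float("inf")    # last range reading seen

    def on_range(self, distance_m):
        self._distance_m = distance_m

    def on_command(self, requested):
        self._publish(clamp_speed(requested, self._distance_m))


# Unit-test style usage: a plain list stands in for the outgoing topic.
sent = []
node = SpeedLimiterNode(sent.append)
node.on_command(5.0)   # no obstacle seen yet, so clamp to max_speed
node.on_range(0.5)
node.on_command(5.0)   # obstacle inside stop distance, so command zero
print(sent)  # [2.0, 0.0]
```

The core function can be tested exhaustively in milliseconds, and the wrapper is small enough that a couple of tests with a fake `publish` cover it.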

As I posted cross-thread, in the actor model, with queues and threads, you inherently encode additional state via the temporal spacing of the messages. Trying to predict what all of these could be so that you can test for edge cases and make sure things are safe is basically impossible. The modularity of ROS lets you set up a giant system pretty quickly, but ironing out the edge cases takes about as much time as just rewriting the whole thing as a monolith, because you haven't actually been able to test the system properly and the long tail of hidden state and bugs is impossible to avoid, and also impossible to predict and test for.
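A toy Python illustration of state hiding in message timing. The topic names and values are invented; the handler simply fuses each scan with whichever pose arrived last, which is a common pattern in queue-fed code.

```python
def process(messages):
    """Consume (topic, value) messages in arrival order, like a queue-fed handler."""
    latest_pose = 0.0
    outputs = []
    for topic, value in messages:
        if topic == "pose":
            latest_pose = value
        elif topic == "scan":
            # The obstacle estimate depends on whichever pose arrived last.
            outputs.append(latest_pose + value)
    return outputs


# Same four messages; in the second run one pose update arrives late.
in_order = [("pose", 1.0), ("scan", 4.0), ("pose", 2.0), ("scan", 4.0)]
delayed = [("pose", 1.0), ("scan", 4.0), ("scan", 4.0), ("pose", 2.0)]

print(process(in_order))  # [5.0, 6.0]
print(process(delayed))   # [5.0, 5.0]
```

Nothing in either message's contents changed, only the interleaving, yet the outputs differ. With real threads and queues the interleaving is chosen by the scheduler, so the set of cases to test grows with every topic added.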

From what I've seen of the ROS community, the concept of testing is severely lacking. It usually entails running simulations in lots of different scenarios, which in a testing hierarchy is only really your final integration tests. It doesn't tell you about degradations in various subsystems, e.g. control or navigational inefficiencies. It doesn't tell you about regressions based on earlier behaviour. It isn't deterministic, so you get random failures, reducing trust in the testing infrastructure. It takes tons of compute, so your devs wait hours for something they should be able to know in seconds. And because it's slow, devs won't add tests to the same granularity they would otherwise.

In a high-reliability environment, deterministic code is really important. The actor model doesn't give you that: you lose determinism each and every time you cross its interface. It also makes abstractions for granular testing much more difficult. It isn't a silver bullet, and ROS leans so heavily on it that all of the downsides are effectively unmitigated and impossible to avoid.

It sounds like we're working in a similar space. For me it is drone obstacle avoidance and navigation systems, and I found ROS to be entirely unsuitable for anything more granular than inter-drone coordination.

Decision making and sensor reading happen at vastly different timescales.
You could have a lidar coming in at 15Hz, a camera at 30Hz, odometry at 60 or 100Hz - but typically you'll want to plan within that same range, at least for navigation (20-50Hz). "Vastly different" is a bit of a stretch.

Also - we have used queues to deal with different time scales for a really long time. It works fine here too.
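As a rough Python sketch of that, here is one loop on a simulated 60 Hz tick with per-sensor queues. The sensor rates follow the numbers above (odometry picked as 60 Hz); the 30 Hz planner rate and all names are assumptions for illustration.

```python
from collections import deque

BASE_HZ = 60  # one tick per odometry sample (simulated time, no real clock)
PERIOD_TICKS = {"odom": 1, "camera": 2, "lidar": 4}  # 60 Hz, 30 Hz, 15 Hz
PLAN_EVERY = 2  # planner runs every 2 ticks, i.e. 30 Hz


def simulate(ticks):
    """Return, per planning step, how many samples of each sensor were drained."""
    queues = {name: deque() for name in PERIOD_TICKS}
    consumed = []
    for t in range(ticks):
        # Producers: each sensor enqueues a sample at its own rate.
        for name, period in PERIOD_TICKS.items():
            if t % period == 0:
                queues[name].append(t)
        # Consumer: on planning ticks, drain whatever has accumulated.
        if t % PLAN_EVERY == PLAN_EVERY - 1:
            step = {}
            for name, q in queues.items():
                step[name] = len(q)
                q.clear()
            consumed.append(step)
    return consumed


for step in simulate(4):
    print(step)
# {'odom': 2, 'camera': 1, 'lidar': 1}
# {'odom': 2, 'camera': 1, 'lidar': 0}
```

The planner sees two odometry samples per step, one camera frame, and a lidar scan every other step. Since the tick counter is the only notion of time, the whole run is repeatable.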

For higher-level behaviors around grasping or manipulation, your point is super valid though. I suppose I'm mostly focusing on navigation-type tasks.

You aren’t thinking broadly enough. Algorithms can run at megahertz, sensors can run at 10s of kilohertz to 10s of Hz, control loops can run at 5Hz. Remote database calls can of course take much longer than that, and then you have very long range planning tasks that can cycle over days or weeks depending on deployment. I’d say that’s quite the range.

And you mention queues, yes exactly. Abstract a little more and you get pub sub. Abstract a little more and you have the actor model, which is a lovely way of building resilient, reliable, fault tolerant systems — exactly what we want out of robots.

Control loops also need to run at kilohertz, and if you can't schedule them to run without jitter the whole system is useless. Real-time systems need an understanding of time budgets; otherwise they will never be reliable enough to run in places where suboptimal performance costs money.
Indeed, there are many different levels of control loops.