by moglito 1121 days ago
Are you aware that ROS service calls are RPCs (not based on pub/sub like actions)? Furthermore, if you use nodelets (http://wiki.ros.org/nodelet) you get zero-copy communication between your algorithms. So I actually think that ROS has the facilities to address the needs you describe.

I'd be interested, though, in your suggestions on what a real architecture for robotics looks like in your mind? I still remember the time before ROS, 20 years ago, when each robotics team had to designate a sub-team just for building and maintaining the middleware. That was a waste of time and effort. But you seem to suggest that we go back to that? ROS might not be perfect but it's so much better than anything else that exists. It's also open source and we can all work together to make it better rather than reinventing the wheel each time.


The async nature of pubsub makes it great for isolating your part of the system, but moves the complexity to the system integration instead. Actually deploying a ROS based system is about as difficult as rewriting the whole thing from scratch as a monolith. Every time something goes over a pubsub it's like using a GOTO, except your debugger can't actually follow it, and you don't even know who was listening (or who wasn't that should have been) or what the downstream effects were. It makes it impossible to properly debug a system, because it isn't deterministic so you can never be sure if you've actually handled all of the edge cases, since there is a temporal component to the state that can't be reproduced.
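To make the "temporal component" concrete, here is a minimal sketch, not tied to any real ROS API. `Fuser`, `deliver`, and the topic names are hypothetical; the point is only that the same three messages produce different results depending on arrival order.

```python
from typing import List, Optional, Tuple


class Fuser:
    """Hypothetical subscriber that pairs each scan with the latest pose seen."""

    def __init__(self) -> None:
        self.latest_pose: Optional[str] = None
        self.outputs: List[Tuple[Optional[str], str]] = []

    def on_pose(self, pose: str) -> None:
        # Cache the most recent pose; nothing is emitted here.
        self.latest_pose = pose

    def on_scan(self, scan: str) -> None:
        # Output depends on whichever pose happened to arrive first.
        self.outputs.append((self.latest_pose, scan))


def deliver(arrivals: List[Tuple[str, str]]) -> List[Tuple[Optional[str], str]]:
    """Feed messages to a fresh Fuser in the given arrival order."""
    fuser = Fuser()
    handlers = {"pose": fuser.on_pose, "scan": fuser.on_scan}
    for topic, payload in arrivals:
        handlers[topic](payload)
    return fuser.outputs


# Same messages, two arrival orders.
run_a = deliver([("pose", "p1"), ("scan", "s1"), ("pose", "p2")])
run_b = deliver([("pose", "p1"), ("pose", "p2"), ("scan", "s1")])
```

Here `run_a` pairs the scan with `p1` and `run_b` pairs it with `p2`, even though both runs received the same set of messages.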

A better system would take ideas from game engine design and realtime system execution budgets, with cascaded controllers on separate threads with dedicated compute resources for components that need higher update rates.

ROS has traction because of university labs, who just need something to work once to be able to publish their paper or write their dissertation. In industry the reliability requirements are much higher, and despite the intensive efforts from the ROS community to "industrialize" ROS via adoption of DDS, there seemed to be little understanding that the message protocol wasn't the reason the industry uptake was so low.

> A better system would take ideas from game engine design and realtime system execution budgets, with cascaded controllers on separate threads with dedicated compute resources for components that need higher update rates.

This is how I've structured a control system, but using a pub/sub system to share data across those boundaries (and log/inspect in general). "Nodes" that share some resource or fundamentally run back to back based on the data flow can live in the same thread. Higher rate components (eg inner loop controller) live in a thread with a much higher priority. All of this is event driven, deterministic, and testable.
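A rough sketch of the shape I mean, simplified to a single thread with a rate divider instead of real prioritized threads. The rates, gains, and all names here are assumptions for illustration, not taken from an actual system.

```python
from dataclasses import dataclass

INNER_HZ = 1000        # assumed inner-loop rate
OUTER_DIVIDER = 10     # outer loop runs every 10th tick (100 Hz)
DT = 1.0 / INNER_HZ


@dataclass
class State:
    position: float = 0.0
    velocity: float = 0.0
    velocity_setpoint: float = 0.0


def outer_position_controller(position: float, target: float) -> float:
    """Slow loop: turn position error into a velocity setpoint."""
    return 2.0 * (target - position)


def inner_velocity_controller(velocity: float, setpoint: float) -> float:
    """Fast loop: turn velocity error into an acceleration command."""
    return 20.0 * (setpoint - velocity)


def step(state: State, tick: int, target: float) -> State:
    """Advance the whole system by one tick, always in the same order."""
    setpoint = state.velocity_setpoint
    if tick % OUTER_DIVIDER == 0:
        setpoint = outer_position_controller(state.position, target)
    accel = inner_velocity_controller(state.velocity, setpoint)
    velocity = state.velocity + accel * DT
    position = state.position + velocity * DT
    return State(position, velocity, setpoint)


def run(ticks: int, target: float) -> State:
    """Run the cascade from rest for a fixed number of ticks."""
    state = State()
    for tick in range(ticks):
        state = step(state, tick, target)
    return state
```

Because every stage runs in a fixed order per tick, two calls to `run` with the same arguments give the same final state, which is what makes it testable.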

If you have more details about the system you're imagining I'd love to learn more, because so far I don't see what's incompatible with what you've described.

In general PX4 is better than ROS in this respect, but IMO still leans too heavily on queues. A bit of feedback there:

- more should be done by setting up constrained functionality-specific data, and then simply calling a function with just that data. Right now a lot of things are passed state they don't need just because it is part of the message they receive. This makes the code dependencies way harder to separate, because you effectively share function signatures (messages) between different modules. Of secondary concern is the extra memory bandwidth from the extra data passed around, and not being able to pass by const& due to the async.

- lots of things don't need updating at all until they tick over. If you don't update it frequently it has old data. You can try to make sure it works with the latest data by updating it frequently, but that of course has big overhead. I don't see any decent way of making this work unless you either 1) set up global threadsafe state so everything can access it, which is bad from a dependency and locking perspective, or 2) just call functions synchronously with exactly the data they need.

- The issue with message queues, beyond the need for messages to generalize as mentioned above, is that often components need multiple different messages from different sources to perform their tasks. This means every component needs to keep a local copy of the data they need, to translate this async data back into a synchronous paradigm when it runs. In fact, beyond the message passing itself, pretty much everything needs to do its work in a synchronous paradigm, so why even add the async stuff in the middle to begin with? Once the messiness of different sensors' reporting rates is consolidated into the EKF state, and external commands and directives are brought in, after that point pretty much everything could be synchronous. No overhead, no timing issues, no "which message has the data I need", no "why do we have 3 different variants of the same message with slightly different data and which one should I use", etc.
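A minimal sketch of option 2, calling functions synchronously with exactly the data they need. This is not PX4 code: `Estimate`, the controller names, and the gains are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical consolidated estimator output (stand-in for EKF state)."""
    altitude_m: float
    climb_rate_mps: float
    battery_v: float


def altitude_controller(altitude_m: float, target_m: float) -> float:
    """Needs only altitude and target; returns a clamped climb-rate setpoint."""
    return max(-2.0, min(2.0, 0.5 * (target_m - altitude_m)))


def climb_rate_controller(climb_rate_mps: float, setpoint_mps: float) -> float:
    """Needs only climb rate and its setpoint; returns thrust in [0, 1]."""
    return max(0.0, min(1.0, 0.5 + 0.2 * (setpoint_mps - climb_rate_mps)))


def control_step(est: Estimate, target_m: float) -> float:
    """One synchronous pass: each stage gets exactly the fields it uses."""
    setpoint = altitude_controller(est.altitude_m, target_m)
    return climb_rate_controller(est.climb_rate_mps, setpoint)
```

Neither controller can see `battery_v`, so the signatures alone show that battery state cannot affect the thrust command. With a shared message type you would have to read the implementation to know that.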

As long as you log the initial input data it should still be able to do replays for reproduction of realtime behaviour, but a) better testing can be done because you can actually tell what every subsystem needs to run correctly just by looking at its function signature, and b) it's much easier to refactor (and understand) because of the same properties of the function signature describing all of the inputs and outputs.
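The replay idea can be sketched like this, assuming the control code is a pure function of its inputs. The `controller` function and the JSON-lines log format are made up for illustration.

```python
import json
from typing import Iterable, List


def controller(altitude_m: float, target_m: float) -> float:
    """Hypothetical pure control function: output depends only on its arguments."""
    return 0.5 * (target_m - altitude_m)


def run_and_log(inputs: Iterable[dict], log: List[str]) -> List[float]:
    """Run 'live': record each input as a JSON line before using it."""
    outputs = []
    for sample in inputs:
        log.append(json.dumps(sample))
        outputs.append(controller(**sample))
    return outputs


def replay(log: Iterable[str]) -> List[float]:
    """Re-run the same function on the recorded inputs."""
    return [controller(**json.loads(line)) for line in log]
```

Since the function signature is the complete list of inputs, logging those arguments is enough to reproduce the outputs later.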

Services and Actions are built on top of pubsub (with separate Request/Response vs Goal/Feedback/Result topics respectively). At least w/ ROS1 - I'm not sure if ROS2 improved things here...

Nodelets are also a disaster, which is why ROS2 kinda fixed this by decoupling nodes and processes.

When you're just starting, ROS can be nice for prototyping - you get a batteries-included platform that can do some SLAM and simple motion planning. But as you start adding new features, you need to figure out how to add those features over multiple nodes. This coordination overhead can quickly bring your system to its knees, or at least make it extremely difficult to debug and troubleshoot when things go wrong.

No one should be building or maintaining middleware. Build robots. Read your sensor data, build a model of the world, decide what to do, then send commands to your control systems. This is the hard part of robotics.

ROS solves the easiest part of robotics (plumbing and process management) in the shittiest possible way.