> For example, how do you write unit tests for an actor-model system?
In an actor model, the units would be the actors. Test that they are deterministic and behave correctly given a message. You can test them for robustness by fuzzing messages and throwing them at the actor. Then you use integration tests to test the whole system's performance.
> But this isn't how ROS-native systems are developed
Note that I haven't been arguing for ROS, but for loosely coupled architectures for distributed systems like robots. I agree that ROS has many shortcomings, although I would say this one is a shortcoming not of ROS but of ROS developers. Maybe ROS can be blamed for guiding people to work that way.
> As I posted cross-thread, in the actor model, with queues and threads, you inherently encode additional state via the temporal spacing of the messages.
Systems other than ROS do this better, but the point I've been trying to get across is that the actor model is great for distributed systems because it makes their inherently asynchronous, distributed nature explicit. As I've been arguing, you need to pass messages at some point if you want the robot to be a robot: it has to interact with the world and society at some level, likely at many levels. Your obstacle-avoiding drone, I assume, communicates with a base station, maybe remote compute, and a remote human operator. If we want to test this kind of system properly, we have to make explicit the fact that the network is not reliable, latency is not zero, and so on.
In this light, temporal spacing of messages, rather than being an encumbrance, becomes a necessity. It's a means to test and ensure that the system can handle all sorts of message timings and orderings, just as it must in the real world. By designing our tests to incorporate this, we can simulate and anticipate the conditions our system will face. Also, time-deterministic messaging protocols can be used to manage this temporal aspect better.
> you haven't actually been able to test the system properly and the long tail of hidden state and bugs is impossible to avoid, and also impossible to predict and test for.
But does the monolith avoid these edge cases, or does it just fall for the fallacies of distributed computing?
> From what I've seen of the ROS community, the concept of testing is severely lacking.
Again, this seems like a shortcoming of the ROS community, not of the actor model.
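A minimal Python sketch of the unit-testing approach described in that comment: one actor, a determinism check, an every-ordering check, and a message fuzzer. `Reading`, `LatestReadingActor` and the helper functions are hypothetical, and the actor is driven synchronously through `handle()` rather than through a real mailbox. That is an assumption that deliberately sidesteps threads, so this covers the "units" level only, not the integration tests.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Reading:
    # Hypothetical sensor message: an integer sequence stamp plus a value.
    # The fuzzer below deliberately violates the int annotation.
    stamp: int
    value: float


class LatestReadingActor:
    """Hypothetical actor that keeps the newest valid reading it has seen.

    All state changes happen inside handle(), one message at a time,
    so the actor can be unit-tested without any threads or queues.
    """

    def __init__(self) -> None:
        self.latest: Optional[Reading] = None
        self.rejected = 0

    def handle(self, msg: object) -> None:
        # Count and drop anything that is not a well-formed Reading.
        if not isinstance(msg, Reading) or not isinstance(msg.stamp, int):
            self.rejected += 1
            return
        # Keep only the newest reading; stale, out-of-order ones are ignored.
        if self.latest is None or msg.stamp > self.latest.stamp:
            self.latest = msg


def run(messages: Iterable[object]) -> LatestReadingActor:
    # Drive a fresh actor synchronously with a given message sequence.
    actor = LatestReadingActor()
    for msg in messages:
        actor.handle(msg)
    return actor


def fuzz_messages(rng: random.Random, n: int) -> List[object]:
    # Random mix of valid readings, malformed readings and wrong types.
    makers = [
        lambda: Reading(rng.randint(-5, 50), rng.random()),
        lambda: Reading(str(rng.randint(0, 9)), rng.random()),  # bad stamp type
        lambda: None,
        lambda: rng.randint(0, 100),
        lambda: "garbage",
    ]
    return [rng.choice(makers)() for _ in range(n)]


def test_deterministic() -> None:
    # Same input sequence, same final state.
    msgs = fuzz_messages(random.Random(42), 200)
    a, b = run(msgs), run(msgs)
    assert a.latest == b.latest and a.rejected == b.rejected


def test_every_ordering() -> None:
    # Whatever order the messages arrive in, the newest one wins.
    msgs = [Reading(1, 0.1), Reading(2, 0.2), Reading(3, 0.3), Reading(0, 9.9)]
    for order in itertools.permutations(msgs):
        assert run(order).latest == Reading(3, 0.3)


def test_fuzz() -> None:
    # Random junk must never crash the actor or break its invariants.
    for seed in range(20):
        msgs = fuzz_messages(random.Random(seed), 100)
        actor = run(msgs)
        valid = [m for m in msgs if isinstance(m, Reading) and isinstance(m.stamp, int)]
        assert actor.rejected == len(msgs) - len(valid)
        if valid:
            assert actor.latest.stamp == max(m.stamp for m in valid)
        else:
            assert actor.latest is None


if __name__ == "__main__":
    test_deterministic()
    test_every_ordering()
    test_fuzz()
    print("all actor tests passed")
```

Seeding the fuzzer keeps failures reproducible, and the permutation test is the small-scale version of "handle all sorts of orders of messages"; it only scales to a handful of messages, so larger cases would need sampled orderings instead.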
> In this light, temporal spacing of messages, rather than being an encumbrance, becomes a necessity.
And this is the crux of where we disagree. This is a messy part of reality that should be, as far as possible, abstracted away from the algorithms that operate on the data presented to them. If I'm running a Kalman filter, I don't want to have to design my filter around frequent gyroscope dropouts caused by image captures; I want my system to guarantee that this won't happen. The actor model makes this harder by giving me no way to express explicit guarantees; in fact, it moves in the opposite direction by embracing flexibility.
While I generally agree that different components should be independently operable, in the real world they will more than likely share various resources as a system, and you will need to deal with contention.
Any design that replaces a function call costing a few instructions with serialisation, context switches, possibly network traffic, and finally deserialisation drastically increases overhead, and should be used very sparingly.
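To make that cost comparison concrete, here is a minimal in-process Python sketch of the same computation reached two ways: a direct function call, and a request/reply round trip through a hypothetical actor (`scale_actor`) that serialises with JSON and hands messages across a thread boundary via `queue.Queue`. JSON and `queue.Queue` are stand-ins assumed for illustration, not what ROS uses, and no network hop is modelled, so a real distributed setup would add further latency on top.

```python
import json
import queue
import threading
import time


def scale(x: float, k: float) -> float:
    # Direct call: a few instructions, no copies, no scheduling.
    return x * k


def scale_actor(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Hypothetical actor wrapping the same computation behind a mailbox.
    while True:
        raw = inbox.get()
        if raw is None:  # shutdown sentinel
            break
        msg = json.loads(raw)  # deserialise the request
        outbox.put(json.dumps({"result": scale(msg["x"], msg["k"])}))  # serialise the reply


def scale_via_message(inbox: queue.Queue, outbox: queue.Queue, x: float, k: float) -> float:
    inbox.put(json.dumps({"x": x, "k": k}))  # serialise and enqueue the request
    reply = outbox.get()  # block until the actor thread answers (thread handoff)
    return json.loads(reply)["result"]  # deserialise the reply


def compare(n: int = 10_000):
    # Time n direct calls against n request/reply round trips.
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=scale_actor, args=(inbox, outbox), daemon=True)
    worker.start()
    t0 = time.perf_counter()
    for i in range(n):
        scale(float(i), 2.0)
    t1 = time.perf_counter()
    for i in range(n):
        scale_via_message(inbox, outbox, float(i), 2.0)
    t2 = time.perf_counter()
    inbox.put(None)  # stop the actor
    worker.join()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    direct, via_msg = compare()
    print(f"direct: {direct:.6f}s  via message: {via_msg:.6f}s")
```

Timings vary by machine and interpreter, so no figures are given here; the point is only that the message path does strictly more work (two serialisations, two deserialisations, two queue operations and a thread handoff) to produce the same result.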
The actor model makes testing harder, and (again, in the real world) this results in less testing. It also makes system-level tests nondeterministic. Time-deterministic protocols in place of function calls are just a nonstarter, IMO. They give up control margin, increase system load, and leave you no better off with regard to system stability in case of failure.
Yes, the actor model has its place, but at a very coarse granularity. Overuse, as in ROS, leads to horrible design constraints, opaque dependencies, difficult or impossible testing, and frankly impossible debugging.
Since you seem to be an actor-model evangelist, how would you go about, for example, tracing execution flow in a debugger? The data that gets passed into an actor's interface is basically a runtime-defined GOTO. Similarly, how would you prove (from a certification perspective) that in certain scenarios the system as a whole behaves in a certain way, and fails in a safe way? Each subsystem can be proved to be safe, but the moment it goes through an async interface, all bets are off.