The consequences for this should be the same as if a maintainer had added an "rm -rf ~" or similar command to a project, with the severity of the punishment scaled by the project's popularity.
No. This is the equivalent of putting "echo 'rm -rf ~'" or similar into a test suite. The output of a test suite is not intended to be piped straight into your shell, and if you decide to do so anyway the consequences are entirely on you.
If your agent executes any random instruction in a piece of text, it behaves like a shell, and you should either fix that or bury it deep in a sandbox.
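To make "fix that" a bit more concrete, here is a rough sketch of one possible gate in front of an agent's command runner. Everything in it is an assumption for illustration: the name vet_command and the ALLOWED_PROGRAMS allowlist are hypothetical, and a check like this is only a partial mitigation, not a substitute for a sandbox.

```python
import shlex
from typing import List, Optional

# Hypothetical allowlist; a real agent would pick its own set of programs.
ALLOWED_PROGRAMS = frozenset({"pytest", "git", "ls"})


def vet_command(command: str) -> Optional[List[str]]:
    """Return an argv list if the program is allowlisted, otherwise None.

    The result is meant to be passed to subprocess.run(argv) without
    shell=True, so operators like ';' or '&&' are never interpreted
    by a shell.
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        return None
    return argv
```

With this in place, text such as "rm -rf ~" that shows up in test output is rejected rather than run, while something like "pytest -q" passes through as a plain argument list.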
Not at all. There is an express intent to produce a particular effect when the project is interacted with in a particular way. It's more like putting a '>>> subprocess.run("rm -rf ~", shell=True)' example in a docstring in a Python codebase, with the express purpose of hitting anyone who runs doctest.
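For anyone unfamiliar with why the doctest comparison matters: doctest really does execute every ">>>" line it finds in a docstring. A minimal sketch, with a harmless record() call standing in for the destructive one (record, executed, add, and run_docstring_examples are all made-up names for this illustration):

```python
import doctest

executed = []  # side effects triggered by docstring examples land here


def record(message):
    """Harmless stand-in for a destructive call."""
    executed.append(message)


def add(a, b):
    """Add two numbers.

    >>> record("docstring code ran")
    >>> add(1, 2)
    3
    """
    return a + b


def run_docstring_examples(func):
    """Run the examples in func's docstring the way doctest would."""
    globs = {"record": record, func.__name__: func}
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, globs, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults(failed=..., attempted=...)
```

Calling run_docstring_examples(add) appends to executed even though nobody called record() directly: the docstring is code as far as doctest is concerned.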