Have a look at the literature on how smoking was established as a cause of cancer. You can't ethically intervene to make non-smokers smoke long enough to develop lung cancer. A lot of money and intellectual effort went into the correlation-versus-causation question in this case.
I'm no expert on the literature here, but Peter Norvig mentions the smoking-cancer example in his article on experiment design [0]. He ends up in the same place the causality people do: observational studies.
The core idea behind an RCT (randomized controlled trial) is that the characteristics of a "unit" (e.g. a patient) can't affect which treatment it receives. On average, people who got treatment A are statistically the same as those who got treatment B, so you can attribute any difference in outcome to the treatment.
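A small simulation of why randomization works, under made-up assumptions (one covariate, age, that affects the outcome, and a true treatment effect of 2.0). Because assignment is a coin flip, age ends up balanced between the arms and a plain difference in means recovers the effect. The function names are hypothetical.

```python
import random
from statistics import mean


def simulate_rct(n: int, effect: float, seed: int = 0):
    """Simulate an RCT: treatment is a coin flip, independent of each unit's age."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        age = rng.uniform(20, 80)      # a characteristic of the unit
        treated = rng.random() < 0.5   # assignment ignores age entirely
        outcome = 0.05 * age + effect * treated + rng.gauss(0, 1)
        units.append((age, treated, outcome))
    return units


def difference_in_means(units):
    """Mean outcome of the treated arm minus mean outcome of the control arm."""
    treated = [y for _, t, y in units if t]
    control = [y for _, t, y in units if not t]
    return mean(treated) - mean(control)


units = simulate_rct(10_000, effect=2.0)
# Should land near the true effect of 2.0.
print(round(difference_in_means(units), 2))
# Covariate balance: mean age should be about the same in both arms.
print(round(mean(a for a, t, _ in units if t)), round(mean(a for a, t, _ in units if not t)))
```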
One of the simpler ways to do causal inference is by pairwise matching:
You try to identify which variables make patients different. Then you find pairs of units that are "the same" on those variables but received different treatments. After pairing, your treatment and control groups should ("should" is doing some heavy lifting here) be statistically "the same" by construction. Recall that this is exactly what an RCT gives us. If you did everything right, you can now apply all the normal statistical machinery you would apply to an RCT.
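The pairing step can be sketched as greedy 1:1 nearest-neighbor matching without replacement. This is a minimal sketch, assuming covariates are already numeric and on comparable scales (real analyses typically standardize them or match on a propensity score instead). The data, the `caliper` parameter name, and the function names are hypothetical. Because every treated unit looks for a control partner, the result estimates the effect on the treated.

```python
from math import dist


def greedy_match(treated, control, caliper=None):
    """Pair each treated unit with its nearest still-unused control unit.

    treated, control: lists of (covariate_tuple, outcome).
    caliper: optional maximum distance; treated units with no control that
    close are dropped instead of being forced into a bad pair.
    Returns a list of (treated_index, control_index) pairs.
    """
    available = list(range(len(control)))
    pairs = []
    for i, (x_t, _) in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: dist(x_t, control[k][0]))
        if caliper is not None and dist(x_t, control[j][0]) > caliper:
            continue  # no acceptable partner: this treated unit is discarded
        available.remove(j)
        pairs.append((i, j))
    return pairs


def matched_effect(treated, control, pairs):
    """Average outcome difference across matched pairs."""
    diffs = [treated[i][1] - control[j][1] for i, j in pairs]
    return sum(diffs) / len(diffs)


# Hypothetical units: covariates are (age, smoker), followed by an outcome.
treated = [((30, 1), 5.0), ((50, 0), 7.0), ((70, 1), 9.5)]
control = [((31, 1), 4.0), ((49, 0), 6.5), ((90, 0), 3.0), ((68, 1), 8.0)]

pairs = greedy_match(treated, control)
print(pairs)                                    # [(0, 0), (1, 1), (2, 3)]
print(matched_effect(treated, control, pairs))  # 1.0
```

Note that the 90-year-old control is never used: matching discards units that have no counterpart in the other group.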
The challenges are:
1. Identifying all the variables that make units alike.
2. You tend to throw away a lot of data, which reduces your statistical power. Even when the treatment classes are balanced, a given unit in class A may not pair up well with any unit from class B.
3. (Related to 2) Finding globally optimal pairs of closest matches can be hard.
4. (Also related to 2) You need at least some people in each group. Sometimes the treatment and control are just so different that nobody pairs up very well.
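Point 3 can be seen on a toy one-dimensional example: greedily giving each treated unit its nearest available control can leave later units with poor matches, so the total distance is worse than the best overall assignment. The brute-force search below is a sketch for tiny inputs only (it is factorial in the group size); finding the optimal pairing is an assignment problem, which can be solved in polynomial time with the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`. The numbers and function names are made up for illustration.

```python
from itertools import permutations


def greedy_total(treated, control):
    """Total matching distance when each treated unit greedily takes its nearest unused control."""
    available = list(control)
    total = 0.0
    for t in treated:
        c = min(available, key=lambda c: abs(t - c))
        available.remove(c)
        total += abs(t - c)
    return total


def optimal_total(treated, control):
    """Smallest total distance over every possible assignment (brute force, toy sizes only)."""
    return min(
        sum(abs(t - c) for t, c in zip(treated, perm))
        for perm in permutations(control, len(treated))
    )


treated = [1.0, 2.0]
control = [1.6, 0.0]
print(round(greedy_total(treated, control), 2))   # 2.6: the first unit grabs 1.6, the second is left with 0.0
print(round(optimal_total(treated, control), 2))  # 1.4: pairing 1.0 with 0.0 and 2.0 with 1.6 is better overall
```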
In some sense, the pairing process is just a reweighting of your data: people who are similar to someone in the other group get a high weight, and people who are unlike anyone in the other group get a low (or zero) weight.
You can generalize that idea a bit and reinvent what's called Inverse Propensity Score Weighting. Here you model each unit's propensity to receive a treatment, and then weight the unit by 1/(its propensity for the treatment it actually received).
The intuition is: if the model says you were likely to receive treatment B (you have a low propensity for A) but you actually received treatment A, then you look like the people who received B and would likely pair up with one of them. So we should up-weight you.
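A minimal sketch of inverse propensity weighting on simulated data, under loudly made-up assumptions: a single binary confounder ("sick") makes patients both more likely to be treated and worse off, and the true treatment effect is 1.0. With propensity e(x) = P(treated | x), treated units get weight 1/e(x) and untreated units get 1/(1 - e(x)); the weighted arm means are then compared (the normalized, or Hájek, form of the estimator). Since the covariate is discrete, the propensity "model" here is just the share treated within each stratum; real analyses usually fit something like a logistic regression. All names are hypothetical.

```python
import random
from collections import defaultdict


def simulate_confounded(n, effect=1.0, seed=1):
    """Hypothetical observational data: sick patients are more likely to be
    treated AND have worse outcomes, so 'sick' confounds the comparison."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        sick = rng.random() < 0.3
        treated = rng.random() < (0.8 if sick else 0.2)
        outcome = -3.0 * sick + effect * treated + rng.gauss(0, 1)
        data.append((sick, treated, outcome))
    return data


def naive_difference(data):
    """Unadjusted treated-minus-control difference in mean outcome."""
    t = [y for _, tr, y in data if tr]
    c = [y for _, tr, y in data if not tr]
    return sum(t) / len(t) - sum(c) / len(c)


def estimate_propensity(data):
    """Share treated within each covariate value (only works for a discrete covariate)."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, tr, _ in data:
        counts[x][0] += tr
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


def ipw_effect(data):
    """Weighted treated mean minus weighted control mean.
    Assumes every stratum has both treated and untreated units (0 < e < 1)."""
    e = estimate_propensity(data)
    sum_w1 = sum_wy1 = sum_w0 = sum_wy0 = 0.0
    for x, tr, y in data:
        if tr:
            w = 1.0 / e[x]          # treated despite a low propensity: large weight
            sum_w1 += w
            sum_wy1 += w * y
        else:
            w = 1.0 / (1.0 - e[x])  # untreated despite a high propensity: large weight
            sum_w0 += w
            sum_wy0 += w * y
    return sum_wy1 / sum_w1 - sum_wy0 / sum_w0


data = simulate_confounded(20_000)
print(round(naive_difference(data), 2))  # negative: the treatment wrongly looks harmful
print(round(ipw_effect(data), 2))        # should be near the true effect of 1.0
```

The division assumes every covariate value contains both treated and untreated units; if a stratum has a propensity of exactly 0 or 1, nobody in it can be reweighted, which is the weighting version of point 4 above.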
I'm currently working on my Master's thesis related to implementing propensity score matching for program evaluation in the child protection service system.
I cannot stress enough how important #1 above is. The most important part of making causal inferences in an observational setting is identifying and collecting the variables associated with both the treatment and the outcome. It is easy to conceptualize but much harder to do in practice.
This holds only for a small subset of causal DAGs, even within this 'Causal Calculus'. It can't account for circular causality or discontinuous relationships. That's not to diminish your suggestion, only to contextualise it.
You can't really, at least not in the sense I think most people mean.
You basically need to make assumptions that are broadly equivalent to having already guessed certain parts of the underlying causal structure correctly. So in a sense you're begging the question, in a way you wouldn't need to if you could run interventions or randomized trials.
That being said, causal inference techniques are still very valuable for making explicit exactly which assumptions you're making, how those assumptions affect your final conclusion, and therefore how to minimize their impact.
The rules also provide a framework within which you can rule out some causal relationships. So they at least go some way to confirming which hypotheses can't be correct given the data.
[0] https://norvig.com/experiment-design.html