by cauch
909 days ago
But error propagation is not captured by significant figures. Writing x and y with the correct number of significant figures does not guarantee that f(x, y) ends up with the correct number of significant figures. Usually, the best approach is to propagate the uncertainty explicitly, for example by storing the uncertainty as another variable in the database and using it directly whenever the number is used. If you do that, there is no practical need to spend time formatting the numbers. Using significant figures seems like a "cheap trick" that risks misleading you more often than it helps.
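A minimal sketch of what "saving the uncertainty as another variable" could look like in Python. The `Measured` class and its field names are hypothetical, and the rules used (errors added in quadrature for sums, relative errors added in quadrature for products) assume independent errors, nonzero values, and first-order accuracy only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """A value stored together with its standard uncertainty (hypothetical helper)."""
    value: float
    sigma: float

    def __add__(self, other: "Measured") -> "Measured":
        # Assumption: independent errors, so absolute errors add in quadrature.
        return Measured(self.value + other.value,
                        math.hypot(self.sigma, other.sigma))

    def __mul__(self, other: "Measured") -> "Measured":
        # Assumption: independent errors and nonzero values, so relative
        # errors add in quadrature (first-order approximation).
        value = self.value * other.value
        rel = math.hypot(self.sigma / self.value, other.sigma / other.value)
        return Measured(value, abs(value) * rel)


x = Measured(2.0, 0.1)
y = Measured(3.0, 0.2)
total = x + y      # uncertainty carried along with the sum
product = x * y    # uncertainty carried along with the product
```

Because the sketch assumes independence, it would underestimate the uncertainty of something like `x * x`, where both factors are the same measurement. A real implementation would have to track correlations as well.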
|
Significant figures are not a convention for making your deliverable pretty. They have semantic meaning. Don't think about dumb rules from high school chemistry, think about the actual problem. There are two entwined sources of uncertainty I am referring to:
1) measurement uncertainty, due to a lack of precision in the instrument (or the quantity itself, e.g. many financial computations are not meaningful if they involve fractional cents)
2) computational uncertainty, which is exclusively due to algebraic propagation of measurement uncertainty
Far too many data scientists don't care about the first category of uncertainty because they don't care about where the data came from. And they don't even realize the second category is a problem.
Let's look at a specific example. Somebody tells us that they measured the side of a square as 1.00m. Their tape measure only went down to centimeters, so the uncertainty is +/- 0.01m. What is the area of this square? Let's look at it two ways:
1) The smallest possible side length is 1.00m - 0.01m = 0.99m, so the smallest possible area is 0.98m^2. The largest possible side length is 1.01m, so the largest possible area is 1.02m^2. Thus the area is 1.00m^2 +/- 0.02m^2.
2) The side length is (1.00m +/- 0.01m). So the area is
(1.00m +/- 0.01m)(1.00m +/- 0.01m) = 1.00m^2 +/- 0.02m^2 + 0.0001m^2 ~ 1.00m^2 +/- 0.02m^2
So the uncertainty is not +/- 0.01, it is +/- 0.02. This can add up quite dramatically. In general if you have x +/- delta, then f(x +/- delta) is not going to be f(x) +/- delta or f(x) +/- f(delta). It needs to be handled carefully.
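The square example can be checked numerically. The sketch below is only an illustration: the function names are made up, the interval check assumes f is monotonic on the interval, and the first-order estimate |f'(x)| * delta assumes delta is small compared to x.

```python
def interval_bounds(f, x, delta):
    """Evaluate f at both ends of [x - delta, x + delta].

    Assumption: f is monotonic on that interval, so the extremes
    occur at the endpoints.
    """
    lo, hi = f(x - delta), f(x + delta)
    return min(lo, hi), max(lo, hi)


def first_order_uncertainty(f, x, delta, h=1e-6):
    """Approximate the uncertainty of f(x) as |f'(x)| * delta.

    The derivative is estimated with a central difference of step h.
    """
    derivative = (f(x + h) - f(x - h)) / (2 * h)
    return abs(derivative) * delta


def area(side):
    # Area of a square with the given side length.
    return side ** 2


low, high = interval_bounds(area, 1.00, 0.01)
sigma = first_order_uncertainty(area, 1.00, 0.01)
```

For the 1.00m square this gives bounds of about 0.9801 and 1.0201 and a first-order uncertainty of about 0.02, matching both hand calculations: the uncertainty scales with the derivative of f (here 2x = 2), not with delta alone.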