by nicklecompte 908 days ago
> You're talking like sig figs is error propagation, but it isn't.

No, that is the exact opposite of what I said! For starters, "uncertainty" and "error" are not the same thing here. I am saying that the significant figures in a measurement encode an inherent, measurement-specific uncertainty, and that this uncertainty must be considered when doing calculations with that measurement. Just like the person I responded to, I don't think you've thought about why significant figures actually exist in the first place.

> The correct solution is error propagation (with appropriate estimates of the errors of the inputs), not arbitrarily rounding numbers at each step

Nowhere in my comment did I arbitrarily round anything. I thoughtfully propagated the uncertainty, which is why it was +/- 0.01m in the measurement and +/- 0.02m^2 in the calculation.

The whole point of my argument is that uncertainties in calculated quantities can be rigorously determined from the uncertainty of the inputs, and measurement inputs have uncertainty determined by the significant figures. On the other hand, ignoring significant figures in calculations means we're ignoring a potential source of uncertainty in downstream analysis. If you think significant figures is about "arbitrarily rounding something" then you are thoughtlessly applying high school chemistry rules. Please read this carefully:

If I measure something with a meterstick that is broken down into centimeters, that measurement has an inherent uncertainty of either +/- 1cm or +/- 0.5cm - which one you use is a problem-specific choice similar to p95 or p99 for statistical significance (if it was a physical meterstick I'd choose 1cm because human eyeballs aren't very good; if it was laser inference I'd choose 0.5cm).

So if I am a data scientist with a database of direct measurements from a meterstick, each one has an inherent uncertainty of +/- 0.01m that's implied by the data source even if it's not in the database. This is the entire point of representing the data as 12.03m, 1.00m, etc, instead of 1.234m. If you represented a measurement as 1.234m, that would imply your meterstick could resolve millimeters, but it probably can't. So 1.234m isn't merely against the rules, it's inaccurate.

If you take a measured side length of 1.00m and say the calculated area is 1.00m^2, then naively someone might think the uncertainty in the area is +/- 0.01m^2 based on thoughtlessly applying dumb high school chemistry rules. But that's not true: the uncertainty in the calculated area is in reality +/- 0.02m^2, because to first order the area uncertainty is 2 x side x side uncertainty = 2 x 1.00m x 0.01m. The measurement can be presented without an explicit +/- because the significant figures act as a "shorthand" and we don't need to do calculations to estimate the uncertainty. But the calculation must present a calculated uncertainty.
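
To make the area-of-a-square arithmetic concrete, here is a minimal Python sketch. It assumes first-order (linear) propagation, and the function name `square_area_with_uncertainty` is made up for illustration; it is not from any library.

```python
def square_area_with_uncertainty(side, side_unc):
    """Return (area, area_unc) for a square, using first-order propagation.

    For A = s^2 the derivative is dA/ds = 2s, so dA is approximately 2 * s * ds.
    """
    area = side * side
    area_unc = 2 * side * side_unc
    return area, area_unc


# Side measured as 1.00 m with the meterstick's implied +/- 0.01 m.
area, area_unc = square_area_with_uncertainty(1.00, 0.01)
print(f"area = {area:.2f} +/- {area_unc:.2f} m^2")

# Cross-check by squaring the extremes of the side length directly.
low, high = (1.00 - 0.01) ** 2, (1.00 + 0.01) ** 2
print(f"area range = {low:.4f} .. {high:.4f} m^2")
```

Both the linear estimate and the brute-force range give roughly +/- 0.02m^2, not the +/- 0.01m^2 that the written "1.00m^2" would suggest.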

Programmers and data scientists are lazy about significant figures because they don't care where the data is coming from: to them it's all doubles in a database, and significant figures are just a matter of rounding things correctly at the end. The area-of-a-square argument shows that this is a mistake.

1 comments

I still don't get it.

You are explaining error propagation, but my point is that _if you are doing error propagation (as you should if you want to do things properly), significant figures ARE just for making deliverables pretty_.

You are talking about measurement uncertainty. Measurement uncertainty is written x +- y, with y being the uncertainty.

If you don't do that and use significant digits instead, you lose information and precision: 10.0 +- 0.1 is 10.0, 10.0 +- 0.2 is 10.0, 10.0 +- 0.3 is 10.0, ...
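
A small Python sketch of that information loss, using the three example values above (the variable names are just for illustration):

```python
# (value, uncertainty) pairs: same value, different uncertainties.
measurements = [(10.0, 0.1), (10.0, 0.2), (10.0, 0.3)]

# Significant-figure notation: only the rounded value is written down.
sig_fig_only = [f"{value:.1f}" for value, _ in measurements]

# Explicit notation: the uncertainty is carried along with the value.
explicit = [f"{value:.1f} +- {unc:.1f}" for value, unc in measurements]

print(sig_fig_only)  # all three strings are identical
print(explicit)      # all three strings are distinct
```

Once the data is stored in the first form, there is no way to recover which of the three uncertainties applied.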

This is why the other person was talking about "arbitrarily rounding".

You should _never_ say "well, it's a measurement of 10.0 with a precision of 0.2, so I can write 10.0". You should _always_ write 10.0 +- 0.2 (in which case you can also write 10 +- 0.2 or 10.000 +- 0.2; the significant digits have no impact on any future results). Writing 10.0 instead of 10.0 +- 0.2 is just a terrible practice without much justification; 10 +- 0.2 is always better. (And my point is that the problem you have with significant figures disappears if you teach people a less clumsy notation.)

(And no, you should not make the distinction "it's a measurement, so it's written differently", because in practice a lot of "measurements" are in fact already transformations, and sometimes you cannot even know for sure yourself. For example, a temperature sensor measures an electrical resistance (with a measurement uncertainty) and then converts it into a temperature; according to you, the temperature should not be written the same way as the resistance, for arbitrary reasons.)
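
As a rough sketch of the temperature sensor case: the Python below assumes a simple linear resistance model R = R0 * (1 + ALPHA * T), with R0 = 100 ohm and ALPHA = 0.00385 per degree C chosen as illustrative nominal values. The model, the constants, the example reading and the function name are all assumptions for illustration, not taken from any particular sensor.

```python
R0 = 100.0       # assumed resistance at 0 degrees C, in ohms
ALPHA = 0.00385  # assumed temperature coefficient, per degree C


def temperature_from_resistance(resistance, resistance_unc):
    """Convert a resistance reading to temperature with first-order propagation.

    Inverting R = R0 * (1 + ALPHA * T) gives T = (R / R0 - 1) / ALPHA,
    so dT/dR = 1 / (R0 * ALPHA) and dT is approximately dR / (R0 * ALPHA).
    """
    temperature = (resistance / R0 - 1.0) / ALPHA
    temperature_unc = resistance_unc / (R0 * ALPHA)
    return temperature, temperature_unc


# Hypothetical reading: 110.0 ohm with +- 0.1 ohm uncertainty.
temp, temp_unc = temperature_from_resistance(110.0, 0.1)
print(f"T = {temp:.2f} +- {temp_unc:.2f} degrees C")
```

The "measured" temperature that the sensor reports is already a calculated quantity, and x +- y handles it the same way as the underlying resistance.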

I don't think you understand what a measurement is. There's a very good, very short book that explains in more detail what I am talking about, in the context of physics experiments: https://www.amazon.com/Practical-Guide-Data-Analysis/dp/0521...

I think your notions are just too basic. It's a bit like in school when the teacher says "you should write all your sentences as subject + verb + complement". That is good at school, to teach students the basics and to set boundaries on what is being studied (you don't want students using more complex notions by accident and having to cover everything in lesson one), but as soon as you become a professional writer, you realise it is better to ignore this rule.

I know the notion of measurement that you are trying to explain; I studied it when I was an undergraduate student. Since then, I have moved beyond this notion and use something better. It's not a matter of "you don't understand", it's rather a matter of "you understand too well, see the limits of this notion, and find it's not useful for you anymore".

The book you shared seems to confirm that: it is for undergraduates. Things get more complicated in real-world practice, and the basic rules used to build that understanding need to be left behind. Undergraduate students do basic lab experiments with a ruler and a chronometer, and the goal is just to practice, not to answer a real open question. In real life, no one needs to measure things as trivial as what they are measuring. When people do real measurements, they realise that the distinction between calculated value and measured value is meaningless and not helpful at all.

Again, as I've said, you just use x +- y and you don't have any problem. What problem would x +- y cause that you would not have otherwise (assuming, of course, that you are educated enough to understand very complex notions and therefore already fully understand things as trivial as significant digits)?