| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by melq 805 days ago
	Estimating the amount of unique elements in a set and counting the amount of unique elements in a set are very different things. Cool method, bad headline.

3 comments

jameshart 805 days ago

They’re not very different things; the terms are used interchangeably in most contexts because in the real world all counting methods have some nonzero error rate.

We talk about ‘counting votes’ in elections, for example, yet when things are close we perform ‘recounts’ which we fully expect can produce slightly different numbers than the original count.

That means that vote counting is actually vote estimating, and recounting is just estimating with a tighter error bound.

I kind of think the mythology of the ‘countless stones’ (https://en.wikipedia.org/wiki/Countless_stones) is a sort of folk-reminder that you can never be too certain that you counted something right. Even something as big and solid and static as a standing stone.

The situations where counting is not estimating are limited to the mathematical, where you can assure yourself of exhaustively never missing any item or ever mistaking one thing’s identity for another’s.

samatman 805 days ago

> the terms are used interchangeably in most contexts

Counting and estimating are not used interchangeably in most contexts.

> because in the real world all counting methods have some nonzero error rate.

The possibility that the counting process may be defective does not make it an estimation.

> We talk about ‘counting votes’ in elections, for example, yet when things are close we perform ‘recounts’ which we fully expect can produce slightly different numbers than the original count.

We talk about counting votes in elections because votes are counted. The fact that the process isn't perfect is a defect; this does not make it estimation.

> That means that vote counting is actually vote estimating, and recounting is just estimating with a tighter error bound.

No. Exit polling is estimation. Vote counting is counting. Vote recounting is also counting, and does not necessarily impose a tighter error bound, nor necessarily derive a different number.

> The situations where counting is not estimating are limited to the mathematical, where you can assure yourself of exhaustively never missing any item or ever mistaking one thing’s identity for another’s.

So like, computers? Regardless, this is wrong. Estimating something and counting it are not the same thing. Estimation has uncertainty, counting may have error.

This is like saying addition estimates a sum because you might get it wrong. It's just not true.

jameshart 805 days ago

So, IEEE floating point doesn’t support ‘addition’ then.

samatman 805 days ago

IEEE 754 defines an exact binary result for the addition of any two floats.

That this bit-identical result is not the same operation as addition of real numbers is irrelevant, because floats aren't reals.

f1 + f2 is not an estimation. Even treating it as an approximation will get you into trouble. It's not that either, it's a floating-point result, and algorithms making heavy use of floating point had better understand precisely what f1 + f2 is going to give you if they want to obtain maximum precision and accuracy.

aidenn0 805 days ago

Cool, so next time I have numbers that aren't reals to perform math on, I can use floats.

samatman 804 days ago

Or if you have numbers that aren't integers to perform math on, you can use integers.

It's not a new problem, and it isn't specific to floats. Computers do discrete math. Always have, always will.

lupire 805 days ago

Come on. There is a fundamental difference between trying to get an exactly answer and not trying to get an exactly correct answer.

jameshart 805 days ago

It’s not a fundamental difference, it’s a fundamental constraint.

There are circumstances - and in real life those circumstances are very common - where you must accept that getting an exactly correct answer is not realistic. Yet nonetheless you want to ‘count’ things anyway.

We still call procedures for counting things under those circumstances ‘counting’.

The constraints on this problem (insufficient memory to remember all the unique items you encounter) are one such situation where even computerized counting isn’t going to be exact.

Levitz 805 days ago

I agree with you, but we are talking theory here. The algorithm doesn't count, it estimates.

You can make an algorithm that counts, you can make an algorithm that estimates, this is the second.

sangnoir 805 days ago

Estimation is counting with error bars.

Frankly, most of what you consider counting in your comment needs error bars - ask anyone who operated an all-cash cash-register how frequently end-of-day reconciliation didn't match the actual cash in the drawer (to the nearest dollar.)

The following is a list from my personal experience - of presumably precisely countable things that didn't turn out to be the case: the number of computers owned by an fairly large regional business, the number of (virtual) servers operated by a moderately sized team, the number of batteries sold in a financial year by a battery company.

shkkmo 805 days ago

Counting is a subset of estimation, not a synonym.

If I estimated the number of quarters in a stack by weighing them, that would be different from estimating the number of quaters in a stack by counting them. Both methods of estimation have error bars.

The list you provide is of categories that don't have clear definitions. If you have a sufficiently clear definition for a category given your population, it has a precise count (though your counting methodologies will still be estimates.) If your definition is too fuzzy, then you don't actually have a countable set.

Levitz 802 days ago

I think I get your point completely, yet I'm not getting through.

Would you agree that 1+1=2? Or that pi is 3.14159...? These are mathematical truths, but quickly crumble in the real world. One apple plus one apple doesn't just equate to double the apple, no two apples are ever the same to begin with, there are no real perfect circles out there either, there is still value to those mathematical truths in that they make it evident that they are perfectly precise and that it is real world interaction which may bring error into the table.

seadan83 805 days ago

Counting and estimation are different by definition. One is a full enumeration, the latter an extrapolation from sampled data. In both cases 'accuracy' is a factor. Even if we are counting the number of stars, it is still a difference of technique compared to estimating the number if stars.

I could try to count fibers in muscle or grains of sand in the beach, chances are accuracy would be low. One can be smart about technique for more accurate counts, eg: get 10M sand counters and give them each 1kg of sand which they then count the grains with tweezer and microscope. That is counting. At the same time, we could find an average count of grains in 1kg sand from a random 100 of our counters, and then estimate what an expected total would be. The estimate can be used to confirm the accuracy of the counts.

naijaboiler 805 days ago

They are not really as far apart a you think. At small numbers, yes get are distinct. At large enough numbers, they in all practicality the same thing. E.g what’s the population of the US

jameshart 805 days ago

And by that definition this is a counting algorithm.

Koshkin 805 days ago

True - for (relatively) small numbers. For large (huge) numbers estimation is usually considered to be equivalent to counting, and the result is sometimes represented using the "scientific" notation (i.e. "floating-point") rather than as an integer. For example, the mole is an integer whose value is only known approximately (and no one cares about the exact value anyway).

YoshiRulz 805 days ago

As of May 2019, the mole has an exact value, and Carbon-12's molar mass is the empirically-determined value.

OutOfHere 805 days ago

This doesn't justify estimation to be equivalent to counting even if some mathematicians consider them to be the same. Floating points are for estimation. Integers are for counting. The two are not the same, not even for large numbers.

Koshkin 805 days ago

"Equivalent" and "the same" are sometimes equivalent. (Or the same.)

It depends on what the meaning of the word 'is' is.

https://libquotes.com/bill-clinton/quote/lby0h7o

dools 805 days ago

It's an approximation, not an estimation.

blackkettle 805 days ago

Actually, my understanding is that it is an estimation because in the given context we don't know or cannot compute the true answer due to some kind of constraint (here memory or the size of |X|). An approximation is when we use a simplified or rounded version of an exact number that we actually know.

dools 804 days ago

Wikipedia is on your side:

"In mathematics, approximation describes the process of finding estimates in the form of upper or lower bounds for a quantity that cannot readily be evaluated precisely"

This process doesn't use upper and lower bounds.

However, it still seems more like approximation than estimation to me because of this:

“Of course,” Variyam said, “if the [memory] is so big that it fits all the words, then we can get 100% accuracy.

It seems that in estimation the answer should be unknowable without additional information, whereas in this case it's just a matter of resolution or granularity because of the memory size.

Anyhoo ...

EDIT: also the paper says "estimate" and the article says both "approximate" and "estimate" at different times so it seems everyone except me thinks it's either an estimation or that estimation and approximation are interchangeable.

ranguna 805 days ago

Still very different things, no?

coldtea 805 days ago

It's the same thing at different degrees of accuracy. The goal is the same.

amelius 805 days ago

Still, counting things and counting unique things are two different procedures.

chrisweekly 805 days ago

For someone who's pretty well-versed in English, but not a math-oriented computer scientist, this seems like a distinction without a difference. Please remedy my ignorance.

lupire 805 days ago

My GP was wrong, but the words are different.

Eatimation is a procedure the generates an estimate, which is a kind of approximation, while approximation is a result value. They are different "types", as a computer scientist would say. An approximation is any value that is justifiably considered to be nearly exact. ("prox" means "near". See also "proximate" and "proxy".)

Estimation is one way to generate an approximation. An estimate is a subtype of an approximation. There are non-estimation ways to generate an approximation. For example, take an exact value and round it to the nearest multiple of 100. That generates an approximation, but does not use estimation.

jameshart 805 days ago

I’m not sure the linguistic differences here are as cut and dried as you would like them to be. Estimate and approximate are both verbs, so you can derive nouns from them both for the process of doing the thing, and for the thing that results from such a process.

Estimation is the process of estimating. It produces an estimate.

Approximation is the process of approximating. It produces an approximation.

You can also derive adjectives from the verbs as well.

An estimate is an estimated value.

An approximation is an approximate value.

But you’re right that the ‘approximate’ terms make claims about the result - that it is in some way near to the correct value - while the ‘estimate’ derived terms all make a claim about the process that produced the result (ie that it was based on data that is known to be incomplete, uncertain, or approximate)

davidmurdoch 805 days ago

The authors of the article disagree with you.

dools 804 days ago

The authors of the paper disagree with me, the authors of the article don't (they use both approximate and estimate, but the paper does say estimate).